WO2005031515A2 - Software and data file updating process - Google Patents

Software and data file updating process

Info

Publication number
WO2005031515A2
Authority
WO
WIPO (PCT)
Prior art keywords
file
checking data
blocks
seed
target file
Prior art date
Application number
PCT/US2004/031079
Other languages
French (fr)
Other versions
WO2005031515A3 (en)
Inventor
David Woodhouse
Original Assignee
Red Hat, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Red Hat, Inc.
Priority to EP04784791A (EP1678572A4)
Publication of WO2005031515A2
Publication of WO2005031515A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/65 Updates
    • G06F8/658 Incremental updates; Differential updates

Definitions

  • This invention relates to techniques for updating computer files such as those pertaining to software programs and data and, more particularly, to the use of cached checking data to improve the efficiency of file updating processes.
  • rsync (the "RSYNC" algorithm)
  • rsync has proven to be extremely useful in comparing files whose content differs only partially.
  • rsync compares an original or "seed" file at a client computer with a revised or "target" file at a server and "notices" differences between the two files using checking data (e.g., checksums and the like).
  • rsync identifies these differences by generating checking data for blocks of the seed file at the client, which it uses to compare against checking data for blocks of the target file generated at the server. Matches in checking data indicate identical blocks, while differences suggest that changes have been made.
  • Rsync then downloads only those parts of the target file that are actually new, while using any parts of the seed file that are unchanged from the target.
  • One drawback of the rsync algorithm is that generating the checking data at the server requires a large amount of processing by the server CPU. Thus, the server CPU may become overloaded when any more than just a few clients attempt to run the rsync algorithm. In these cases, the network bandwidth overload sought to be addressed by rsync is replaced with a CPU processing overload resulting in negligible improvements in the situation.
  • Another technique that is commonly used in the downloading of data, sometimes in conjunction with comparator algorithms like rsync, includes compression. Basically speaking, compression recognizes and eliminates redundancy in the data (i.e., repetitive or identical patterns of bits) to allow reductions in the amount of data to be stored or transmitted. Compression algorithms operate by generating a "history" associated with a piece of repetitive data. These histories are then referred to each time the repetitions are encountered to create a compressed form of the data.
  • the present invention provides a technique for updating software and data files using cached checking data to improve the efficiency of updating processes such as rsync and other similar algorithms.
  • the technique is applicable in situations where checking data are used to identify differences in a seed file located at a client and a target file located at a server.
  • the technique of the present invention involves having the server generate target file checking data for one or more blocks of the target file. These target file checking data are then stored in a cache or other high-speed memory of the server, where it may be easily and rapidly accessed during an updating process. Subsequently, the client generates seed file checking data corresponding to one or more blocks of the seed file.
  • the server compares the seed file checking data with the target file checking data stored in memory to identify any differences between blocks of the seed file and blocks of the target file. If differences between the seed file and the target file are identified, the server transmits information to the client for revising the seed file blocks that are different from the target file blocks in a manner such that the seed file blocks match the target file blocks.
  • Figure 1 is a block diagram illustrating an example of a networked computer system utilizable for implementing an updating process of the present invention.
  • Figure 2 depicts one example of a package file utilizable for transmitting software updates to a client computer.
  • Figure 3 depicts one example of a process utilizable for controlling operation of a client computer in conjunction with the updating process of the present invention.
  • Figure 4 depicts one example of a process utilizable for controlling operation of a server computer in conjunction with the updating process of the present invention.
  • Figure 5 depicts one example of an updating process of the present invention.
  • FIG. 1 is a block diagram illustrating an example of a networked computer system 100 utilizable for implementing an updating process of the present invention.
  • system 100 includes a number of remote client systems (clients 110), which in this embodiment may include personal computers, laptop computers, personal digital assistants, cellular telephones, and other computing devices.
  • each of clients 110 includes, among other components, a processing unit (processor 114) and a number of memory storage devices (storage device 118).
  • storage device 118 includes, for example, any type of permanent or semi-permanent storage (e.g., hard disc drives, memory cards, and the like) and may be used to store any number of software programs or data files 122, which from time to time may need to be updated or revised.
  • clients 110 may be linked to a server system or server 150 via a data communications network 130 to facilitate the transmission of information and data between clients 110 and server 150.
  • network 130 may include the Internet or other similar data networks, such as private WANs, LANs and the like.
  • server 150 may represent, for example, a software manufacturer or vendor system from which software revisions may be distributed to clients 110. Like clients 110, server 150 also includes a processor 154 and a number of memory storage devices (e.g., hard disc and the like) (storage device 158). Storage device 158, in many cases, stores software programs or data files 162, which represent newer versions of corresponding programs or files implemented on clients 110. In addition, storage device 158 may also include a high-speed memory such as cache memory 166.
  • files and programs 122 stored on clients 110 may be updated in a manner such that they match newer versions (e.g., file 162) stored on server 150.
  • file 162 may represent a newer version of a software program offered by a software vendor or manufacturer to customers or owners of older versions of the program (e.g., program 122).
  • the newer versions of the program may be available for downloading from a server computer associated with the vendor (e.g., server 150) to the client computers (e.g., clients 110) via the Internet (e.g., network 130).
  • server 150 may perform a number of process steps to prepare for and execute the updating process of the present invention.
  • the files are typically implemented in conventional packages or packets which include a header and a payload (i.e., a compound file).
  • Figure 2 depicts one example of such a package which includes header 210 and payload 220.
  • the payload includes the bulk of the file (e.g., the compressed software executables and the like)
  • the header is used to identify the package and includes the package name, its version, the other packages on which it depends, and the like.
  • One specific example of a packaging system which may be utilized to compile such a package includes the RPM Package Manager, implementable for use with systems offered by Red Hat, Inc. of Raleigh, NC.
  • Although the examples provided in this description focus on the RPM Package Manager format (see, e.g., the example depicted in Figure 2), it is to be understood that at least some embodiments of the present invention contemplate the use of other package formats, including those that omit header files as well as those that do not require the use of compression techniques.
  • Referring to Figure 3, one example of a process utilizable for controlling operation of a client 104 in conjunction with the updating process of the present invention is depicted.
  • client 104 downloads from server 150 a file containing a number of header files or headers (STEP 304).
  • this downloaded header file in some embodiments, may be used to identify the newer version of the program or file to be implemented onto client 104.
  • the present invention makes use of a technique or algorithm that compares and identifies the differences between the program or file to be updated (e.g., file 122) and a newer version (e.g., file 162) (which, as discussed above, may be identified by, for example, the information included in the previously downloaded header file).
  • the algorithm then downloads only the differences between the two files.
  • the algorithm compares a 'seed' file, which includes the file or files to be updated (i.e., file 122), against a 'target' file, which includes the file or files to be implemented on client 104 (i.e., file 162), to identify any differences.
  • a seed file is prepared and separated into a number of blocks (STEP 308). Although any size and any number of blocks are possible, the seed file is typically separated into a series of non-overlapping fixed-sized blocks of between 500-1000 bytes. As will be discussed below, the present invention contemplates identifying differences between these blocks and similarly sized blocks of the target file, and subsequently revising any blocks with differences such that they match corresponding blocks of the target file.
  • checking data are generated for each of the seed file blocks (STEP 312). These checking data are indicative of the content of an associated block, and may be used to identify the existence of changes in a block of data. For example, identical checking data between two versions of a block may suggest that the content of the block has not changed. On the other hand, different checking data may suggest that the block has undergone some type of revision. In many cases the checking data may include a simple error-detection scheme in which each block is associated with a numerical or other value based on the number of set bits in the block. Thus, if the numerical value is not the same in two blocks, it can be assumed that the content of the blocks also is not the same.
  • the checking data may include checksums associated with portions of the seed and target file blocks.
  • multiple layers of checking data may be utilized. For instance, checking data that are relatively easy and quick to calculate may be generated for use in determining near or potential matches (i.e., weak or rolling checking data). Checking data that provide a stronger indication of a match but are more time consuming to calculate (i.e., strong checking data) may be generated to definitively confirm matches between only those blocks that have matching weak checking data. Blocks having identical sets of strong checking data may then be deemed to be identical.
  • weak checking data may include a 32-bit checksum and strong checking data may include a 128-bit checksum.
  • After generating the checking data for the seed file, client 104 establishes a connection with server 150 (STEP 316). Subsequently, the process executes an updating process, one example of which is depicted below with reference to Figure 5, for identifying and downloading the differences between the seed and target files (STEP 320). Once this updating process has been completed, the updated file is compressed using any suitable compression technique (STEP 324). From there, the file is appended to a corresponding header file previously downloaded, for example, in STEP 304 (STEP 328), thereby resulting in an updated copy of the file or files.
  • Figure 4 depicts one example of a process utilizable for controlling operation of a server computer in conjunction with the updating process of the present invention.
  • server 150 retrieves file 162 from memory (STEP 404) and begins preparing a target file which will be compared against a seed file during the updating process.
  • file 162 may resemble, for example, a typical RPM file (see, e.g., Figure 2), and may include a header and a payload (which may or may not be compressed). Because the payload of file 162 constitutes the target file, it is separated from the header (STEP 408), and subsequently decompressed (STEP 412). Once the payload of file 162 (i.e., the target file) has been decompressed, checking data for the target file may be generated.
  • checking data may be generated for a number of blocks of the target file for use in comparing against blocks of the seed file.
  • this checking data may include checksums associated with portions of the target file blocks.
  • several layers of checking data may be generated, including either or both of weak (e.g., a 32-bit checksum) and strong checking data (e.g., a 128-bit checksum) (STEP 416 and STEP 420).
  • the strong checking data are relatively expensive to generate in the sense that they require a significant amount of processing time to calculate.
  • the weak checking data while easier to generate, nevertheless also require a significant amount of processing resources.
  • the present invention contemplates storing these sets of checking data in a memory such as cache 166, where they may be analyzed without significantly affecting the performance of the server.
  • both the weak and the strong checking data are cached.
  • since updates occur in limited areas of a file it is possible to predict which blocks will match.
  • only those sets of strong checking data expected to match are cached. Without having to generate strong checking data for all of the blocks, additional time savings may be realized.
  • the preparation of the target file may take place at any time.
  • the process occurs during initial attempts to update client versions of program or data files.
  • Other embodiments of the present invention contemplate performing the process before updating procedures are initiated by client systems.
  • FIG. 5 depicts one example of an updating process of the present invention.
  • server 150 may receive both the weak checking data, as well as the strong checking data associated with a block generated by one of clients 104.
  • the checking data associated with the target file blocks are retrieved from memory 166 (STEP 508).
  • this includes both weak checking data as well as strong checking data.
  • only the strong checking data expected to match are generated and stored.
  • only the strong checking data expected to match are retrieved along with the weak checking data.
  • Server 150 stores this checking data in any suitable data structure, such as for example any number of hash tables and the like, where the data may be easily retrieved for comparison.
  • the weak checking data for a block of the seed file received from one of clients 104 are compared against the weak checking data for the blocks of the target file (STEP 516). If the weak checking data for the seed and target file do not match, the process determines that the block of the seed file being analyzed has been revised and must be updated. In these situations, the process downloads the revised block from the target file of server 150 (STEP 528). The block of the seed file is then replaced with this newly downloaded block (STEP 532).
  • instructions for constructing an accurate copy of the target file block may instead be transmitted. The details involved with downloading revised blocks as well as instructions for constructing an accurate copy of the block may be found in Andrew
  • the process determines that a match exists between the weak checking data of the seed and target blocks (STEP 516)
  • the strong checking data in this embodiment is analyzed to confirm, more definitively, whether a match exists.
  • the strong checking data for the block of the seed file are compared against strong checking data for the block of the target file (STEP 524) to confirm whether the blocks are identical.
  • the checking data corresponding to each of the blocks of the seed file may be transmitted and received by server 150 prior to any comparisons. In these embodiments, processing returns to STEP 512 rather than STEP 504.
  • the updating process of the present invention may be implemented in a variety of forms, such as in software or firmware, running on a general purpose computer or a specialized device.
  • the code can be provided in any machine-readable medium, including magnetic or optical disk, or in memory.
  • the present invention is utilizable in conjunction with computer system that operates software which may require periodic updates, as well as any operating system (e.g., Linux,

Abstract

A file updating process where a seed file (122) is to be updated or revised to match a target file (162) utilizes cached checking data to increase efficiency. Initially, target file checking data for one or more blocks of the target file are generated. These target file checking data are then stored to cache memory. In a similar manner, seed file checking data corresponding to one or more blocks of the seed file are generated. Then, during the updating process, the seed file checking data are compared with the target file checking data stored in memory to identify any differences between blocks of the seed file and blocks of the target file. If any differences are identified, the old seed file blocks are replaced with newly downloaded target file blocks. Alternatively, the old seed file blocks may be reconstructed in a manner such that they match the target file blocks.

Description

SOFTWARE AND DATA FILE UPDATING PROCESS
Field of the Invention
[0001] This invention relates to techniques for updating computer files such as those pertaining to software programs and data and, more particularly, to the use of cached checking data to improve the efficiency of file updating processes.
Background of the Invention
[0002] The "updating" or changing of software program files and data files is a normal process in computer science. For instance, updates or revisions to software programs and other files are routinely required to eliminate bugs found during usage or to add newly developed features. Sometimes these revisions may be relatively minor, involving changes in only a small percentage of the data that makes up the file. In other cases, the revisions may be much more extensive and require additional updating steps.
[0003] One way to update these files involves creating a completely new file containing all of the desired changes. These new files may then be distributed to the users to replace the existing files. In addition to physically distributing the files using floppy discs, CDs or DVDs, these relatively large files may be distributed from the software manufacturers to the users via a data communications network such as the Internet.
[0004] One obstacle to the frequent revision of large computer files by a manufacturer is the cost of delivering the updated file to the user. With new revised files, the amount of data can be substantial. For example, large files are typically ten million characters (10 megabytes) or larger. The distribution of such large files over a medium such as the Internet can take an undesirably long time from the point of view of the customer and can consume a large amount of server resources from the point of view of the file provider.
[0005] One solution to the problem of distributing large computer files over networks such as the Internet is the use of differencing programs or comparator algorithms. These applications compare an old file to a new revised file in order to determine how the files differ. Once identified, only the differences between the two files are transmitted.
[0006] One example of such a technique includes the "RSYNC" algorithm ("rsync"), which is utilizable with any conventional operating system including, for example, UNIX-like and Microsoft Windows operating systems. Rsync has proven to be extremely useful in comparing files whose content differs only partially. Generally speaking, rsync compares an original or "seed" file at a client computer with a revised or "target" file at a server and "notices" differences between the two files using checking data (e.g., checksums and the like). Specifically, rsync identifies these differences by generating checking data for blocks of the seed file at the client, which it uses to compare against checking data for blocks of the target file generated at the server. Matches in checking data indicate identical blocks, while differences suggest that changes have been made. Rsync then downloads only those parts of the target file that are actually new, while using any parts of the seed file that are unchanged from the target.
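The block-by-block comparison described in paragraph [0006] can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not rsync itself: blocks are compared only at aligned offsets, SHA-256 stands in for the unspecified "checking data", and the function names `block_digests` and `blocks_to_download` are hypothetical.

```python
import hashlib


def block_digests(data: bytes, block_size: int) -> list[bytes]:
    """Split data into fixed-size blocks and return one digest per block."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_download(seed: bytes, target: bytes, block_size: int = 700) -> list[int]:
    """Return the indices of target blocks whose content is absent from the seed.

    In the arrangement described in the text, the seed digests would be
    generated at the client and the target digests at the server.
    """
    seed_digests = set(block_digests(seed, block_size))
    target_digests = block_digests(target, block_size)
    return [i for i, d in enumerate(target_digests) if d not in seed_digests]


# Ten distinct 100-byte blocks; the target differs from the seed in block 3 only.
seed = b"".join(bytes([65 + i]) * 100 for i in range(10))
target = seed[:300] + b"x" * 100 + seed[400:]
print(blocks_to_download(seed, target, block_size=100))
```

Only the blocks listed by `blocks_to_download` would need to be transferred; every other block can be copied from the seed file.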
[0007] One drawback of the rsync algorithm is that generating the checking data at the server requires a large amount of processing by the server CPU. Thus, the server CPU may become overloaded when any more than just a few clients attempt to run the rsync algorithm. In these cases, the network bandwidth overload sought to be addressed by rsync is replaced with a CPU processing overload resulting in negligible improvements in the situation.
[0008] Another technique that is commonly used in the downloading of data, sometimes in conjunction with comparator algorithms like rsync, includes compression. Basically speaking, compression recognizes and eliminates redundancy in the data (i.e., repetitive or identical patterns of bits) to allow reductions in the amount of data to be stored or transmitted. Compression algorithms operate by generating a "history" associated with a piece of repetitive data. These histories are then referred to each time the repetitions are encountered to create a compressed form of the data. While compression is, in many cases, effective in reducing the amount of data to be transmitted, changes to just a few bytes in the beginning of an updated or revised file can result in a compressed file that is entirely different from the compressed version of a file to be updated (even though the uncompressed versions of the original file and revised file may be quite similar). As a result, this tends to defeat much of the optimization offered by comparator algorithms like rsync, which rely on similarities between the original and revised files.
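The interaction between compression and block comparison noted in paragraph [0008] can be observed with a small experiment. The sketch below is illustrative only: it assumes zlib (DEFLATE) as the compressor, uses synthetic sample text, and compares blocks at aligned offsets; the helper `count_matching_blocks` is hypothetical.

```python
import random
import zlib


def count_matching_blocks(a: bytes, b: bytes, block_size: int) -> int:
    """Count aligned blocks that are byte-for-byte identical in a and b."""
    n = min(len(a), len(b))
    return sum(
        a[i:i + block_size] == b[i:i + block_size]
        for i in range(0, n, block_size)
    )


rng = random.Random(0)  # fixed seed so the sample data is reproducible
words = [b"alpha", b"beta", b"gamma", b"delta", b"epsilon", b"zeta"]
original = b" ".join(rng.choice(words) for _ in range(20000))
revised = b"XXXX" + original[4:]  # change only the first four bytes

raw_total = -(-len(original) // 700)  # ceiling division: number of 700-byte blocks
raw_same = count_matching_blocks(original, revised, 700)
packed_same = count_matching_blocks(
    zlib.compress(original), zlib.compress(revised), 700
)
print(f"uncompressed: {raw_same} of {raw_total} blocks unchanged")
print(f"compressed:   {packed_same} blocks unchanged")
```

In the uncompressed data every block except the first is unchanged. The count for the compressed streams is typically much lower, although the exact figure depends on the compressor and the data, which is why the comparison is better performed on decompressed payloads.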
Summary of the Invention
[0009] The present invention provides a technique for updating software and data files using cached checking data to improve the efficiency of updating processes such as rsync and other similar algorithms. The technique is applicable in situations where checking data are used to identify differences in a seed file located at a client and a target file located at a server. In at least some embodiments, the technique of the present invention involves having the server generate target file checking data for one or more blocks of the target file. These target file checking data are then stored in a cache or other high-speed memory of the server, where they may be easily and rapidly accessed during an updating process. Subsequently, the client generates seed file checking data corresponding to one or more blocks of the seed file. Then, during the updating process, the server compares the seed file checking data with the target file checking data stored in memory to identify any differences between blocks of the seed file and blocks of the target file. If differences between the seed file and the target file are identified, the server transmits information to the client for revising the seed file blocks that are different from the target file blocks in a manner such that the seed file blocks match the target file blocks.
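The caching idea summarized in paragraph [0009] can be sketched as follows. This is a minimal illustration under assumptions, not the patented implementation: the class name `CheckingDataCache`, the use of an in-process dictionary as the "cache", and the use of MD5 digests as checking data are all hypothetical choices.

```python
import hashlib


class CheckingDataCache:
    """Hypothetical server-side cache of per-block checking data, keyed by file name."""

    def __init__(self, block_size: int = 700):
        self.block_size = block_size
        self._cache: dict[str, list[bytes]] = {}
        self.computations = 0  # how many times checking data were actually generated

    def get(self, name: str, target: bytes) -> list[bytes]:
        """Return target checking data, generating them only on the first request."""
        if name not in self._cache:
            self.computations += 1
            self._cache[name] = [
                hashlib.md5(target[i:i + self.block_size]).digest()
                for i in range(0, len(target), self.block_size)
            ]
        return self._cache[name]

    def differing_blocks(self, name: str, target: bytes, seed_data: list[bytes]) -> list[int]:
        """Compare client-supplied seed checking data with the cached target data."""
        known = set(seed_data)
        return [i for i, d in enumerate(self.get(name, target)) if d not in known]
```

With this arrangement, the per-block digests for a given target file are computed once, and each later client request costs only lookups. A real server would also need to invalidate the cached entry whenever the target file changes, which this sketch omits.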
Brief Description of the Drawings
[0010] Figure 1 is a block diagram illustrating an example of a networked computer system utilizable for implementing an updating process of the present invention.
[0011] Figure 2 depicts one example of a package file utilizable for transmitting software updates to a client computer.
[0012] Figure 3 depicts one example of a process utilizable for controlling operation of a client computer in conjunction with the updating process of the present invention.
[0013] Figure 4 depicts one example of a process utilizable for controlling operation of a server computer in conjunction with the updating process of the present invention.
[0014] Figure 5 depicts one example of an updating process of the present invention.
Detailed Description of Embodiments of the Present Invention
[0015] The present invention provides a technique for updating software and data files using cached checking data. The following description provides one example of an implementation of the technique of the present invention.
[0016] Figure 1 is a block diagram illustrating an example of a networked computer system 100 utilizable for implementing an updating process of the present invention. As shown in Figure 1, system 100 includes a number of remote client systems (clients 110), which in this embodiment may include personal computers, laptop computers, personal digital assistants, cellular telephones, and other computing devices. Each of clients 110 includes, among other components, a processing unit (processor 114) and a number of memory storage devices (storage device 118). In typical cases, storage device 118 includes, for example, any type of permanent or semi-permanent storage (e.g., hard disc drives, memory cards, and the like) and may be used to store any number of software programs or data files 122, which from time to time may need to be updated or revised.
[0017] As also shown in Figure 1, clients 110 may be linked to a server system or server 150 via a data communications network 130 to facilitate the transmission of information and data between clients 110 and server 150. Examples of network 130 may include the Internet or other similar data networks, such as private WANs, LANs and the like.
[0018] In the embodiment of Figure 1, server 150 may represent, for example, a software manufacturer or vendor system from which software revisions may be distributed to clients 110. Like clients 110, server 150 also includes a processor 154 and a number of memory storage devices (e.g., hard disc and the like) (storage device 158). Storage device 158, in many cases, stores software programs or data files 162, which represent newer versions of corresponding programs or files implemented on clients 110. In addition, storage device 158 may also include a high-speed memory such as cache memory 166.
[0019] As mentioned above, files and programs 122 stored on clients 110 may be updated in a manner such that they match newer versions (e.g., file 162) stored on server 150. For example, file 162 may represent a newer version of a software program offered by a software vendor or manufacturer to customers or owners of older versions of the program (e.g., program 122). Thus, the newer versions of the program may be available for downloading from a server computer associated with the vendor (e.g., server 150) to the client computers (e.g., clients 110) via the Internet (e.g., network 130). In these and other embodiments, server 150 may perform a number of process steps to prepare for and execute the updating process of the present invention.
[0020] In the context of software and data file updates, the files are typically implemented in conventional packages or packets which include a header and a payload (i.e., a compound file). For instance, Figure 2 depicts one example of such a package which includes header 210 and payload 220. Whereas the payload includes the bulk of the file (e.g., the compressed software executables and the like), the header is used to identify the package and includes the package name, its version, the other packages on which it depends, and the like.
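The header-plus-payload structure described in paragraph [0020] can be illustrated with a toy container. The layout below is an assumption made purely for illustration (a 4-byte big-endian header length, a JSON header, then a zlib-compressed payload); it is not the RPM file format, and `build_package` and `split_package` are hypothetical helpers.

```python
import json
import struct
import zlib


def build_package(header: dict, payload: bytes) -> bytes:
    """Pack a header (name, version, dependencies, ...) and a compressed payload."""
    head = json.dumps(header).encode("utf-8")
    return struct.pack(">I", len(head)) + head + zlib.compress(payload)


def split_package(package: bytes) -> tuple[dict, bytes]:
    """Separate the header from the payload and decompress the payload."""
    (head_len,) = struct.unpack(">I", package[:4])
    header = json.loads(package[4:4 + head_len].decode("utf-8"))
    payload = zlib.decompress(package[4 + head_len:])
    return header, payload


pkg = build_package(
    {"name": "example", "version": "1.1", "requires": ["libexample"]},
    b"executable bytes " * 100,
)
header, payload = split_package(pkg)
print(header["name"], header["version"], len(payload))
```

Keeping the header separate means the small identifying portion of a package can be fetched and inspected without touching the much larger payload.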
[0021] One specific example of a packaging system which may be utilized to compile such a package includes the RPM Package Manager, implementable for use with systems offered by Red Hat, Inc. of Raleigh, NC. Although the examples provided in this description focus on the RPM Package Manager format (see, e.g., the example depicted in Figure 2), it is to be understood that at least some embodiments of the present invention contemplate the use of other package formats, including those that omit header files as well as those that do not require the use of compression techniques.
[0022] Referring to Figure 3, one example of a process utilizable for controlling operation of a client 104 in conjunction with the updating process of the present invention is depicted. Initially, after determining that revisions to file 122 are necessary (e.g., due to the discovery of bugs or other problems), client 104 downloads from server 150 a file containing a number of header files or headers (STEP 304). Thus, this downloaded header file, in some embodiments, may be used to identify the newer version of the program or file to be implemented onto client 104.
[0023] As mentioned above, the present invention makes use of a technique or algorithm that compares and identifies the differences between the program or file to be updated (e.g., file 122) and a newer version (e.g., file 162) (which, as discussed above, may be identified by, for example, the information included in the previously downloaded header file). The algorithm then downloads only the differences between the two files. In some embodiments, the algorithm compares a 'seed' file, which includes the file or files to be updated (i.e., file 122), against a 'target' file, which includes the file or files to be implemented on client 104 (i.e., file 162), to identify any differences. One specific example of an algorithm utilizable for determining the differences between the seed and target files is the RSYNC algorithm described in Andrew Tridgell, "The rsync algorithm" (Australian National University, 1996), which is incorporated herein by reference.
[0024] Accordingly, a seed file is prepared and separated into a number of blocks (STEP 308). Although any size and any number of blocks are possible, the seed file is typically separated into a series of non-overlapping fixed-sized blocks of between 500 and 1000 bytes. As will be discussed below, the present invention contemplates identifying differences between these blocks and similarly sized blocks of the target file, and subsequently revising any blocks with differences such that they match corresponding blocks of the target file.
[0025] To facilitate this comparison process, checking data are generated for each of the seed file blocks (STEP 312). These checking data are indicative of the content of an associated block, and may be used to identify the existence of changes in a block of data. For example, identical checking data between two versions of a block may suggest that the content of the block has not changed. On the other hand, different checking data may suggest that the block has undergone some type of revision. In many cases the checking data may include a simple error-detection scheme in which each block is associated with a numerical or other value based on the number of set bits in the block. Thus, if the numerical value is not the same in two blocks, it can be assumed that the content of the blocks also is not the same. In at least some examples, the checking data may include checksums associated with portions of the seed and target file blocks. Furthermore, multiple layers of checking data may be utilized. For instance, checking data that are relatively easy and quick to calculate may be generated for use in determining near or potential matches (i.e., weak or rolling checking data). Checking data that provide a stronger indication of a match but are more time consuming to calculate (i.e., strong checking data) may be generated to definitively confirm matches between only those blocks that have matching weak checking data. Blocks having identical sets of strong checking data may then be deemed to be identical. As an example, weak checking data may include a 32-bit checksum and strong checking data may include a 128-bit checksum.
[0026] After generating the checking data for the seed file, client 104 establishes a connection with server 150 (STEP 316). Subsequently, the process executes an updating process, one example of which is depicted below with reference to Figure 5, for identifying and downloading the differences between the seed and target files (STEP 320). Once this updating process has been completed, the updated file is compressed using any suitable compression technique (STEP 324). From there, the file is appended to a corresponding header file previously downloaded, for example, in STEP 304 (STEP 328), thereby resulting in an updated copy of the file or files.
[0027] Figure 4 depicts one example of a process utilizable for controlling operation of a server computer in conjunction with the updating process of the present invention. Initially, server 150 retrieves file 162 from memory (STEP 404) and begins preparing a target file which will be compared against a seed file during the updating process. At this point, file 162 may resemble, for example, a typical RPM file (see, e.g., Figure 2), and may include a header and a payload (which may or may not be compressed). Because the payload of file 162 constitutes the target file, it is separated from the header (STEP 408), and subsequently decompressed (STEP 412).
[0028] Once the payload of file 162 (i.e., the target file) has been decompressed, checking data for the target file may be generated. As discussed above, checking data may be generated for a number of blocks of the target file for use in comparing against blocks of the seed file. In particular, this checking data may include checksums associated with portions of the target file blocks. As with the seed file, several layers of checking data may be generated, including either or both of weak (e.g., a 32-bit checksum) and strong checking data (e.g., a 128-bit checksum) (STEP 416 and STEP 420).
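As a non-limiting sketch, the two layers of checking data discussed above may be generated per block as shown below. The disclosure specifies only a 32-bit weak checksum and a 128-bit strong checksum; the use of zlib.adler32 and hashlib.md5 here is an assumption, selected solely because these standard-library functions produce values of those widths. The names CheckingData and checking_data_for are likewise hypothetical.

```python
import hashlib
import zlib
from typing import NamedTuple


class CheckingData(NamedTuple):
    weak: int      # 32-bit value, cheap to compute
    strong: bytes  # 128-bit (16-byte) digest, costlier to compute


def checking_data_for(block: bytes) -> CheckingData:
    """Compute weak and strong checking data for one block.

    Assumption: Adler-32 stands in for the weak checksum and MD5 for the
    strong checksum; the source names neither function.
    """
    weak = zlib.adler32(block) & 0xFFFFFFFF   # mask to an unsigned 32-bit value
    strong = hashlib.md5(block).digest()      # 16 bytes = 128 bits
    return CheckingData(weak, strong)
```

The same function may be applied on the client to seed file blocks and on the server to target file blocks, so that the resulting values are directly comparable.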
[0029] The strong checking data, as discussed above, are relatively expensive to generate in the sense that they require a significant amount of processing time to calculate. The weak checking data, while easier to generate, nevertheless also require a significant amount of processing resources. Thus, the present invention contemplates storing these sets of checking data in a memory such as cache 166, where they may be analyzed without significantly affecting the performance of the server. In at least some embodiments of the present invention, both the weak and the strong checking data are cached. In other cases, since updates occur in limited areas of a file, it is possible to predict which blocks will match. Thus, in certain embodiments, only those sets of strong checking data expected to match are cached. Without having to generate strong checking data for all of the blocks, additional time savings may be realized.
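One way the caching described above might be arranged is sketched below, assuming an in-memory dictionary standing in for cache 166. The class name CheckingDataCache, the file_id key, the computations counter, and the choice of Adler-32 and MD5 as the 32-bit and 128-bit checksums are all assumptions introduced for illustration.

```python
import hashlib
import zlib


class CheckingDataCache:
    """Hypothetical in-memory stand-in for cache 166.

    Checking data for a target file are computed once and then reused
    for every later request that names the same file.
    """

    def __init__(self, block_size: int = 700):
        self.block_size = block_size
        self._entries = {}        # file_id -> list of (weak, strong) pairs
        self.computations = 0     # counts how often checking data were generated

    def get(self, file_id: str, payload: bytes):
        if file_id not in self._entries:
            self.computations += 1
            size = self.block_size
            blocks = [payload[i:i + size] for i in range(0, len(payload), size)]
            self._entries[file_id] = [
                (zlib.adler32(b) & 0xFFFFFFFF, hashlib.md5(b).digest())
                for b in blocks
            ]
        return self._entries[file_id]
```

Under this arrangement, a second client requesting an update to the same target file is served from the stored entry, and the checksum computation is not repeated.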
[0030] The preparation of the target file (described above) may take place at any time. For example, in at least some embodiments, the process occurs during initial attempts to update client versions of program or data files. Other embodiments of the present invention contemplate performing the process before updating procedures are initiated by client systems.
[0031] Figure 5 depicts one example of an updating process of the present invention. In this example, after a connection has been established by one of clients 104 (e.g., STEP 316 in Figure 3), processing starts with server 150 receiving the checking data associated with a seed block (STEP 504). Specifically, server 150 may receive both the weak checking data, as well as the strong checking data associated with a block generated by one of clients 104.
[0032] Subsequently, the checking data associated with the target file blocks are retrieved from memory 166 (STEP 508). In one example, this includes both weak checking data as well as strong checking data. In other examples, only the strong checking data expected to match are generated and stored. Hence, in these examples, only the strong checking data expected to match are retrieved along with the weak checking data. Server 150 then stores this checking data in any suitable data structure, such as for example any number of hash tables and the like, where the data may be easily retrieved for comparison.
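The hash table mentioned above may, for example, be keyed on the weak checking data so that candidate target blocks are found in a single lookup. The following is a minimal sketch under that assumption; the function name build_lookup and the (weak, strong) pair layout are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def build_lookup(target_checking_data):
    """Index target checking data by weak checksum.

    target_checking_data is a sequence of (weak, strong) pairs, one per
    target block. Each weak value maps to a list of (strong, block_index)
    entries, because distinct blocks may share a weak checksum.
    """
    table = defaultdict(list)
    for index, (weak, strong) in enumerate(target_checking_data):
        table[weak].append((strong, index))
    return table
```

A weak value received from a client that is absent from the table indicates that no target block can match, so the strong checking data need not be consulted for that block.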
[0033] After the checking data have been retrieved, the weak checking data for a block of the seed file received from one of clients 104 are compared against the weak checking data for the blocks of the target file (STEP 516). If the weak checking data for the seed and target file do not match, the process determines that the block of the seed file being analyzed has been revised and must be updated. In these situations, the process downloads the revised block from the target file of server 150 (STEP 528). The block of the seed file is then replaced with this newly downloaded block (STEP 532). In alternate embodiments, rather than downloading the entire revised block, instructions for constructing an accurate copy of the target file block may instead be transmitted. The details involved with downloading revised blocks as well as instructions for constructing an accurate copy of the block may be found in Andrew Tridgell, "The rsync algorithm" (1996).
[0034] If, on the other hand, the process determines that a match exists between the weak checking data of the seed and target blocks (STEP 516), the strong checking data (in this embodiment) are analyzed to confirm, more definitively, whether a match exists. Specifically, the strong checking data for the block of the seed file are compared against the strong checking data for the block of the target file (STEP 524) to confirm whether the blocks are identical.
[0035] If the strong checking data for the block of the seed file match with the strong checking data of the block of the target file, the process concludes that the blocks are identical. In these situations, the process keeps the existing copy of the seed file block (STEP 536).
[0036] If the strong checking data do not match, the process concludes that the block of the seed file has been updated, and therefore requires revision. The process then downloads the revised block and uses it to replace the outdated seed file block (STEP 528 and STEP 532). The process continues in this manner until each of the blocks in the seed file has been considered (STEP 540).
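The comparison flow of paragraphs [0033] through [0036] may be sketched as follows, assuming that blocks are compared position by position and that each block's checking data is a (weak, strong) pair. The function names blocks_to_download and apply_update are hypothetical, and the handling of target blocks that have no seed counterpart is an assumption not addressed in the description above.

```python
def blocks_to_download(seed_cd, target_cd):
    """Return indices of target blocks the client must fetch.

    seed_cd and target_cd are sequences of (weak, strong) pairs.
    """
    needed = []
    for index, (t_weak, t_strong) in enumerate(target_cd):
        if index >= len(seed_cd):
            # Assumption: the target has grown; there is nothing to compare.
            needed.append(index)
            continue
        s_weak, s_strong = seed_cd[index]
        if s_weak != t_weak:
            needed.append(index)      # weak mismatch: block was revised
        elif s_strong != t_strong:
            needed.append(index)      # weak matched, strong did not
        # otherwise both match and the existing seed block is kept
    return needed


def apply_update(seed_blocks, target_blocks, needed):
    """Build the updated block list, taking only the needed blocks from the target."""
    fetch = set(needed)
    return [
        target_blocks[i] if i in fetch else seed_blocks[i]
        for i in range(len(target_blocks))
    ]
```

In this sketch the strong checking data are consulted only when the weak checking data already agree, which mirrors the ordering of STEP 516 and STEP 524.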
[0037] In alternate embodiments, instead of receiving the seed block checking data in multiple steps and transmissions, the checking data corresponding to each of the blocks of the seed file (or subsets thereof) may be transmitted and received by server 150 prior to any comparisons. In these embodiments, processing returns to STEP 512 rather than STEP 504.
[0038] The updating process of the present invention may be implemented in a variety of forms, such as in software or firmware, running on a general purpose computer or a specialized device. The code can be provided in any machine-readable medium, including magnetic or optical disk, or in memory. Furthermore, the present invention is utilizable in conjunction with any computer system that operates software which may require periodic updates, as well as with any operating system (e.g., Linux, Unix, MS Windows, MacOS, etc.).
[0039] While there have been shown and described examples of the present invention, it will be readily apparent to those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined by the following claims. For example, any number of levels of checking data may be used, including the omission of either a strong or weak level. Specifically, embodiments of the present invention contemplate situations where only a single form of checking data is utilized, such as, for example, a single 32-bit checksum. Accordingly, the invention is limited only by the following claims and equivalents thereto.
What is claimed is:

Claims

1. A method for updating a seed file to match a target file, said method comprising: generating target file checking data for one or more blocks of said target file; storing at least a portion of said target file checking data in a cache; receiving seed file checking data corresponding to one or more blocks of said seed file; comparing said seed file checking data with said target file checking data to identify differences in blocks of said seed file and blocks of said target file; and transmitting information for revising seed file blocks which are different from target file blocks such that said seed file blocks match said target file blocks.
2. The method of claim 1, wherein said target file checking data and said seed file checking data each comprise weak level checking data and strong level checking data, and wherein said comparing comprises comparing said weak level checking data and next comparing strong level checking data only if a match is identified in said weak level checking data.
3. The method of claim 1, wherein said target file checking data and said seed file checking data each comprise a 32-bit checksum and a 128-bit checksum.
4. The method of claim 1, wherein said target file checking data and said seed file checking data each comprise weak level checking data and strong level checking data, and wherein said storing comprises storing said weak level checking data associated with said target file and storing only said strong level checking data associated with said target file expected to match strong level checking data associated with said seed file.
5. The method of claim 1, wherein said target file checking data and said seed file checking data each comprise a checksum.
6. The method of claim 1, wherein said target file checking data stored in a cache are used with multiple updating requests received from a plurality of clients.
7. The method of claim 1, further comprising decompressing said target file prior to said generating.
8. The method of claim 1, wherein said seed file and said target file are decompressed prior to said generating, wherein said seed file blocks are revised in accordance with said transmitted information to match said target file blocks, and wherein said revised seed file blocks are recompressed after revising.
9. The method of claim 8, wherein said seed file comprises a compressed payload, previously separated from a compound file, and wherein said revised seed file is appended to a header file after said recompressing to constitute a revised compound file.
10. The method of claim 9, wherein said compound file and said revised compound file comport with an RPM Package Manager format.
11. A method for updating a seed file to match a target file, said method comprising: generating seed file checking data for one or more blocks of said seed file; transmitting said seed file checking data for comparison against cached target file checking data corresponding to one or more blocks of said target file to identify differences in blocks of said seed file and blocks of said target file; and receiving information for revising seed file blocks which are different from target file blocks such that said seed file blocks match said target file blocks.
12. The method of claim 11, further comprising: decompressing said seed file prior to said generating; revising said seed file blocks in accordance with said information to match said target file blocks; and recompressing said revised seed file blocks.
13. The method of claim 12, wherein said seed file comprises a compressed payload, previously separated from a compound file, and wherein said revised seed file blocks are appended to a header file after said recompressing to constitute a revised compound file.
14. The method of claim 13, wherein said compound file and said revised compound file comport with an RPM Package Manager format.
15. A computer program product, residing on a computer-readable medium, for use in updating a seed file to match a target file, said computer program product comprising instructions for causing a computer to: generate target file checking data for one or more blocks of said target file; store at least a portion of said target file checking data in a cache; receive seed file checking data corresponding to one or more blocks of said seed file; compare said seed file checking data with said target file checking data to identify differences in blocks of said seed file and blocks of said target file; and transmit information for revising seed file blocks which are different from target file blocks such that said seed file blocks match said target file blocks.
16. The computer program product of claim 15, wherein said target file checking data and said seed file checking data each comprise weak level checking data and strong level checking data, and wherein said computer program product further comprises instructions for causing said computer to compare said weak level checking data and to compare said strong level checking data only if a match is identified in said weak level checking data.
17. The computer program product of claim 15, wherein said target file checking data and said seed file checking data each comprise weak level checking data and strong level checking data, and wherein said computer program product further comprises instructions for causing said computer to store said weak level checking data associated with said target file and to store only said strong level checking data associated with said target file expected to match strong level checking data associated with said seed file.
18. The computer program product of claim 15, wherein said target file checking data and said seed file checking data each comprise a checksum.
19. A computer program product, residing on a computer-readable medium, for use in updating a seed file to match a target file, said computer program product comprising instructions for causing a computer to: generate seed file checking data for one or more blocks of said seed file; transmit said seed file checking data for comparison against cached target file checking data corresponding to one or more blocks of said target file to identify differences in blocks of said seed file and blocks of said target file; and receive information for revising seed file blocks which are different from target file blocks such that said seed file blocks match said target file blocks.
20. A system for updating a seed file to match a target file, said system comprising: means for generating target file checking data for one or more blocks of said target file; means for storing at least a portion of said target file checking data in a cache; means for receiving seed file checking data corresponding to one or more blocks of said seed file; means for comparing said seed file checking data with said target file checking data to identify differences in blocks of said seed file and blocks of said target file; and means for transmitting information for revising seed file blocks which are different from target file blocks such that said seed file blocks match said target file blocks.
PCT/US2004/031079 2003-09-26 2004-09-22 Software and data file updating process WO2005031515A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP04784791A EP1678572A4 (en) 2003-09-26 2004-09-22 Software and data file updating process

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/672,921 US7509635B2 (en) 2003-09-26 2003-09-26 Software and data file updating process
US10/672,921 2003-09-26

Publications (2)

Publication Number Publication Date
WO2005031515A2 true WO2005031515A2 (en) 2005-04-07
WO2005031515A3 WO2005031515A3 (en) 2007-08-16

Family

ID=34376505

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2004/031079 WO2005031515A2 (en) 2003-09-26 2004-09-22 Software and data file updating process

Country Status (4)

Country Link
US (1) US7509635B2 (en)
EP (1) EP1678572A4 (en)
TW (1) TWI285824B (en)
WO (1) WO2005031515A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101832125A (en) * 2010-04-16 2010-09-15 电子科技大学 Remotely updating device of EDIB (Electronic Data Interchange Bus) based down-hole program

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003244446A (en) * 2002-02-21 2003-08-29 Canon Inc Image processor and image processing method
US20060036579A1 (en) * 2004-08-10 2006-02-16 Byrd Stephen A Apparatus, system, and method for associating resources using a time based algorithm
US7555532B2 (en) * 2004-09-23 2009-06-30 Orbital Data Corporation Advanced content and data distribution techniques
US20060218200A1 (en) * 2005-03-24 2006-09-28 International Business Machines Corporation Application of log records by storage servers
US7650389B2 (en) * 2006-02-01 2010-01-19 Subhashis Mohanty Wireless system and method for managing logical documents
CN101553167B (en) * 2006-05-12 2013-04-24 因维沃公司 Method of interfacing a detachable display system to a base unit for use in MRI
US8707297B2 (en) * 2006-07-26 2014-04-22 Dell Products L.P. Apparatus and methods for updating firmware
US7711760B2 (en) * 2006-11-30 2010-05-04 Red Hat, Inc. File update availability checking in a hierarchal file store
US8578332B2 (en) * 2007-04-30 2013-11-05 Mark Murray Universal microcode image
TWI416327B (en) * 2009-03-25 2013-11-21 Wistron Corp Data backup method
CN102479093A (en) * 2010-11-25 2012-05-30 英业达股份有限公司 Software installing system for providing verification and updating original file and register table and method thereof
CN102955816B (en) 2011-08-30 2016-04-20 国际商业机器公司 String matching is utilized to carry out the method and system of data syn-chronization
TWI579707B (en) * 2011-11-04 2017-04-21 優必達公司 System and method of leveraging gpu resources to enhance performance of an interact-able content browsing service
CN105095226B (en) * 2014-04-25 2019-08-02 广州市动景计算机科技有限公司 Web page resources loading method and device
CN107077396B (en) * 2015-01-26 2020-12-04 日立汽车系统株式会社 In-vehicle control device, program writing device, program generating device, and method
US10585654B2 (en) * 2015-12-04 2020-03-10 Vmware, Inc. Deployment of processing components of computing infrastructure using annotated command objects
CN109716289B (en) 2016-09-23 2021-01-12 华为技术有限公司 Binary image differential inpainting
CN111866099B (en) * 2020-07-07 2022-09-20 锐捷网络股份有限公司 Method, device, system, equipment and storage medium for downloading mirror image file
CN113553090B (en) * 2021-07-26 2023-07-25 网易(杭州)网络有限公司 Update control method and device for client application program

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5446888A (en) * 1994-01-14 1995-08-29 Pyne; Charles F. Remote file transfer method and apparatus
US5978805A (en) * 1996-05-15 1999-11-02 Microcom Systems, Inc. Method and apparatus for synchronizing files
US6216175B1 (en) * 1998-06-08 2001-04-10 Microsoft Corporation Method for upgrading copies of an original file with same update data after normalizing differences between copies created during respective original installations
US6952823B2 (en) * 1998-09-01 2005-10-04 Pkware, Inc. Software patch generator using compression techniques
US20030182414A1 (en) * 2003-05-13 2003-09-25 O'neill Patrick J. System and method for updating and distributing information
US7096311B2 (en) * 2002-09-30 2006-08-22 Innopath Software, Inc. Updating electronic files using byte-level file differencing and updating algorithms
US7644406B2 (en) * 2003-01-21 2010-01-05 Hewlett-Packard Development Company, L.P. Update system capable of updating software across multiple FLASH chips

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of EP1678572A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101832125A (en) * 2010-04-16 2010-09-15 电子科技大学 Remotely updating device of EDIB (Electronic Data Interchange Bus) based down-hole program
CN101832125B (en) * 2010-04-16 2012-11-07 电子科技大学 Remotely updating device of EDIB (Electronic Data Interchange Bus) based down-hole program

Also Published As

Publication number Publication date
EP1678572A2 (en) 2006-07-12
EP1678572A4 (en) 2008-11-26
WO2005031515A3 (en) 2007-08-16
US20050071371A1 (en) 2005-03-31
TW200525391A (en) 2005-08-01
TWI285824B (en) 2007-08-21
US7509635B2 (en) 2009-03-24

Similar Documents

Publication Publication Date Title
US7509635B2 (en) Software and data file updating process
US6493871B1 (en) Method and system for downloading updates for software installation
US7849462B2 (en) Image server
US10853054B2 (en) Updating a file using sync directories
US7665081B1 (en) System and method for difference-based software updating
US20060112152A1 (en) Smart patching by targeting particular prior versions of a file
US8073926B2 (en) Virtual machine image server
AU2004279202B2 (en) System and method for updating installation components in a networked environment
AU2004279173B2 (en) System and method for updating files utilizing delta compression patching
US7676448B2 (en) Controlling installation update behaviors on a client computer
US7051315B2 (en) Network streaming of multi-application program code
AU2004279170C1 (en) System and method for managing and communicating software updates
AU2002300771B2 (en) Software Patch Generator
US7856439B2 (en) Method and system for using semantic information to improve virtual machine image management
US8219592B2 (en) Method and system for using overlay manifests to encode differences between virtual machine images
US20070094348A1 (en) BITS/RDC integration and BITS enhancements
US7856440B2 (en) Method and system for separating content identifiers from content reconstitution information in virtual machine images
US7996414B2 (en) Method and system for separating file system metadata from other metadata in virtual machine image format
US20050203968A1 (en) Update distribution system architecture and method for distributing software
US20060258344A1 (en) Mobile handset update package generator that employs nodes technique
US20090313322A1 (en) Application Streaming Over HTTP
JP2005044360A (en) System and method for intra-package delta compression (intra-packetdeltacompression) of data
EP1577766A2 (en) Side-by-side drivers
US7143405B2 (en) Methods and arrangements for managing devices
US20020091720A1 (en) Methods and arrangements for providing improved software version control in managed devices

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2004784791

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2004784791

Country of ref document: EP