JP2007527572A - Emulated storage system that supports instant volume recovery - Google Patents


Info

Publication number
JP2007527572A
Authority
JP
Japan
Prior art keywords
data
file
backup
storage system
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2006534090A
Other languages
Japanese (ja)
Inventor
Miklos Sandorfi
Original Assignee
Sepaton, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US50732903P
Priority to US10/911,987 (US7146476B2)
Application filed by Sepaton, Inc.
Priority to PCT/US2004/032122 (WO2005033945A1)
Publication of JP2007527572A
Application status is Granted

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G06F11/1451 Management of the data involved in backup or backup restore by selection of backup contents
    • G06F11/1456 Hardware arrangements for backup
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/815 Virtual
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from or digital output to record carriers, e.g. RAID, emulated record carriers, networked record carriers
    • G06F3/0601 Dedicated interfaces to storage systems
    • G06F3/0628 Dedicated interfaces to storage systems making use of a particular technique
    • G06F3/0662 Virtualisation aspects
    • G06F3/0664 Virtualisation aspects at device level, e.g. emulation of a storage device or system
    • G06F3/0668 Dedicated interfaces to storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

An apparatus and method for mounting, on a host computer, a data volume corresponding to a backup data set in a backup storage system. The method comprises mounting on the host computer a data volume containing one or more data files corresponding to the most recently backed-up version of one or more data files stored in the backup storage system and, while preserving that most recently backed-up version, saving to the backup storage system data corresponding to a second version of the one or more data files that is more recent than the most recently backed-up version.
[Selection] Figure 2

Description

Field of Invention

  The present invention relates to data storage and, more particularly, to an apparatus and method that emulate a tape storage system, use an existing full backup and subsequent incremental backups to provide the equivalent of a full backup, and enable end users to recover data from such backups.

Explanation of related technology

  Many computer systems include one or more host computers and one or more storage systems that store data used by the host computers. The host computers and storage systems are typically networked together using a Fiber Channel network, an Ethernet network, or another form of communication network. Fiber Channel combines the speed of channel-based transfer methods with the flexibility of network-based transfer methods, allowing multiple initiators to communicate with multiple targets over the network; an initiator or a target can be any device connected to the network. Because it is implemented over fast transmission media such as optical fiber cable, Fiber Channel is commonly chosen as the storage network for transferring large volumes of data.

  FIG. 1 illustrates an example of a typical networked computing environment that includes various host computers and backup storage systems. One or more application servers 102 are connected to a plurality of user computers 104 through a local area network (LAN) 103. The application servers 102 and user computers 104 may all be considered "host computers." An application server 102 is connected to one or more first storage devices 106 through a SAN (storage area network) 108. The first storage device 106 can be, for example, a disk array available from EMC Corporation, IBM Corporation, or others. Alternatively, a bus (not shown) or another network link may interconnect the application server and the first storage system 106. The bus and/or Fiber Channel network connection can operate using a protocol, such as the SCSI (Small Computer System Interface) protocol, that dictates the format of the packets transmitted between the application server 102 and the storage system 106.

  The networked computing environment illustrated in FIG. 1 is typical of a large system such as might be used by a large financial institution or large enterprise. Not every networked computing environment includes all of the elements illustrated in FIG. 1. For example, a small networked computing environment may simply include a host computer coupled to a storage system, either directly or through a LAN. Also, although the user computers 104, application servers 102, and media server are shown separately in FIG. 1, these functions may be combined in one or more computers.

  In addition to the first storage device 106, most networked computing environments include one or more secondary, or backup, storage systems 110. The backup storage system 110 can be any reliable, high-capacity secondary storage system, but is typically a tape library. A secondary storage system is generally slower than the first storage device, but includes some form of removable media (e.g., tape, magnetic disk, or optical disk) that can be removed and stored off-site.

  In the illustrated example, the application server 102 may communicate directly with the backup storage system 110 via, for example, an Ethernet or other communication link 112. However, such connections are relatively slow and may consume resources such as processor time or network bandwidth. Therefore, a system such as the one shown may include one or more media servers 114 that provide a Fiber Channel communication link between the SAN 108 and the backup storage system 110.

  The media server 114 can run software, including a backup/recovery application, that controls the transfer of data between the host computers (such as the user computers 104, the media server 114, and/or the application servers 102), the first storage device 106, and the backup storage system 110. Examples of backup/recovery applications include products from Veritas, Legato, and others. For data protection, data from the various host computers and/or first storage devices in a networked computing environment can be periodically backed up to the backup storage system 110 using known backup/recovery applications.

  Of course, as noted above, many networked computing environments include fewer components than the exemplary networked computing environment illustrated in FIG. 1. Thus, it should be appreciated that the media server 114 may be combined with the application server 102 in a single host computer, and that the backup/recovery application may be executed on any host computer coupled, directly or indirectly through a network, to the backup storage system 110.

  An example of a typical backup storage system is a tape library that includes a number of tape cartridges, one or more tape drives, and a robotic mechanism that controls the loading and unloading of cartridges into and out of the tape drives. The backup/recovery application directs the robotic mechanism to locate a particular tape cartridge, e.g., tape number 0001, and to load it into a tape drive so that data can be recorded on that tape. The backup/recovery application also controls the format in which data is recorded on the tape. Typically, the backup/recovery application uses SCSI commands, or other standardized commands, to instruct the robotic mechanism and to control the tape drive so as to record data on tape and to recover previously recorded data from tape.
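  The locate, load, and write-sequentially workflow described above can be illustrated with a toy Python model. The class and method names (`TapeLibrary`, `robot_load`, `write`, `robot_unload`) are hypothetical stand-ins for the medium-changer and tape-drive commands a real backup/recovery application would issue; the sketch only shows that each write lands after the previous one on whichever cartridge the robot has loaded.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TapeLibrary:
    """Toy tape library: cartridges in slots, tape drives, and a robot that moves them.

    All names are hypothetical; this models behavior, not a real SCSI interface.
    """

    cartridges: dict[str, list[bytes]] = field(default_factory=dict)  # label -> blocks on tape
    drives: dict[int, str | None] = field(default_factory=dict)  # drive number -> loaded label

    def robot_load(self, label: str, drive: int) -> None:
        # The robot locates the cartridge by its label and moves it into an empty drive.
        if label not in self.cartridges:
            raise KeyError(f"no cartridge labeled {label}")
        if self.drives.get(drive) is not None:
            raise RuntimeError(f"drive {drive} already holds a cartridge")
        if label in self.drives.values():
            raise RuntimeError(f"cartridge {label} is already loaded")
        self.drives[drive] = label

    def write(self, drive: int, block: bytes) -> None:
        # Tape is sequential: each block is appended after the previous one.
        label = self.drives.get(drive)
        if label is None:
            raise RuntimeError(f"drive {drive} is empty")
        self.cartridges[label].append(block)

    def robot_unload(self, drive: int) -> None:
        # Return the cartridge to its slot, leaving the drive empty.
        self.drives[drive] = None


library = TapeLibrary(cartridges={"0001": [], "0002": []}, drives={0: None, 1: None})
library.robot_load("0001", drive=0)
library.write(0, b"backup header")
library.write(0, b"file data")
library.robot_unload(0)
print(library.cartridges["0001"])
```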

  Conventional tape library backup systems suffer from a number of problems, including problems of speed, reliability, and fixed capacity. Many large companies need to back up terabytes of data every week. However, even expensive high-end tape drives typically read and write data at only 30-40 megabytes per second (MB/s), which translates to roughly 110-145 gigabytes per hour (GB/hr). Backing up 1 or 2 terabytes of data to a tape backup system can therefore take roughly 7 to 19 hours of continuous data transfer, even at the drive's full rated speed.
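  The unit conversions behind tape-throughput figures like these are easy to check. The sketch below assumes decimal units (1 GB = 1,000 MB, 1 TB = 1,000,000 MB), as storage vendors usually quote them, and a transfer that runs continuously at the drive's full rated speed.

```python
SECONDS_PER_HOUR = 3600
MB_PER_GB = 1_000  # decimal units assumed
MB_PER_TB = 1_000_000


def mb_per_s_to_gb_per_hr(mb_per_s: float) -> float:
    """Convert a sustained rate in MB/s to GB/hr."""
    return mb_per_s * SECONDS_PER_HOUR / MB_PER_GB


def hours_to_transfer(terabytes: float, mb_per_s: float) -> float:
    """Hours of continuous transfer needed to move `terabytes` at `mb_per_s`."""
    return terabytes * MB_PER_TB / mb_per_s / SECONDS_PER_HOUR


for rate in (30, 40):
    print(f"{rate} MB/s = {mb_per_s_to_gb_per_hr(rate):.0f} GB/hr; "
          f"1 TB takes {hours_to_transfer(1, rate):.1f} h, "
          f"2 TB takes {hours_to_transfer(2, rate):.1f} h")
```

  The same function gives 540 GB/hr for a 150 MB/s disk rate.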

  In addition, most tape manufacturers do not guarantee that data can be stored on or restored from a tape if the tape has been dropped (which can happen relatively often in a typical tape library, because a person or the robotic mechanism may drop a tape while carrying it or during a loading operation) or has been exposed to non-ideal environmental conditions such as extreme temperature or humidity. Considerable care is therefore required to keep stored tapes in a controlled environment. Moreover, the complex machinery of a tape library (including the robotic mechanism) is expensive to maintain, and each tape cartridge is relatively expensive and has a limited lifetime.

Summary of the Invention

  Embodiments of the present invention alleviate or overcome some or all of the problems of conventional tape library systems and provide a more reliable backup storage system than conventional tape library systems.

  In general, embodiments of the present invention provide a random-access-based storage system that emulates a conventional tape backup storage system, so that a backup/recovery application sees the same devices and media as it would with a physical tape library. The storage system of the present invention uses software and hardware to emulate physical tape media and replaces them with one or more random-access disk arrays, translating linear, sequential tape-format data into data suitable for storage on disk. In addition, applications implemented in hardware and/or software are provided to recover data stored in the backup storage system.

  According to various embodiments of the present invention, a mechanism is provided for converting sequential, tape-format data into a format compatible with random-access I/O. In one embodiment, a mechanism is provided for mounting the converted representation of the tape-format data on a host computer as a Network File System (NFS) or Common Internet File System (CIFS) volume.
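  One way to make sequential tape-format data usable for random access is to lay the records out in a disk image and record where each tape file (the data between filemarks) begins. The sketch below assumes a hypothetical in-memory stream in which `FILEMARK` marks a tape filemark; it keeps only file boundaries, whereas a real converter would also need to preserve record sizes and whatever backup-application format lies inside each file.

```python
import io
from typing import List, Optional, Sequence, Tuple

FILEMARK = None  # hypothetical: how a filemark appears in the incoming record stream


def flatten_tape_stream(records: Sequence[Optional[bytes]]) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Lay sequential tape records out contiguously and index each tape file.

    Returns (image, index), where index[n] = (offset, length) of tape file n.
    """
    image = io.BytesIO()
    index = []
    start = 0
    for record in records:
        if record is FILEMARK:
            end = image.tell()
            index.append((start, end - start))
            start = end
        else:
            image.write(record)
    if image.tell() > start:  # data after the last filemark forms a final file
        index.append((start, image.tell() - start))
    return image.getvalue(), index


def read_tape_file(image: bytes, index: List[Tuple[int, int]], n: int) -> bytes:
    """Random access: jump straight to tape file n instead of replaying the tape."""
    offset, length = index[n]
    return image[offset:offset + length]


stream = [b"hdr", FILEMARK, b"file-A-", b"part2", FILEMARK, b"file-B", FILEMARK]
image, index = flatten_tape_stream(stream)
print(index, read_tape_file(image, index, 1))
```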

  In accordance with another embodiment of the present invention, a mechanism is provided for keeping the original data unchanged by redirecting writes to the mounted file system to separate, safe storage. In one embodiment, a mechanism is provided for tracking changes to the original data in real time to permit random-access I/O. In another embodiment, a mechanism is provided for converting newly written data back into sequential, tape-format data compatible with sequential (tape) I/O.
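  Leaving the original backed-up data untouched while still accepting writes to the mounted volume can be sketched as a block-level redirect-on-write overlay. This is an illustrative assumption rather than the patent's stated implementation: `OverlayVolume` and its methods are hypothetical, the backed-up blocks are held read-only, and every write lands in a separate change map that doubles as the real-time record of what has changed.

```python
from __future__ import annotations

from typing import Sequence


class OverlayVolume:
    """Writable, mountable view of a backed-up image that never alters the original.

    Hypothetical sketch: reads prefer the change map; writes only ever touch it.
    """

    def __init__(self, backed_up_blocks: Sequence[bytes]) -> None:
        self._original = tuple(backed_up_blocks)  # most recently backed-up version, read-only
        self._changes: dict[int, bytes] = {}  # block number -> newer contents

    def read(self, block_no: int) -> bytes:
        return self._changes.get(block_no, self._original[block_no])

    def write(self, block_no: int, data: bytes) -> None:
        if not 0 <= block_no < len(self._original):
            raise IndexError(f"block {block_no} is outside the volume")
        self._changes[block_no] = data

    def original(self, block_no: int) -> bytes:
        return self._original[block_no]

    def changed_blocks(self) -> list[int]:
        """Block numbers written since mounting, i.e. the real-time change record."""
        return sorted(self._changes)


volume = OverlayVolume([b"AAAA", b"BBBB", b"CCCC"])
volume.write(1, b"bbbb")
print(volume.read(1), volume.original(1), volume.changed_blocks())
```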

  In one embodiment, a method includes mounting, on a host computer, a data volume that includes one or more data files corresponding to the most recently backed-up version of one or more data files stored in a backup storage system and, while preserving that most recently backed-up version, storing in the backup storage system data corresponding to a second version of the one or more data files that is more recent than the most recently backed-up version. The method may also include linking the most recently backed-up version of the one or more data files with the second version of the one or more data files. In one example, the method may include creating a data structure that treats the most recently backed-up version and the second version of the one or more data files as the same file(s). In another example, the second version of the one or more data files may be a modified version of the most recently backed-up version of the one or more data files.

  In another embodiment, a backup storage system includes a backup storage medium for storing a backup data set and a controller that includes one or more processors configured to execute a set of instructions embodying the above method.

  According to another embodiment, a computer-readable medium having a data structure stored thereon is provided. The data structure includes a first identifier that uniquely identifies a system file corresponding to a backup data set containing one or more data files, and one or more second identifiers that identify the respective storage locations on the storage medium at which the most recent version of each of the one or more data files in the backup data set is stored.
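  A minimal Python rendering of a data structure along these lines might look like the following. The field names, the `(device, offset)` form of a storage location, and the `history` list that keeps each newer version linked to the locations of older ones are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Location = Tuple[str, int]  # hypothetical: (device id, byte offset) on the backup medium


@dataclass
class BackupSetMap:
    system_file_id: str  # first identifier: uniquely names the system file for one backup data set
    latest: Dict[str, Location] = field(default_factory=dict)  # second identifiers, per data file
    history: Dict[str, List[Location]] = field(default_factory=dict)  # older versions, still linked

    def record_version(self, file_name: str, location: Location) -> None:
        """Point file_name at its newest copy while keeping the older copy reachable."""
        previous = self.latest.get(file_name)
        if previous is not None:
            self.history.setdefault(file_name, []).append(previous)
        self.latest[file_name] = location

    def locate(self, file_name: str) -> Location:
        return self.latest[file_name]


backup_set = BackupSetMap(system_file_id="backup-set-0001")
backup_set.record_version("report.doc", ("disk0", 4096))  # version from the backup
backup_set.record_version("report.doc", ("disk1", 8192))  # newer, second version
print(backup_set.locate("report.doc"), backup_set.history["report.doc"])
```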

  The accompanying drawings are not drawn to scale. In the drawings, each identical or nearly identical component illustrated in the various figures is represented by a like numeral. For clarity, not every component is labeled in every drawing.

Detailed explanation

  Various embodiments are described in more detail below with reference to the accompanying drawings. The present invention is not limited in its application to the details of construction and the arrangement of components set forth in the following description or illustrated in the drawings; it is capable of other embodiments and of being practiced or carried out in various ways. Also, the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting. The use of "including," "having," "comprising," "consisting of," and variations thereof is meant to encompass the items listed thereafter and equivalents thereof, as well as additional items.

  As used herein, the term "host computer" means any computer having a processor, such as a personal computer, workstation, mainframe, networked client, or server, that can communicate with one or more storage systems and with other host computers. Host computers include not only user computers (which may be user workstations, PCs, mainframes, etc.) but also media servers and application servers (as described above with reference to FIG. 1). Also, as used herein, the term "networked computing environment" includes any computing environment in which a plurality of host computers are connected to one or more shared storage systems such that each storage system can communicate with each host computer. Fiber Channel is one example of a communication network that may be used with embodiments of the present invention. However, the network is not limited to Fiber Channel: it should be understood that the various network components may communicate with one another over any other network, such as Token Ring or Ethernet, instead of or in addition to Fiber Channel, or over other combinations of network connections. Embodiments of the invention can also be used in bus topologies such as SCSI or parallel SCSI.

  According to various embodiments of the present invention, a virtual removable-media library backup storage system is provided that can use one or more disk arrays to emulate a removable-media-based storage system. With embodiments of the present invention, data can be backed up to a disk array using the same backup/recovery application that is used to back up data to removable media (tape, magnetic disk, optical disk, etc.), without modifying or adjusting existing backup procedures or purchasing a new backup/recovery application. In the embodiments described herein, the emulated removable media is tape, and the backup storage system of the present invention emulates a tape library system that includes tapes and the robotic mechanism used to handle tapes in a conventional tape library system.

  A storage system according to an embodiment of the present invention includes hardware and software that interface a host computer (which runs a backup/recovery application) with a backup storage medium. The storage system can be designed to emulate tapes or other forms of removable storage media, so that the backup/recovery application sees the same devices and media as with a physical tape library, and to convert linear, sequential tape-format data into data suitable for storage on random-access disks. In this way, the storage system of the present invention requires no new backup/recovery application software or policies, and can provide improved functionality (such as allowing users to search for their own backed-up user files, as described below).

  FIG. 2 depicts a block diagram of one embodiment of a networked computing environment that includes a backup storage system 170 according to an embodiment of the present invention. As shown, the host computer 120 is connected to the storage system 170 through a network connection 121. The network connection 121 may be, for example, a Fiber Channel connection that allows high-speed data transfer between the host computer 120 and the storage system 170. It should be appreciated that the host computer 120 can be, or can include, one or more application servers 102 (FIG. 1) and/or media servers 114 (FIG. 1), and can be any computer that is present in a networked computing environment or that enables backup of data from the first storage device 106 (FIG. 1). One or more user computers 136 may also be connected to the storage system 170 through another network connection 138, such as an Ethernet connection. As described below, the storage system may allow users of the user computers 136 to view and selectively recover user files backed up to the storage system.

  The storage system includes a backup storage medium 126, which may be, for example, one or more disk arrays, as described in more detail below. The backup storage medium 126 provides the actual storage space for data backed up from the host computer 120. However, the storage system 170 can also include additional hardware and software that emulate a removable-media storage system such as a tape library, so that, to the backup/recovery application running on the host computer 120, data appears to be backed up to conventional removable storage media. Thus, as illustrated in FIG. 2, the storage system 170 includes an "emulated medium" 134, meaning a virtual or emulated removable storage medium such as tape. This emulated medium 134 is presented to the host computer by the storage system software and/or hardware and appears to the host computer as a physical storage medium. Interfacing between the emulated medium 134 and the actual backup storage medium 126 are a storage system controller (not shown) and a switching network 132, which receive data from the host computer 120 and store it on the backup storage medium 126, as described in detail below. In this manner, the storage system emulates a conventional tape storage system to the host computer 120.

  According to one embodiment, the storage system may include a "radical metadata cache" 242 that stores metadata relating to user data backed up from the host computer 120 onto the storage system 170. As used herein, the term "metadata" means information about user data, that is, data describing the attributes of the actual user data. The radical metadata cache 242 is a searchable collection of data that enables users and/or software applications to randomly locate backed-up user files, compare user files with one another, and otherwise access and manipulate backed-up user files. Two examples of software applications that may use the data stored in the radical metadata cache 242 are an end-user recovery application 300 and a synthetic full backup application 240, both described in more detail below.
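  Because such a cache holds only metadata, a search over it never has to read the backed-up user data itself. The sketch below assumes a hypothetical record layout (`FileMetadata` with path, owner, size, modification time, and the backup data set holding the copy) and a plain list standing in for the cache.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FileMetadata:
    path: str
    owner: str
    size: int
    modified: datetime
    backup_set: str  # hypothetical: which backup data set holds this copy


def find_files(cache: Iterable[FileMetadata], owner: Optional[str] = None,
               name_contains: Optional[str] = None) -> List[FileMetadata]:
    """Return matching entries, newest first, consulting metadata only."""
    hits = [
        m for m in cache
        if (owner is None or m.owner == owner)
        and (name_contains is None or name_contains in m.path)
    ]
    return sorted(hits, key=lambda m: m.modified, reverse=True)


cache = [
    FileMetadata("/home/ann/plan.txt", "ann", 120, datetime(2004, 9, 1), "full-0901"),
    FileMetadata("/home/ann/plan.txt", "ann", 130, datetime(2004, 9, 3), "incr-0903"),
    FileMetadata("/home/bob/notes.txt", "bob", 80, datetime(2004, 9, 2), "full-0901"),
]
for hit in find_files(cache, owner="ann", name_contains="plan"):
    print(hit.backup_set, hit.modified.date(), hit.size)
```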

  In summary, the synthetic full backup application 240 can create a synthetic full backup data set from one or more existing full backup data sets and one or more incremental backup data sets. Synthetic full backups eliminate the need to perform periodic (e.g., weekly) full backups, saving considerable time and network resources. The synthetic full backup application 240 is described in detail later. The end-user recovery application 300 allows an end user (e.g., an operator of a user computer 136) to browse, locate, view, and/or recover previously backed-up user files from the storage system 170; it is also described in detail later.
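  Conceptually, a synthetic full backup applies each incremental backup, oldest first, on top of the last full backup. The sketch below works only on catalogs (dicts mapping a file path to the location of that file's data), which is one plausible way to avoid re-copying unchanged data; representing a deleted file as `None` in an incremental is an assumption of this sketch.

```python
from typing import Dict, Iterable, Optional

Catalog = Dict[str, str]  # file path -> location of that file's data
Incremental = Dict[str, Optional[str]]  # None marks a deletion (assumption of this sketch)


def synthesize_full(full: Catalog, incrementals: Iterable[Incremental]) -> Catalog:
    """Apply incrementals, oldest first, on top of a full backup catalog."""
    result = dict(full)  # leave the original full backup's catalog untouched
    for incremental in incrementals:
        for path, location in incremental.items():
            if location is None:
                result.pop(path, None)
            else:
                result[path] = location
    return result


full = {"/a": "full:0", "/b": "full:1", "/c": "full:2"}
incrementals = [{"/b": "incr1:0"}, {"/c": None, "/d": "incr2:0"}]
print(synthesize_full(full, incrementals))
```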

  As described above, the storage system 170 includes hardware and software that interface the host computer 120 with the backup storage medium 126. Together, the hardware and software according to embodiments of the present invention emulate a conventional tape library backup system, so that from the perspective of the host computer 120 data appears to be backed up to tape, when it is actually backed up to another storage medium, such as a plurality of disk arrays.

  FIG. 3 is a block diagram illustrating one embodiment of a storage system 170 according to an embodiment of the invention. In one embodiment, the hardware of the storage system 170 includes a storage system controller 122 and a switching network 132 that couples the storage system controller 122 to a backup storage medium 126. The storage system controller 122 includes a processor 127 (which may be a single processor or a plurality of processors) capable of running all or part of the storage system software, and a memory 129 (e.g., RAM, ROM, PROM, EEPROM, flash memory, or combinations thereof). The memory 129 may be used to store metadata relating to the data stored on the backup storage medium 126. Software, including programming code that implements embodiments of the present invention, is generally stored on a computer-readable and/or writable nonvolatile recording medium, such as ROM, an optical disk, a magnetic disk, or tape, and is then copied into the memory 129, from which it can be executed by the processor 127. Such programming code may be written in any of a plurality of programming languages, for example, Java, Visual Basic, C, C#, C++, Fortran, Pascal, Eiffel, Basic, COBOL, or combinations thereof, as the present invention is not limited to a particular programming language. Typically, in operation, the processor 127 causes data, such as code that implements embodiments of the present invention, to be read from the nonvolatile recording medium into another form of memory, such as RAM, that allows the processor to access the information faster than the nonvolatile medium does.

  As shown in FIG. 3, the controller 122 includes a number of port adapters 124a, 124b, 124c that couple the controller 122 to the host computer 120 and to the switching network 132. As shown, the host computer 120 is coupled to the storage system through port adapter 124a, which may be, for example, a Fiber Channel port adapter. Through the storage system controller 122, the host computer 120 can back up data to, and recover data from, the backup storage medium 126.

  In the illustrated example, the switching network 132 may include one or more Fiber Channel switches 128a, 128b. The storage system controller 122 includes a plurality of Fiber Channel port adapters 124b, 124c that connect the storage system controller to the Fiber Channel switches 128a, 128b. Through the Fiber Channel switches 128a and 128b, the storage system controller 122 allows data to be backed up to the backup storage medium 126. As illustrated in FIG. 3, the switching network 132 may further include one or more Ethernet switches 130a, 130b connected to the storage system controller 122 through Ethernet port adapters 125a, 125b. In one example, the storage system controller 122 further includes another Ethernet port adapter 125c that connects to the LAN 103 and allows the storage system 170 to communicate with host computers (e.g., user computers), as described below.

  In the example illustrated in FIG. 3, the storage system controller 122 is connected to the backup storage medium 126 through a switching network that includes two Fiber Channel switches and two Ethernet switches. Providing two or more switches of each type in the storage system 170 eliminates any single point of failure in the system. That is, if one switch (e.g., Fiber Channel switch 128a) fails, the storage system controller 122 can still communicate with the backup storage medium 126 through the other switch. Such an arrangement is advantageous for both reliability and speed. As described above, providing redundant components and eliminating single points of failure improves reliability. In addition, in some embodiments, the storage system controller can back up data to the backup storage medium 126 using some or all of the Fiber Channel switches in parallel, thereby increasing overall backup speed. However, the system need not include more than one switch of each type, nor need the switching network include both Fiber Channel and Ethernet switches. Also, where the backup storage medium 126 consists of a single disk array, no switch is required.
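  The failover behavior described above amounts to trying each redundant path in turn. In the sketch below, `SwitchPath` is a hypothetical stand-in for a route through one switch, with a `healthy` flag that simulates a failed switch; parallel use of several switches for throughput is not modeled.

```python
from typing import Sequence


class SwitchPath:
    """Hypothetical route from the controller to the disks through one switch."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    def send(self, payload: bytes) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} sent {len(payload)} bytes"


def send_with_failover(paths: Sequence[SwitchPath], payload: bytes) -> str:
    """Use the first working path; fail only if every redundant path is down."""
    errors = []
    for path in paths:
        try:
            return path.send(payload)
        except ConnectionError as exc:
            errors.append(str(exc))
    raise ConnectionError("all paths failed: " + "; ".join(errors))


paths = [SwitchPath("switch-A", healthy=False), SwitchPath("switch-B")]
print(send_with_failover(paths, b"backup block"))
```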

  As described above, in one embodiment, the backup storage medium 126 may include one or more disk arrays. In one preferred embodiment, the backup storage medium 126 includes a plurality of ATA or SATA disks. Such disks are readily available and relatively inexpensive compared with conventional storage array products from manufacturers such as EMC and IBM. Moreover, taking into account the cost of removable media (e.g., tapes) and the fact that such media have a limited lifetime, such disks are comparable in cost to conventional tape-based backup storage systems. In addition, such disks can read and write data faster than tape. For example, over a single Fiber Channel connection, data can be backed up to disk at a rate of at least 150 MB/s, or about 540 GB/hr, which is substantially faster than tape (roughly four to five times the 30-40 MB/s rate of high-end tape drives). Because multiple Fiber Channel connections can be used in parallel, even higher rates are possible. According to embodiments of the present invention, the backup storage medium can be configured to implement any of various RAID (Redundant Array of Independent Disks) levels. For example, in one embodiment, the backup storage medium can be configured as a RAID-5 implementation.
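  RAID-5 protects each stripe with a parity block equal to the bytewise XOR of the stripe's data blocks, so any single lost block can be rebuilt by XOR-ing all the surviving blocks. The sketch below shows that arithmetic for one stripe; it does not model how RAID-5 rotates the parity block across disks from stripe to stripe.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks: Sequence[bytes]) -> bytes:
    # Parity is the XOR of the data blocks, so one lost block equals the XOR
    # of all the remaining blocks in the stripe (data and parity together).
    return xor_blocks(surviving_blocks)


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # one stripe across three data disks
parity = xor_blocks(data)  # stored on a fourth disk
recovered = rebuild_missing([data[0], data[2], parity])  # disk holding data[1] has failed
print(parity.hex(), recovered == data[1])
```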

  As described above, embodiments in accordance with the present invention provide a "virtual tape library" by emulating a conventional tape library backup system, using disk arrays in place of tape cartridges as the physical backup storage medium. The physical tape cartridges of a conventional tape library are replaced by what are termed "virtual cartridges." The term "virtual tape library" should be understood to mean an emulated tape library, which may be implemented in software and/or in physical hardware, for example, as one or more disk arrays. Although emulated tapes are primarily discussed here, the storage system can also emulate other storage media such as CD-ROM and DVD-ROM, and the term "virtual cartridge" should be understood to refer generally to emulated storage media, such as an emulated tape or an emulated CD. In one embodiment, a virtual cartridge actually corresponds to one or more hard disks.

  Thus, in one embodiment, a software interface is provided that emulates a tape library, so that, to the backup/recovery application, data appears to be backed up to tape. In reality, however, the tape library is replaced by one or more disk arrays, and the data is actually backed up to those disk arrays. Various aspects, features, and operations of the software included in the storage system 170 are described below.

  Although the software may be described as "included" in the storage system 170 and as being executed by the processor 127 of the storage system controller 122 (FIG. 3), not all of the software needs to run on the storage system controller 122. Software programs such as the synthetic full backup application and the end-user recovery application may run on host computers and/or user computers, and portions of them may be distributed across some or all of the storage system controller, host computers, and user computers. Thus, the storage system controller need not be a self-contained physical entity such as a single computer. The storage system 170 can communicate with software residing on a host computer such as the media server 114 or the application server 102. The storage system may also include several software applications that reside on, or are run by, the same or different host computers. Although the storage system 170 can be implemented as a discrete device in some embodiments, it is not limited to a discrete device. As an example, the storage system 170 can be provided as a stand-alone unit that serves as a "plug and play" replacement for a conventional tape library backup system (that is, existing backup procedures and policies need not be modified). Such a storage system unit can also be used in a networked computing environment that includes a conventional backup system, to provide extra or additional storage capacity.

  As described above, according to one embodiment, the host computer 120 (which can be, for example, the application server 102 or the media server 114; see FIG. 1) can back up data to the backup storage medium 126 via a network link 121 (e.g., a Fibre Channel link) that connects the host computer 120 to the storage system 170. Although the following is described primarily in terms of backing up data to emulated media, it should be recognized that the same principles apply to restoring backed-up data from emulated media. The data flow between the host computer 120 and the emulated media 134 can be controlled by the backup/recovery application, as described above. From the point of view of the backup/recovery application, the data may appear to have actually been backed up to a physical version of the emulated media.

  As illustrated in FIG. 4, the storage system software 150 includes one or more logical abstraction layers that present the emulated media and provide an interface between the backup/recovery application 140 residing on the host computer 120 and the backup storage medium 126. The software 150 receives tape-format data from the backup/recovery application 140 and converts it into a form suitable for storage on random-access disks (e.g., hard disks, optical disks). In one example, the software 150 executes on the processor 127 of the storage system controller 122 and can be stored in the memory 129 (FIG. 3).

  According to one embodiment, the software 150 may include a virtual tape library (VTL) layer 142 that can provide SCSI emulation of tapes, tape drives, and the robotic mechanisms used to move tapes into and out of tape drives. The backup/recovery application 140 can communicate with the VTL 142 (e.g., to back up or write data to the emulated media) using, for example, SCSI commands, as indicated by arrow 144. Thus, by providing a software interface between the backup/recovery application and the remaining storage system software and hardware, the VTL provides emulated storage media 134 that appear to the backup/recovery application as conventional removable backup storage media.

  A second software layer, referred to as the file system layer 146, can provide an interface between the emulated storage media (represented by the VTL) and the physical backup storage medium 126. In one example, the file system 146 acts as a small operating system that communicates with the backup storage medium 126 using, for example, SCSI commands (indicated by arrow 148) to read data from and write data to the backup storage medium 126.
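  The division of labor between the two layers can be illustrated with a minimal Python sketch under a drastically simplified model. The hypothetical `VirtualTapeDrive` class stands in for the VTL layer 142 and exposes only tape-style operations (write a record, write a file marker, rewind, read), while the hypothetical `FileSystemLayer` class stands in for the file system layer 146 and simply appends bytes to a random-access buffer. SCSI command decoding, media changer emulation, and on-disk persistence are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DirectoryEntry:
    kind: str    # "data" or "FM" (file marker)
    offset: int  # byte offset within the file system layer's buffer
    length: int  # bytes stored; always 0 for a file marker


@dataclass
class FileSystemLayer:
    """Hypothetical stand-in for file system layer 146: a random-access byte store."""
    data: bytearray = field(default_factory=bytearray)

    def append(self, payload: bytes) -> int:
        offset = len(self.data)
        self.data.extend(payload)
        return offset

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.data[offset:offset + length])


class VirtualTapeDrive:
    """Hypothetical stand-in for VTL layer 142: exposes tape-style operations only."""

    def __init__(self, fs: FileSystemLayer) -> None:
        self.fs = fs
        self.directory: List[DirectoryEntry] = []
        self.position = 0  # index of the next directory entry, like a tape head

    def write(self, record: bytes) -> None:
        # As on a real tape, writing discards anything recorded after the head.
        # (Bytes already in the store are not reclaimed in this sketch.)
        del self.directory[self.position:]
        offset = self.fs.append(record)
        self.directory.append(DirectoryEntry("data", offset, len(record)))
        self.position = len(self.directory)

    def write_filemark(self) -> None:
        # File markers occupy no space in the store; they exist only as entries.
        del self.directory[self.position:]
        self.directory.append(DirectoryEntry("FM", len(self.fs.data), 0))
        self.position = len(self.directory)

    def rewind(self) -> None:
        self.position = 0

    def read(self) -> Optional[bytes]:
        """Return the next record, or None when a file marker is crossed."""
        if self.position >= len(self.directory):
            raise EOFError("end of recorded data")
        entry = self.directory[self.position]
        self.position += 1
        if entry.kind == "FM":
            return None
        return self.fs.read(entry.offset, entry.length)
```

  The application sees only sequential tape semantics, but because every record's offset is kept in a directory, the storage side could jump directly to any record, which is the property the tape directory of FIG. 6 relies on.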

  In one embodiment, the VTL provides generic tape library support and can support any SCSI media changer. Emulated tape devices may include, but are not limited to, IBM LTO-1 and LTO-2 tape drives, the Quantum SuperDLT320 tape drive, the Quantum P3000 tape library system, and the StorageTek L180 tape library system. Each virtual cartridge in the VTL is a file that can grow dynamically as data is stored, unlike a conventional tape cartridge, which has a fixed size. One or more virtual cartridges can be stored in a system file, described below with reference to FIG. 5.

  FIG. 5 illustrates an example of a data structure in the file system software 146 representing a system file 200 according to an embodiment of the present invention. In this embodiment, the system file 200 includes a header 202 and data 204. The header 202 may include information identifying each virtual cartridge stored in the system file, as well as information such as whether the virtual cartridge is write-protected and the creation/modification date of the virtual cartridge. In one example, the header 202 includes information that uniquely identifies each virtual cartridge and distinguishes it from the other virtual cartridges stored in the storage system. For example, this information may include the name of the virtual cartridge and an identification number (e.g., corresponding to the barcode typically provided on a physical tape so that the tape can be identified by a robotic mechanism). The header 202 may also include additional information, such as the capacity of each virtual cartridge and the date it was last modified.

  According to one embodiment of the present invention, the size of the header 202 can be optimized based on the number of unique data sets the system must track and the type of data stored (e.g., virtual cartridges representing data backups from one or more host computer systems). For example, data backed up to a tape storage system is typically characterized by a large number of systems and large data sets representing user files. Because the data sets are large, the number of discontiguous data files that must be tracked can be small. Thus, in one embodiment, the size of the header 202 can be chosen as a compromise between reserving so much space that storage is used inefficiently (i.e., the header is too large) and having too little space to store a sufficient number of cartridge identifiers (i.e., the header is too small). In one exemplary embodiment, the header 202 occupies the first 32 MB of the system file 200. However, it should be appreciated that the header 202 can have other sizes, selected according to the needs and capacity of the system.
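  The header-size trade-off is easy to quantify once an entry layout is fixed. The sketch below assumes a purely hypothetical 105-byte cartridge record (barcode, name, write-protect flag, creation and modification times, capacity) packed with Python's standard `struct` module, and computes how many such records would fit in the 32 MB header of the exemplary embodiment, ignoring the space the tape directory and other header fields would also need.

```python
import struct

HEADER_BYTES = 32 * 1024 * 1024  # 32 MB header, as in the exemplary embodiment

# Hypothetical fixed-size cartridge record, little-endian with no padding:
# 16-byte barcode, 64-byte name, write-protect flag, creation time,
# modification time (both seconds since the epoch), capacity in bytes.
CARTRIDGE_ENTRY = struct.Struct("<16s64s?qqQ")


def max_cartridges(header_bytes: int = HEADER_BYTES) -> int:
    """How many cartridge records fit if the whole header held nothing else."""
    return header_bytes // CARTRIDGE_ENTRY.size


def pack_entry(barcode: str, name: str, write_protected: bool,
               created: int, modified: int, capacity: int) -> bytes:
    # struct pads short byte strings with NULs and truncates long ones.
    return CARTRIDGE_ENTRY.pack(barcode.encode("ascii"), name.encode("utf-8"),
                                write_protected, created, modified, capacity)


def unpack_entry(raw: bytes) -> dict:
    barcode, name, write_protected, created, modified, capacity = CARTRIDGE_ENTRY.unpack(raw)
    return {
        "barcode": barcode.rstrip(b"\0").decode("ascii"),
        "name": name.rstrip(b"\0").decode("utf-8"),
        "write_protected": write_protected,
        "created": created,
        "modified": modified,
        "capacity": capacity,
    }
```

  With this layout, `max_cartridges()` evaluates to 319,566 records; doubling the record size would halve that figure, which is the kind of compromise the paragraph above describes.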

  From the perspective of the backup/recovery application, the virtual cartridges all appear as physical tape cartridges with the same attributes and characteristics; that is, to the backup/recovery application, a virtual cartridge appears to be a sequentially recorded tape. However, in one preferred embodiment, the data stored on a virtual cartridge is not stored on the backup storage medium 126 in a sequential format. Instead, the data that appears to be recorded on the virtual cartridge is actually stored in the storage system's files in a randomly accessible disk format. Metadata is used to link the stored data to the virtual cartridge so that the backup/recovery application can read and write the data in cartridge format.

  Thus, to summarize one preferred embodiment, user and/or system data (referred to as “file data”) is received by the storage system 170 from the host computer 120 and stored on a disk array comprising the backup storage medium 126. As described below, the storage system software 150 (FIG. 4) and/or hardware writes this file data to the backup storage medium 126 in the form of system files. The storage system controller extracts metadata from the backed-up file data to track the attributes of the backed-up user and/or system files. For example, for each file, this metadata may include the file name, the file's creation or last modification date, any encryption information for the file, and other information. The storage system can also generate metadata for each file that links the file to a virtual cartridge. Using such metadata, the software presents tape cartridge emulation to the host computer, although the file data is not actually stored in tape format but is instead stored in system files, as described below. Storing data in system files rather than in a sequential cartridge format has the advantage of allowing fast, efficient random access to individual files, without scanning through sequential data to find a specific file.

  As described above, according to one embodiment, file data (i.e., user and/or system data) is stored as system files on the backup storage medium, each system file including a header and the actual user and/or system file data. The header 202 of each system file 200 includes a tape directory 206 containing metadata that links the user and/or system files to a virtual cartridge. The term “metadata” refers to data that represents attributes of the actual user and/or system data but is not itself user and/or system file data. According to one example, the tape directory may define the layout of data on a virtual cartridge down to the byte level. In one embodiment, the tape directory 206 has the table structure illustrated in FIG. 6. The table includes a column 220 for the type of information stored (e.g., data, file marker (FM), etc.), a column 222 for the size in bytes of the disk blocks used, and a column 224 reflecting the number of disk blocks in which the file data is stored. Thus, the tape directory allows the controller to access any data file stored on the backup storage medium 126 randomly (as opposed to sequentially). For example, as shown in FIG. 6, the tape directory indicates that the file data 226 begins one block from the beginning of the system file 200, so the data file 226 can be located quickly on the virtual tape. That one block corresponds to a file marker (FM) and therefore has no size. File markers are not stored in the system file; that is, a file marker corresponds to zero data. The tape directory nevertheless includes file markers because conventional tapes and backup/recovery applications use them: the application writes file markers along with the tape files and expects to see them when viewing a virtual cartridge. Thus, file markers are tracked within the tape directory, but because a file marker represents no data, it is not stored in the data section of the system file.

  The file data 226 begins at the start of the data section of the system file, indicated by arrow 205, and is 1024 bytes long (i.e., one disk block of 1024 bytes). It should be appreciated that other file data can be stored using block sizes other than 1024 bytes, depending on the amount of data, i.e., the size of the data file. For example, larger data files can be stored using larger block sizes for efficiency.
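  The random-access lookup that the tape directory enables can be sketched as follows. The sketch assumes that each row carries the three columns of FIG. 6 (type, block size in bytes, block count), that a file marker is recorded as one block of size zero, and that data blocks are laid out contiguously, in directory order, from the start of the data section; the class and function names are hypothetical.

```python
from typing import List, NamedTuple


class TapeDirRow(NamedTuple):
    kind: str        # column 220: "data" or "FM" (file marker)
    block_size: int  # column 222: bytes per disk block (0 for a file marker)
    blocks: int      # column 224: number of disk blocks


class Location(NamedTuple):
    position: int  # ordinal position of the item on the virtual tape
    kind: str
    offset: int    # byte offset from the start of the data section (arrow 205)
    length: int    # bytes occupied in the data section


def locate(rows: List[TapeDirRow]) -> List[Location]:
    """Derive the byte offset of every tape item from the directory alone,
    without reading any of the stored file data."""
    locations = []
    offset = 0
    for position, row in enumerate(rows):
        length = row.block_size * row.blocks  # file markers contribute 0 bytes
        locations.append(Location(position, row.kind, offset, length))
        offset += length
    return locations
```

  For a directory resembling FIG. 6 (a file marker followed by one 1024-byte block), the data file resolves to offset 0 with length 1024, so it can be read directly instead of being found by scanning.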

  In one example, the tape directory may be included in a “file descriptor” for each data file backed up to the storage system. The file descriptor includes metadata about the data file 204 stored in the storage system. In one embodiment, the file descriptor may be implemented in a standard format, such as the tape archive (tar) format used on most Unix-based computer systems. Each file descriptor may include information such as the name of the corresponding user file, the user file's creation/modification date, its size, and whether access to it is restricted. The file descriptor may further include information describing the directory structure from which the data was copied. Thus, the file descriptor can include searchable metadata about the corresponding data file, as described below.
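  Because the file descriptor may follow the tar format, Python's standard `tarfile` module can illustrate the kind of metadata such a descriptor carries. The sketch writes an in-memory USTAR archive and then reads back only the member headers, which is the metadata a storage system could extract and index. The file name, owner, and timestamp are invented for illustration.

```python
import io
import tarfile
from typing import Dict, List, Tuple


def make_archive(files: List[Tuple[str, bytes, int, int, str]]) -> bytes:
    """Write (name, payload, mtime, mode, owner) tuples as an in-memory tar stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, payload, mtime, mode, owner in files:
            info = tarfile.TarInfo(name=name)  # the per-file header ("file descriptor")
            info.size = len(payload)
            info.mtime = mtime
            info.mode = mode
            info.uname = owner
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_descriptors(archive: bytes) -> List[Dict[str, object]]:
    """Return only the header metadata that a metadata cache might index."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return [
            {"name": m.name, "size": m.size, "mtime": m.mtime,
             "mode": oct(m.mode), "owner": m.uname}
            for m in tar.getmembers()
        ]
```

  Each tar header occupies a 512-byte block and each member's data is padded to a block boundary, so the descriptors can be collected without interpreting the file contents themselves.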

  From the perspective of the backup/recovery application, a virtual cartridge may contain multiple data files, each with a corresponding file descriptor. From the perspective of the storage system software, the data files are stored in system files that can be linked to specific backup operations. For example, a backup performed by one host computer at a specific time can generate one system file corresponding to one or more virtual cartridges. A virtual cartridge can be of any size and can grow dynamically as the number of user files stored on it increases.

  Referring again to FIG. 3, the storage system 170 may include a synthetic full backup software application 240. In one embodiment, the host computer 120 backs up data to the emulated media 134, forming one or more virtual cartridges. In some computing environments, a “full backup”, i.e., a backup copy of all data stored in the primary storage system (FIG. 1) of the network, is performed periodically (e.g., weekly). Because the amount of data to be copied is generally large, this process is very time consuming. Thus, in most computing environments, additional backups, called incremental backups, are performed between successive full backups, for example daily. An incremental backup copies only the data that has changed since the last backup, whether that backup was incremental or full. Generally, the changed data is backed up on a per-file basis, even if much of the data within a file has not changed. Incremental backups are therefore faster than full backups because less data is copied. It should be recognized that although most environments typically perform a full backup once a week and incremental backups daily, this schedule need not be used; for example, some environments require incremental backups several times a day. The principles of the invention apply to any environment that uses full backups (and, optionally, incremental backups), regardless of how often they are performed.
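  The difference between a full and an incremental backup reduces to a selection rule. The following sketch assumes that a file counts as changed when its modification time is later than the time of the previous backup, full or incremental; real backup applications may use archive bits, change journals, or other criteria instead. As in the text, selection is per file: a changed file is copied whole.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FileState:
    name: str
    mtime: float  # last modification time, seconds since the epoch


def select_for_backup(files: List[FileState], last_backup_time: Optional[float]) -> List[str]:
    """Full backup when there is no previous backup; otherwise incremental.

    The previous backup may itself be full or incremental: only files modified
    after it are selected, and each selected file is copied in its entirety.
    """
    if last_backup_time is None:
        return [f.name for f in files]
    return [f.name for f in files if f.mtime > last_backup_time]
```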

  During a full backup procedure, the host computer can generate one or more virtual cartridges containing backed-up data consisting of multiple data files. For clarity, the following description assumes that a full backup creates a single virtual cartridge. However, it should be recognized that a full backup may create more than one virtual cartridge, and the principles of the present invention are not limited to any particular number of virtual cartridges.

  According to one embodiment, a method is provided for generating a synthetic full backup data set from an existing full backup data set and one or more incremental backup data sets. This method eliminates the need for periodic (e.g., weekly) full backups, saving users time and network resources. In addition, as will be apparent to those skilled in the art, restoring data from a full backup plus one or more incremental backups can be time consuming: for example, if the latest version of a file resides in an incremental backup, the backup/recovery application generally restores the file from the last full backup and then applies all the changes from the incremental backups. Providing a synthetic full backup therefore has the additional benefit of allowing the backup/recovery application to restore data files more quickly, from the synthetic full backup alone, without having to restore from the full backup and one or more incremental backups. It should be recognized that the term “latest version” generally refers to the most recent copy of a data file (i.e., the copy saved most recently), regardless of whether the file has a new version number. The term “version” refers to copies of the same file that may have been modified in some way or saved multiple times.

  FIG. 7 schematically illustrates the synthetic full backup procedure. The host computer 120 can perform a full backup 230 at an initial time, for example over a weekend. The host computer 120 can then perform successive incremental backups 232a, 232b, 232c, 232d, 232e, for example daily over the following week. The storage system 170 can subsequently generate a synthetic full backup data set 234, as described below.

  According to one embodiment, the storage system 170 may include a software application referred to as the synthetic full backup application 240 (FIG. 3). The synthetic full backup application 240 can run on the storage system controller 122 (FIG. 2) or on the host computer 120, and includes the software instructions and interfaces necessary to generate the synthetic full backup data set 234. As an example, the synthetic full backup application can create a new virtual cartridge by performing a logical merge of the metadata representations of the full backup data set 230 and each incremental backup data set 232.

  For example, as illustrated in FIG. 8, an existing full backup data set may include user files F1, F2, F3, and F4. A first incremental backup data set 232a may include F2', a modified version of user file F2, and F3', a modified version of F3. A second incremental backup data set 232b may include F1', a modified version of user file F1; F2'', a further modified version of F2; and a new user file, F5. The synthetic full backup data set 234 is formed from a logical merge of the full backup data set 230 and the two incremental data sets 232a and 232b, and therefore contains the latest version of each user file (F1', F2'', F3', F4, F5), as illustrated in FIG. 8.
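  The logical merge of FIG. 8 can be expressed as a dictionary update applied in chronological order. In this sketch each backup set is a hypothetical mapping from a user-file name to a (version, cartridge) pair, and the cartridge IDs are invented labels. Later sets override earlier entries, so the result holds the latest version of every file while touching only metadata.

```python
from typing import Dict, List, Tuple

# Hypothetical model: file name -> (version label, ID of the virtual cartridge holding it).
BackupSet = Dict[str, Tuple[str, str]]


def synthesize(full: BackupSet, incrementals: List[BackupSet]) -> BackupSet:
    """Logical merge: start from the full backup, then let each incremental,
    applied in chronological order, override the entries for files it contains."""
    merged = dict(full)
    for incremental in incrementals:
        merged.update(incremental)
    return merged


# The FIG. 8 example; the cartridge IDs are invented labels.
full_230 = {"F1": ("F1", "VC-full"), "F2": ("F2", "VC-full"),
            "F3": ("F3", "VC-full"), "F4": ("F4", "VC-full")}
inc_232a = {"F2": ("F2'", "VC-inc-a"), "F3": ("F3'", "VC-inc-a")}
inc_232b = {"F1": ("F1'", "VC-inc-b"), "F2": ("F2''", "VC-inc-b"), "F5": ("F5", "VC-inc-b")}

synthetic_234 = synthesize(full_230, [inc_232a, inc_232b])
```

  This simple form does not handle files deleted between backups, which the text does not address; a production merge would need some record of deletions.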

  As shown in FIGS. 3 and 4, the file system software 146 can generate a logical metadata cache 242 that stores metadata about each user file stored on the emulated media 134. The logical metadata cache need not be a physical data cache; it can instead be a searchable collection of data stored on the storage medium 126. In another example, the logical metadata cache 242 can be implemented as a database. When the metadata is stored in a database, conventional database commands (e.g., SQL commands) can be used to perform a logical merge of a full backup data set and one or more incremental backup data sets to generate a synthetic full backup data set.
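  When the logical metadata cache is a database, the same merge becomes a single query. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name `file_metadata`, its columns, and the use of an increasing `backup_seq` (0 for the full backup, then 1, 2, and so on for each incremental) are assumptions made for illustration.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema for the logical metadata cache 242.
SCHEMA = """
CREATE TABLE file_metadata (
    file_name    TEXT    NOT NULL,
    backup_seq   INTEGER NOT NULL,  -- 0 = full backup, 1, 2, ... = incrementals
    version      TEXT    NOT NULL,
    cartridge_id TEXT    NOT NULL
)
"""

# For each file, keep the row from the most recent backup that contains it.
MERGE_QUERY = """
SELECT m.file_name, m.version, m.cartridge_id
FROM file_metadata AS m
JOIN (SELECT file_name, MAX(backup_seq) AS latest
      FROM file_metadata
      GROUP BY file_name) AS l
  ON m.file_name = l.file_name AND m.backup_seq = l.latest
ORDER BY m.file_name
"""


def synthetic_full(rows: List[Tuple[str, int, str, str]]) -> List[Tuple[str, str, str]]:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO file_metadata VALUES (?, ?, ?, ?)", rows)
        return conn.execute(MERGE_QUERY).fetchall()
    finally:
        conn.close()
```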

  As described above, each data file stored on the emulated media 134 has a file descriptor that includes metadata about the data file and may include the location of the file on the backup storage medium 126. In one embodiment, a backup/recovery application running on the host computer 120 stores data on the emulated media 134 in a streaming tape format. FIG. 9 illustrates an example of a data structure 250 representing this tape format. As noted above, the system file data structure includes a header 252 containing information about the data file, such as, but not limited to, the file descriptor for the data file, the file's creation and/or modification date, security information, the directory structure of the host system from which the file originated, and information linking the file to other virtual cartridges. The header is followed by data 254, which is the actual user and/or system file data backed up (copied) from the host computer, the primary storage system, etc. The system file data structure may optionally include a pad 256 that aligns the next header on a block boundary.

  As illustrated in FIG. 9, in one embodiment, the header data is placed in the logical metadata cache 242 to enable fast searching of, and random access to, what is otherwise a sequential tape data format. Using the file system software 146 on the storage system controller 122, the logical metadata cache makes it possible to convert the linear, sequential tape data format stored on the emulated media 134 into a random-access data format stored on the physical disks that make up the backup storage medium 126. The logical metadata cache 242 stores the header 252, which includes the file descriptor for the data file, security information that can be used to control access to the data file, and a pointer 256, discussed below, to the virtual cartridge and to the actual location of the data file on the backup storage medium 126. In one embodiment, the logical metadata cache stores data about all data files backed up in the full backup data set 230 and in each incremental data set 232.
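  Converting the linear tape stream into a random-access layout amounts to one sequential pass that records where each record's data begins. The sketch assumes a hypothetical record format loosely modeled on FIG. 9: a header occupying one 512-byte block (name plus data length), followed by the data and a pad to the next block boundary. The resulting index plays the role of the location pointers kept in the logical metadata cache.

```python
import struct
from typing import Dict, List, Tuple

BLOCK = 512
# Hypothetical header: 100-byte NUL-padded name, then the data length.
HEADER = struct.Struct("<100sQ")


def pad_len(n: int) -> int:
    """Bytes of padding needed after n data bytes to reach a block boundary."""
    return (-n) % BLOCK


def write_stream(files: List[Tuple[str, bytes]]) -> bytes:
    """Produce the sequential, tape-style stream a backup application might send."""
    out = bytearray()
    for name, data in files:
        out += HEADER.pack(name.encode("utf-8"), len(data))
        out += bytes(BLOCK - HEADER.size)        # header fills exactly one block
        out += data + bytes(pad_len(len(data)))  # data, then pad to the boundary
    return bytes(out)


def build_index(stream: bytes) -> Dict[str, Tuple[int, int]]:
    """One sequential pass: map each file name to (data offset, data length)."""
    index: Dict[str, Tuple[int, int]] = {}
    pos = 0
    while pos + BLOCK <= len(stream):
        raw_name, length = HEADER.unpack_from(stream, pos)
        data_offset = pos + BLOCK
        index[raw_name.rstrip(b"\0").decode("utf-8")] = (data_offset, length)
        pos = data_offset + length + pad_len(length)
    return index


def read_file(stream: bytes, index: Dict[str, Tuple[int, int]], name: str) -> bytes:
    """Random access: jump straight to the data using the index."""
    offset, length = index[name]
    return stream[offset:offset + length]
```

  The stream is scanned once when it arrives; every later lookup goes straight to the recorded offset.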

  According to one embodiment, the synthetic full backup application software 240 uses the information stored in the logical metadata cache to generate a synthetic full backup data set. This synthetic full backup data set is then linked to a synthetic virtual cartridge created by the synthetic full backup application 240; to the backup/recovery application, the synthetic full backup data set appears to be stored on this synthetic virtual cartridge. As described above, the synthetic full backup data set can be generated by performing a logical merge of the existing full backup data set and the incremental backup data sets. Such a logical merge may include comparing each data file contained in the existing full backup data set and in each incremental backup data set, and creating a composite of the latest modified version of each user file, as described with reference to FIG. 8.

  According to one embodiment, as illustrated in FIG. 10, the synthetic virtual cartridge 260 contains pointers to the locations of data files on other virtual cartridges, in particular the virtual cartridges containing the existing full backup data set and the incremental backup data sets. Considering the example given with respect to FIG. 8, the synthetic virtual cartridge 260 includes pointers 266 that point (as indicated by arrow 268) to the location of user file F4 in the existing full backup data set on virtual cartridge 262 (the existing full backup data set holds the latest version of user file F4) and, for example, to the location of user file F3' in the incremental data set 232a on virtual cartridge 264.

  The synthetic virtual cartridge also includes a list 270 containing the identification numbers of all virtual cartridges that hold the data pointed to by the pointers 266. This dependent cartridge list 270 may be important for tracking the location of the actual data and for preventing deletion of dependent virtual cartridges. In this embodiment, the synthetic full backup data set does not contain the actual user files, but rather a set of pointers indicating the locations of the user files on the backup storage medium 126. It is therefore necessary to prevent deletion of the actual user files (stored on other virtual cartridges). This can be achieved in part by maintaining a record of the virtual cartridges that contain the data (the dependent cartridge list 270) and preventing each of those virtual cartridges from being overwritten or deleted. The synthetic virtual cartridge may include cartridge data 272, such as the size of the synthetic virtual cartridge and its location on the backup storage medium 126. The synthetic virtual cartridge may also have an identification number and/or name 274.
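  The pointer-only synthetic cartridge and its dependent cartridge list can be sketched with a few data classes. The names `Pointer`, `SyntheticCartridge`, and `CartridgeCatalog` are hypothetical, and in this sketch the dependent list is derived from the pointers rather than stored separately. Deleting any cartridge that a synthetic cartridge still depends on is refused.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Pointer:
    cartridge_id: str  # virtual cartridge that actually holds the data
    offset: int        # location of the data on that cartridge
    length: int


@dataclass
class SyntheticCartridge:
    cartridge_id: str                                           # ID/name 274
    pointers: Dict[str, Pointer] = field(default_factory=dict)  # pointers 266

    def dependent_cartridges(self) -> Set[str]:
        """Dependent cartridge list 270, derived here from the pointers."""
        return {p.cartridge_id for p in self.pointers.values()}


class CartridgeCatalog:
    """Hypothetical catalog that refuses to delete cartridges still referenced."""

    def __init__(self, cartridge_ids: List[str]) -> None:
        self.cartridges: Set[str] = set(cartridge_ids)
        self.synthetics: Dict[str, SyntheticCartridge] = {}

    def add_synthetic(self, synthetic: SyntheticCartridge) -> None:
        self.synthetics[synthetic.cartridge_id] = synthetic
        self.cartridges.add(synthetic.cartridge_id)

    def delete(self, cartridge_id: str) -> None:
        holders = sorted(s.cartridge_id for s in self.synthetics.values()
                         if cartridge_id in s.dependent_cartridges())
        if holders:
            raise PermissionError(f"{cartridge_id} is still referenced by {holders}")
        self.cartridges.discard(cartridge_id)
        self.synthetics.pop(cartridge_id, None)
```

  Once the synthetic cartridge itself is deleted, its dependencies are released and the underlying cartridges can be reclaimed.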

  According to other embodiments, the synthetic virtual cartridge may include a combination of pointers and actually stored user files. As shown in FIG. 11, in one example, the synthetic virtual cartridge includes pointers 266 to the locations of data files (the latest versions, as described with reference to FIG. 8) in the existing full backup data set 230 on the virtual cartridge 262. The synthetic virtual cartridge can also include data 278 comprising the actual data files copied from the incremental data sets 232, as indicated by arrow 280. In this way, the incremental backup data sets can be deleted after the synthetic full backup data set 276 is generated, saving storage space. Such a synthetic virtual cartridge is smaller than one that contains copies of all user files instead of pointers to all or some of them.
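  A hybrid synthetic cartridge, as in FIG. 11, keeps pointers into the full backup but inlines data taken from the incrementals. In the sketch below, the `Ptr` and `Inline` entry types and the cartridge label are hypothetical. After the hybrid is built, only the full-backup cartridge remains a dependency, so the incremental cartridges could be reclaimed.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Union


@dataclass(frozen=True)
class Ptr:
    cartridge_id: str  # cartridge holding the data (pointer 266)
    name: str          # file name on that cartridge


@dataclass(frozen=True)
class Inline:
    payload: bytes     # data copied into the synthetic cartridge (data 278)


Entry = Union[Ptr, Inline]


def build_hybrid(full_id: str, full: Dict[str, bytes],
                 incrementals: List[Dict[str, bytes]]) -> Dict[str, Entry]:
    """Point into the full backup; copy anything newer from the incrementals."""
    entries: Dict[str, Entry] = {name: Ptr(full_id, name) for name in full}
    for incremental in incrementals:  # chronological order
        for name, payload in incremental.items():
            entries[name] = Inline(payload)
    return entries


def dependencies(entries: Dict[str, Entry]) -> Set[str]:
    return {e.cartridge_id for e in entries.values() if isinstance(e, Ptr)}


def restore(entries: Dict[str, Entry], name: str,
            cartridges: Dict[str, Dict[str, bytes]]) -> bytes:
    entry = entries[name]
    if isinstance(entry, Inline):
        return entry.payload
    return cartridges[entry.cartridge_id][entry.name]
```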

  It should be recognized that a synthetic full backup may include a combination of pointers and stored file data and is not limited to the above examples. As an example, a synthetic full backup may contain pointers to the data files for some files stored in one incremental and/or full backup, together with stored file data copied from other existing full and/or incremental backups. Alternatively, a synthetic full backup may contain no pointers at all, and instead be generated from the previous full backup and all relevant incremental backups so that it contains the actual file data for the latest versions, copied from the appropriate full and/or incremental backups.

  In one embodiment, the synthetic full backup application software may include a differencing algorithm that compares the user and system file metadata of each existing full backup data set and incremental backup data set to determine where the latest version of each data file is located. For example, a differencing algorithm may select the latest version of a data file by comparing the creation and/or modification dates of the different versions of the same data file in the various backup sets. However, users often open a user file and save it (thus changing its modification date) without actually changing any data in the file. The system may therefore implement a more sophisticated differencing algorithm that analyzes the data in the system or user file to determine whether the data has actually changed. Variations of such differencing algorithms, and other types of comparison algorithms, will be apparent to those skilled in the art. Also, as described above, when the metadata is stored in a database, database commands such as SQL commands can be used to perform the logical merge. The present invention applies to any algorithm that allows the latest or final version of each user file to be selected from the existing backup sets being compared, so that a synthetic full backup data set can be generated accurately.
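  One possible way to realize the improved differencing algorithm, assumed here rather than taken from the source, is to compare content digests as well as timestamps. In this sketch, versions are examined in modification-time order and a newer copy replaces the kept one only if its SHA-256 digest differs, so a file that was merely opened and re-saved continues to resolve to its earlier copy.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Version:
    name: str
    mtime: float     # modification date recorded in the metadata
    payload: bytes
    backup_set: str  # which full or incremental set holds this copy


def _digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def latest_versions(versions: Iterable[Version]) -> Dict[str, Version]:
    """Pick one copy per file. A newer copy replaces the kept one only when its
    content actually differs, so an open-and-resave does not count as a change."""
    kept: Dict[str, Version] = {}
    for v in sorted(versions, key=lambda v: v.mtime):
        current = kept.get(v.name)
        if current is None or _digest(current.payload) != _digest(v.payload):
            kept[v.name] = v
    return kept
```

  A timestamp-only rule would pick the incremental copy for both files in the test below. In practice the digests would likely be computed once at backup time and stored as metadata, so the comparison need not reread the file data.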

  As will be apparent to those skilled in the art, a synthetic full backup application allows a full backup data set to be generated and made available without the host computer having to perform a physical full backup. Besides sparing the host computer the processing burden of transferring data to the backup storage system, embodiments in which the synthetic full backup application runs on the storage system significantly reduce network bandwidth utilization. As illustrated in FIG. 7, additional synthetic full backup data sets can be generated from the first synthetic full backup data set 234 and a subsequent series of incremental backup data sets 236. This can provide a considerable time benefit for files or objects that are rarely modified, and therefore rarely copied: the synthetic full backup data set can simply keep a pointer to a file that was copied once.

  As described above with reference to FIG. 3, the storage system may include a software application referred to as the end-user recovery application 300. Thus, according to another embodiment, a method is provided that allows end users to locate and restore backed-up data without IT staff intervention and without changing existing backup/recovery procedures and/or policies. In a typical backup storage system, the backup/recovery application running on the host computer 120 is controlled by IT staff, and it can be difficult for end users to access backed-up data without their involvement. According to embodiments of the present invention, the storage system software enables end users to locate and restore their files through, for example, a web-based or other interface to the backup storage medium 126.

  It should be appreciated that the end-user recovery application 300, like the synthetic full backup application 240, can run on the storage system controller 122 or on the host computer 120. The end-user recovery application includes the software instructions and interfaces necessary for an authenticated user to search the logical metadata cache, find files backed up to the backup storage medium 126, and selectively restore them.

  According to one embodiment, software is provided that includes a user interface installed and/or executed on the user computer 136. The user interface can be any type of interface that allows the user to locate files on the backup storage medium, for example a graphical user interface, a web-based interface, or a text interface. The user computer is connected to the storage system 170 through a network connection 138, such as an Ethernet connection, through which the operator of the user computer 136 can access data stored in the storage system 170.

  In one example, the end-user recovery application 300 includes user authentication and/or authorization features. For example, a user may be required to log in through the user interface on the user computer using a username and password. The user computer can pass the username and password to the storage system (e.g., to the end-user recovery application), which can use an appropriate user verification mechanism to determine whether the user is permitted to access the storage system. Examples of user verification mechanisms include, but are not limited to, a Microsoft Active Directory server, a Unix “yellow pages” (NIS) server, or a Lightweight Directory Access Protocol (LDAP) server. The login/user verification mechanism can communicate with the end-user recovery application to convey the user's privileges. For example, some users may be able to search only for files they created, or files for which they hold specific privileges or are identified as the owner. Other users, such as a system operator or administrator, may be allowed access to all backed-up files.
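  The privilege rules in the example can be expressed as a filter applied after authentication. The sketch assumes the user has already been verified by a directory service (Active Directory, NIS, or LDAP; those integrations are not shown) and that verification yields a user name and a set of roles. The role names `admin` and `operator` and the `readers` field are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

PRIVILEGED_ROLES = frozenset({"admin", "operator"})  # hypothetical role names


@dataclass(frozen=True)
class User:
    name: str
    roles: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class BackedUpFile:
    path: str
    owner: str
    creator: str
    readers: FrozenSet[str] = frozenset()  # users granted explicit access


def visible_files(user: User, files: List[BackedUpFile]) -> List[BackedUpFile]:
    """Filter search results by privilege; authentication is assumed done."""
    if user.roles & PRIVILEGED_ROLES:
        return list(files)
    return [f for f in files
            if user.name in (f.owner, f.creator) or user.name in f.readers]
```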

  According to one embodiment, the end-user recovery application uses the logical metadata cache to obtain information about all data files backed up on the backup storage medium. Through the user interface, the end-user recovery application presents the user with a hierarchical directory structure of files, organized, for example, by backup time, backup date, user name, the directory structure of the original user's computer (captured when the files were backed up), or other file characteristics. In one example, the directory structure presented to a user can vary depending on the privileges granted to that user. The end-user recovery application can accept browse requests (i.e., the user browses the directory structure through the user interface to find a desired file), or the user can search for files by name, date, and so on.
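  A browsable hierarchy can be derived directly from the metadata records. The sketch assumes each record is a (backup date, original path) pair that has already been filtered by the user's privileges; it nests paths under their backup date and supports a simple shell-style name search through the standard `fnmatch` module. The dates and paths are invented.

```python
import fnmatch
from typing import List, Tuple

Record = Tuple[str, str]  # (backup date, original path), already privilege-filtered


def build_tree(records: List[Record]) -> dict:
    """Nest paths under their backup date; directories are dicts, files map to None."""
    tree: dict = {}
    for backup_date, path in records:
        node = tree.setdefault(backup_date, {})
        parts = [p for p in path.split("/") if p]
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})
        node[parts[-1]] = None
    return tree


def search(records: List[Record], pattern: str) -> List[Record]:
    """Shell-style, case-sensitive match on the file name only, e.g. "*.txt"."""
    return [(d, p) for d, p in records
            if fnmatch.fnmatchcase(p.rsplit("/", 1)[-1], pattern)]
```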

  According to one embodiment, the user can restore backed-up files from the storage system. For example, once the user finds a desired file, the user can download it from the storage system over the network connection 138, as described above. In one example, such a download procedure may be implemented in a manner similar to a web-based download, as known to those skilled in the art.

  By giving end users who hold viewing/downloading privileges access to their files through the user interface, the end-user recovery application allows users to retrieve and restore their own files without any change to backup policies or procedures.

  According to other embodiments, methods and mechanisms are provided that allow a user to “mount” a view of a backup data set stored on the backup storage medium 126 over the network. Just as the user can view and access data on any other local or network drive connected to his or her computer, the user can view the data in the mounted data set, and can thus efficiently restore data to an application server (e.g., if the primary storage device 106 (FIG. 1) fails) without running the restore process through the media server 114 (FIG. 1). Restoring data to an application server using such a mounting procedure can be tens of times faster than a typical volume restore performed through a media server. The term “mounting” should be understood to mean making a network resource, such as a network drive or a data volume, available to the operating system of a host computer. A data volume can include, for example, a single data file, a system file, multiple files, or a directory structure containing multiple files. Common mounting protocols include NFS (Network File System) and CIFS (Common Internet File System) shares. Such protocols allow a host computer to access resources on other computers over a network connection through an interface in which the remote resources appear to be local to the host computer.

  FIG. 12 is a flowchart illustrating a method for performing volume mounting according to an embodiment of the present invention. In the first step 290, the user selects a data volume to mount and sends a volume mount request to the backup storage system controller 122 (FIG. 3). Generally, users want to restore data from a full backup data set (rather than an incremental backup data set) so that a complete and accurate representation of the backed-up information is captured. If no current full backup data set exists (for example, if the network administrator performs full backups weekly and wants to restore data mid-week, when no current full backup is available), a synthetic full backup can be generated and used to restore the selected data.

  According to one embodiment, the backup storage system 170 can include a software application, the volume recovery application 310 (FIG. 13), that controls the method of mounting a data volume and performing recovery. Like the comprehensive full backup application and the end-user recovery application, the volume recovery application 310 can run on the storage system controller, the host computer, and/or a user computer, or portions of it can be distributed across all or some of these.

  Referring back to FIG. 12, after a volume mount is requested, the volume recovery application may query whether a current full backup data set is available (step 292). If not, the volume recovery application can communicate with the comprehensive full backup application 240 (see FIG. 1) to perform a comprehensive full backup process and generate a current backup data set (step 294). The volume recovery application then exports the regular full backup data set or the comprehensive full backup data set and performs the requested volume mount as an NFS or CIFS share. In particular, the volume recovery application queries the logical metadata cache 242 for the metadata describing the full backup volume identified and selected in step 290.
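  The availability check of steps 292 and 294 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `BackupSet` record, its `kind` labels, and the rule that a full backup counts as “current” only when no incremental backup follows it are all hypothetical, and file deletions between backups are ignored.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class BackupSet:
    """Hypothetical summary of one backup job."""
    taken: date
    kind: str                # "full" or "incremental" (assumed labels)
    files: Dict[str, bytes]  # path -> contents captured by this job


def data_set_for_mount(sets: List[BackupSet]) -> Dict[str, bytes]:
    """Pick the data set a mount should expose (sketch of steps 292/294)."""
    ordered = sorted(sets, key=lambda s: s.taken)
    full_positions = [i for i, s in enumerate(ordered) if s.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup to start from")
    last_full = full_positions[-1]

    # Step 292: the latest full is current if nothing was backed up after it.
    if last_full == len(ordered) - 1:
        return dict(ordered[last_full].files)

    # Step 294: synthesize a comprehensive full by overlaying every later
    # incremental on the last full, so the newest copy of each file wins.
    merged = dict(ordered[last_full].files)
    for incremental in ordered[last_full + 1:]:
        merged.update(incremental.files)
    return merged
```

Sorting by date first means the overlay order is correct even if the backup catalog lists jobs out of order.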

  According to one embodiment, the mount request (step 290) causes the volume recovery application to create one or more file descriptor structures in order to export the volume as an NFS or CIFS share (step 296). FIG. 14 illustrates one embodiment of a file descriptor structure 320 that may be generated by the volume recovery application; the file descriptor 320 corresponds to a system file in tape format (e.g., system file 332, see FIG. 15). As described above, the file descriptor includes searchable metadata corresponding to the system files and data files stored in the storage system. The file descriptor 320 can include a plurality of fields containing information such as the file name 322 and the file permissions (access control) 324 for a data file included in the mounted volume. The file descriptor also includes the data file length 328, one or more pointers 326 to the location of the source data of the data file (i.e., identifying where the data file is stored on the storage medium 126), and a pointer 330 to the next entry (e.g., the next data file) in the linked-list file descriptor structure. For example, if the “next” field indicated by reference number 331 is null, the data file is the last data file known to the system file represented by file descriptor 320 (i.e., the last entry in the linked list). Each system file included in the mounted data volume is represented by a file descriptor structure as shown in FIG. 14. Once a file descriptor 320 has been generated for each system file in the requested volume, the file descriptors can be used to locate and export the associated data files in response to NFS or CIFS requests.
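  The linked-list file descriptor of FIG. 14 can be modeled with a small Python sketch. The class and function names are hypothetical, and for simplicity each entry holds a single location pointer rather than the “one or more pointers 326” the text allows; a `None` successor stands in for the null “next” field 331.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class FileDescriptor:
    """One linked-list entry, loosely following FIG. 14 (hypothetical names)."""
    name: str          # file name (cf. 322)
    permissions: int   # access-control bits (cf. 324)
    location: int      # where the source data starts on the medium (cf. 326)
    length: int        # data file length in bytes (cf. 328)
    next: Optional["FileDescriptor"] = None  # next entry (cf. 330); None = null


def walk(head: Optional[FileDescriptor]) -> Iterator[FileDescriptor]:
    """Yield every entry by following the "next" pointers until null."""
    node = head
    while node is not None:
        yield node
        node = node.next


def find(head: Optional[FileDescriptor], name: str) -> Optional[FileDescriptor]:
    """Linear search of the list for the entry matching a requested file name."""
    for node in walk(head):
        if node.name == name:
            return node
    return None
```

A real metadata cache would index descriptors rather than scan them, but the linear walk shows how the “next” chain ties the entries of one system file together.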

  As described above, in one embodiment, the file descriptor may be embodied in a standardized format, such as the tape archive (tar) format widely used on Unix-based computer systems. FIG. 15 shows an exemplary system file 332 recorded in tape format as a segment of a tape (e.g., tar) data stream. FIG. 16 shows the corresponding file descriptor 340 for system file 332. As shown in FIG. 15, a file recorded in tape format includes a header 336 and the actual data 338 stored in system file 332. Data 338 can correspond to one or more data files. In the illustrated example, system file 332 is 1032 bytes long, but a file may have any length depending on the size of its contents and the recording format.

  A file descriptor 340 for file 332 is included in header 336. As shown in FIG. 16, and similar to the general example of FIG. 14, file descriptor 340 contains a file name 341, security information 344, location information for each data file known to the system file, and a “next” entry identifying the next data file known to the system file, which in the illustrated example is null 348.
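  Deriving descriptor fields from a tar stream amounts to walking its 512-byte header blocks and noting where each member's data begins. The sketch below assumes plain POSIX ustar headers (name at bytes 0 to 99, octal mode at 100 to 107, octal size at 124 to 135); it ignores the ustar name prefix, pax/GNU extensions, and checksum verification, and the helper names are hypothetical.

```python
import io
import tarfile
from dataclasses import dataclass
from typing import List

BLOCK = 512  # tar headers and member data occupy whole 512-byte blocks


@dataclass
class TarEntry:
    name: str
    mode: int
    size: int
    data_offset: int  # byte offset of the member's data within the stream


def _octal(field: bytes) -> int:
    """Decode a NUL/space-terminated octal header field."""
    text = field.split(b"\0", 1)[0].strip().decode("ascii")
    return int(text, 8) if text else 0


def index_tar_stream(stream: bytes) -> List[TarEntry]:
    """Record name, mode, size, and data location for each ustar member."""
    entries: List[TarEntry] = []
    pos = 0
    while pos + BLOCK <= len(stream):
        header = stream[pos:pos + BLOCK]
        if header == bytes(BLOCK):  # an all-zero block marks end of archive
            break
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8")
        mode = _octal(header[100:108])
        size = _octal(header[124:136])
        entries.append(TarEntry(name, mode, size, pos + BLOCK))
        # Skip the header plus the data, rounded up to whole blocks.
        pos += BLOCK + (size + BLOCK - 1) // BLOCK * BLOCK
    return entries


def build_sample_stream() -> bytes:
    """Build a small in-memory ustar stream to try the indexer on."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, payload in [("a.txt", b"hello"), ("b.txt", b"x" * 600)]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()
```

Slicing `stream[e.data_offset:e.data_offset + e.size]` then returns a member's contents directly, without rereading the stream from the beginning.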

  Referring again to FIG. 12, once file descriptors have been generated for all files in the mounted data volume, the volume recovery application exports a file system, based on the generated file descriptors, to a specific mount point as an NFS or CIFS share (step 298). At this point the mount is complete (step 299), and the mounted data volume is available for the user to read and/or write data, as described below.

  According to one embodiment, NFS or CIFS read operations (i.e., requests by a user to view data in the mounted data volume) are serviced by searching the file descriptors for one that matches the file specification. It should be appreciated that, according to one embodiment, the user does not have to search the file descriptors directly. Instead, the volume recovery application may include a user interface that presents data to the user, for example, in a typical directory-structure format. The volume recovery application may include software that converts a user request for a particular file into a search command that accesses the logical metadata cache to find the file descriptor 320 for the matching system file. If the file is found, a buffer of file data is created by following the linked list (i.e., following the pointers stored in the file descriptor to locate the actual data) and sent to the requesting user.
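  Servicing a read by following the descriptor's pointers can be sketched as below. Here the descriptor is simplified to a linked list of extents, each naming a stored object, an offset inside it, and a length; the `media` dictionary stands in for the storage medium and `cache` for the logical metadata cache. All of these names are hypothetical. The function honors an NFS-style (offset, count) request, copying only the overlapping part of each extent into the reply buffer.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Extent:
    source: str   # stored object holding the bytes (hypothetical key)
    offset: int   # byte offset inside that object
    length: int   # number of bytes this extent contributes
    next: Optional["Extent"] = None


def read(cache: Dict[str, Extent], media: Dict[str, bytes],
         name: str, offset: int, count: int) -> bytes:
    """Return up to `count` bytes of file `name` starting at logical `offset`."""
    node = cache.get(name)  # metadata-cache lookup for the matching descriptor
    if node is None:
        raise FileNotFoundError(name)
    buf = bytearray()
    pos = 0  # logical offset at which the current extent begins
    while node is not None and len(buf) < count:
        end = pos + node.length
        if end > offset:  # this extent overlaps the requested range
            skip = max(offset - pos, 0)
            take = min(node.length - skip, count - len(buf))
            start = node.offset + skip
            buf += media[node.source][start:start + take]
        pos = end
        node = node.next
    return bytes(buf)
```

Because only overlapping extents are copied, a small read near the end of a large file touches just the last few pointers' worth of data.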

  According to other embodiments, a mechanism may also be provided for the user to write new data to the mounted volume. As described above, the user may see the mounted volume's data as an ordinary network drive or other network-stored data. In practice, however, the original mounted volume data is generally actual backup data that needs to be protected, at least until another backup data set is generated. It may therefore be undesirable to let the user actually modify the original backup data. Instead, while the user appears to modify the data of the mounted volume, a mechanism described below redirects the writes to other storage space so that the backup data is not modified.

  FIG. 17 is a flowchart illustrating a method for processing a write request according to an exemplary embodiment of the present invention. In an initial step 350, the user requests an NFS or CIFS write operation (typically by selecting the “Save” option while editing or viewing a data file). The volume recovery application services the write request by locating available storage space, writing the data to that space, and updating the appropriate file descriptor to reference the newly written data.

  According to one embodiment, the volume recovery application queries whether storage space for written data has already been allocated (step 352) and, if not, allocates the storage space (step 354). The storage space can be allocated on the backup storage medium 126 (FIG. 13). The allocated storage space can be designated to hold only the written data (and, optionally, associated metadata).

  FIG. 18 shows an example of NFS or CIFS write data stored on the backup storage medium 126. The written data 360 includes, for example, two written portions, w1 362 and w2 364, corresponding to data stored as a result of write commands serviced by the volume recovery application. For example, w1 and w2 can correspond to modified data files contained within the mounted data volume. Although two write requests are illustrated, it should be appreciated that the principles of the present invention apply to any number of write requests, and the file can be modified accordingly. The written data 360 also includes a header 366 containing metadata that defines the mapping between the original data (e.g., file 332) and the newly written data 360. In particular, as described further below with reference to FIG. 19, the header may include offset information indicating where the written portions w1 and w2 logically reside relative to the original data.
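  The write path of FIG. 17 (steps 352 and 354) and the header of FIG. 18 can be sketched together. In this hypothetical model, each system file gets its own append-only write area, allocated on the first write; every write appends its bytes there and adds a header record of (logical offset in the modified file, offset in the write area, length). The class names and the header tuple layout are assumptions, since the text does not specify a header encoding.

```python
from typing import Dict, List, Tuple

# (logical offset in the modified file, offset in the write area, length)
HeaderRecord = Tuple[int, int, int]


class WriteArea:
    """Storage reserved only for redirected writes (hypothetical model)."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.header: List[HeaderRecord] = []


class WriteRedirector:
    """Send writes aimed at a mounted backup volume somewhere else."""

    def __init__(self) -> None:
        self.areas: Dict[str, WriteArea] = {}

    def write(self, name: str, logical_offset: int, payload: bytes) -> HeaderRecord:
        area = self.areas.get(name)
        if area is None:            # step 352: no space allocated yet
            area = WriteArea()      # step 354: allocate it now
            self.areas[name] = area
        record = (logical_offset, len(area.data), len(payload))
        area.data += payload        # append only; the backup copy is untouched
        area.header.append(record)  # maps the new bytes back to the original
        return record
```

With the offsets of FIG. 19, writing 9 bytes at logical offset 64 and then 100 bytes at offset 1032 yields header records (64, 0, 9) and (1032, 9, 100): W1 occupies offsets 0 to 9 of the written file and W2 starts at offset 9.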

  FIG. 19 shows an example of a system file layout after two write requests have been serviced. The original system file 332 is stored on the backup storage medium 126 (FIG. 13) and presented to the user through the mounting procedure described above. The system file 332 illustrated in FIG. 19 is in tape format, and its data portion 338 can include multiple data files (e.g., user files). The data starts at offset zero (point 370) and ends at point 372 (offset 1032). The written file 360 holds data written to file 332 in response to user requests. For example, the user may modify two data files included in system file 332, so that the written file 360 contains w1 and w2. As described above, the written file 360 can be stored separately from file 332 on the storage medium so that the original backup data is not changed. A logically modified system file 380 is also shown, representing file 332 together with the changes the user made through write requests (i.e., written file 360). That is, in the modified system file 380, w1 and w2 (the user-modified data) are used in place of the corresponding original data in the data portion of system file 332, without removing the backed-up data.

  As shown in FIG. 19, the modified system file corresponds to the logical sum of the original system file 332 and the written file 360. The original system file data 338 starts at offset zero within the original file. At offset 64 (reference number 384), the first portion of modified data (W1) begins; it ends at offset 73 (reference number 386), spanning 9 bytes. Accordingly, the user-modified data W1 from the user's write request is used in place of the original data located at offset 64 in system file 332. W1 occupies offsets 0 (390) through 9 (392) of the written file 360, so its length is 9 bytes. The start position of W1 within the modified file (offset 64 in this example) is determined from information stored in header 366, which records the relationship between the written file 360 and the original file 332. The W2 portion is also included in the modified file 380; it starts at offset 1032 (the original end of file, reference number 372) and logically extends the file by 100 bytes. The length of W2 is likewise determined from information in header 366. The new end of the file is indicated by reference number 388.

  Although the modified file is generated logically and presented as the user-modified version of the original file, the newly written data represented by file 360 is not actually saved as part of the original file 332. Instead, as described above, the newly written data is stored at a specific location on the storage medium designated for written data. In this way, the user can write to the mounted volume as with an ordinary local or network drive, while the integrity of the original backup data is maintained.

  The modified file 380 includes a header 382 containing a file descriptor that represents the modified file. FIG. 20 shows an example of such a file descriptor 400. File descriptor 400 includes a name field 402 identifying the file name of the modified file 380 and a security field 404 identifying the permission attributes of the modified file 380. File descriptor 400 also includes a plurality of data fields, including pointers to the original file 332 and to the written file 360, for retrieving the data stored in each. Following the linked list of pointers in file descriptor 400 in order yields the contents of the modified file 380.

  FIGS. 19 and 20 together illustrate an example of the file descriptor for the modified file. The first data field 406 holds a pointer to the first data file in the modified file 380, at offset zero (reference numeral 408 in FIG. 19). The following field 410 gives the length of the data file located by pointer 406; as seen in FIG. 19, this length is 64 bytes (the data extends from the zero offset point 408 to the 64-byte offset 384). The next field 412 indicates that the next data file in the modified file 380 is W1, as illustrated in FIG. 19. Accordingly, pointer 414 indicates that the data corresponding to W1 is stored at offset zero (reference number 390, FIG. 19) of the newly written file 360. Length field 416 indicates that W1 is 9 bytes long, extending in modified file 380 from offset 64 (384) to offset 73 (386), as seen in FIG. 19. The next field 418 indicates that the next data file in modified file 380 comes from the original system file 332. The pointer in field 420 indicates that this data file is located at offset 73 (reference number 386 in FIG. 19). Field 422 indicates that the length of this data file is 959 bytes, as shown in FIG. 19. The next field 424 indicates that the following data file is W2. The pointer in field 426 gives the location of W2, namely offset 9 of the newly written file 360, as shown in FIG. 19. Field 428 gives the length of W2 as 100 bytes, and the next field 430 contains a null, indicating that W2 is the final data file in modified file 380, as illustrated in FIG. 19. The file descriptor 400 thus provides a “roadmap” of the structure of the modified file 380 and of the locations of the data it contains.
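  The roadmap in FIG. 20 can be derived mechanically from the original file length and the write header. The sketch below, with hypothetical names, keeps the descriptor as a Python list of (source, offset, length) extents rather than a linked list; each write splits any extent it overlaps and is placed between the surviving pieces, with later writes taking precedence. It assumes writes never start beyond the current end of file (no holes).

```python
from typing import List, Tuple

Extent = Tuple[str, int, int]  # (source, offset within source, length)


def overlay(original_len: int, writes: List[Tuple[int, int, int]]) -> List[Extent]:
    """Build the modified file's extent list.

    `writes` holds (logical offset, offset in write file, length) records,
    applied in the order they were made.
    """
    extents: List[Extent] = [("original", 0, original_len)] if original_len else []
    for lo, write_off, length in writes:
        if length == 0:
            continue
        hi = lo + length
        before: List[Extent] = []
        after: List[Extent] = []
        pos = 0  # logical start of the extent being examined
        for src, off, n in extents:
            end = pos + n
            if pos < lo:   # keep the part that lies before the write
                before.append((src, off, min(end, lo) - pos))
            if end > hi:   # keep the part that lies after the write
                skip = max(hi - pos, 0)
                after.append((src, off + skip, n - skip))
            pos = end
        if lo > pos:
            raise ValueError("write starts past end of file")
        extents = before + [("write", write_off, length)] + after
    return extents
```

For the example of FIG. 19 this returns the four extents listed in FIG. 20: 64 bytes of the original, 9 bytes of W1 from offset 0 of the written file, 959 bytes of the original from offset 73, and 100 bytes of W2 from offset 9.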

  The volume recovery application and method described above present sequential tape-format data in a form compatible with random-access I/O systems such as NFS or CIFS. By recording in a linked-list file descriptor, such as file descriptor 400, the location on the storage medium of each data file in a particular tar stream (for example, its location in the tar stream relative to the other data files in that stream), sequential tape-format data can be converted into randomly accessible data. According to one embodiment, the volume recovery application may also include provisions for storing the modified (i.e., written) data back in tape (e.g., tar) format so that the backup/recovery application can access the data in the normal manner described above. According to one embodiment, the instant recovery application includes facilities for generating a virtual cartridge appropriately formatted with tape headers, pads, data, and file markers in the manner described above with respect to the file system software. In other embodiments, the volume recovery application can interface with the file system software to create a virtual cartridge, as described above, that includes the newly written and modified files.
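  Storing modified data back in tape format could, in the simplest case, mean materializing the modified file from its extents and emitting it as a tar member. The sketch below uses Python's standard `tarfile` module for the ustar encoding; the extent tuples and the `media` mapping are hypothetical, and it does not model the virtual-cartridge tape headers, pads, or file markers the text mentions.

```python
import io
import tarfile
from typing import Dict, List, Tuple

Extent = Tuple[str, int, int]  # (source object, offset in source, length)


def materialize(extents: List[Extent], media: Dict[str, bytes]) -> bytes:
    """Concatenate the bytes each extent points at, in roadmap order."""
    return b"".join(media[src][off:off + n] for src, off, n in extents)


def to_tar_stream(members: Dict[str, bytes]) -> bytes:
    """Re-emit files as a ustar stream that tape-oriented software can read."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, payload in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()
```

Materializing the whole file keeps the sketch short; a production system would more likely stream extents directly into the archive.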

  Although software terms such as comprehensive full backup application, end-user recovery application, and volume recovery application are primarily used herein, it should be appreciated that these may alternatively be implemented in software, hardware, firmware, or a combination thereof. Accordingly, embodiments of the present invention may include any computer-readable medium (e.g., computer memory, floppy disk, compact disk, tape, etc.) encoded with a computer program that, when executed at least partially by a processor of the storage system, performs the functions of the comprehensive full backup application and/or end-user recovery application described above.

  In summary, embodiments according to the present invention include systems and methods that emulate a conventional tape backup system while providing enhanced functionality, such as allowing end users to view or recover backed-up files and generating comprehensive backups. However, aspects of the present invention can be used for purposes other than backing up computer data. The storage system according to the present invention can also be implemented as something other than a backup storage system, wherever large amounts of data must be stored economically and accessed randomly, at hard-disk access times, rather than sequentially. For example, embodiments according to the present invention can be used to store video and/or audio data for on-demand services offering a wide selection of movies and music.

  Having described several aspects of one or more embodiments of the present invention, it should be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to be part of this disclosure and within the spirit of the invention. Accordingly, the foregoing description and drawings are by way of example only.

  FIG. 1 is a block diagram illustrating an example of a large networked computing environment that includes a backup storage system.
  FIG. 2 is a block diagram of one embodiment of a networked computing environment including a storage system according to the present invention.
  FIG. 3 is a block diagram of an embodiment of a storage system according to the present invention.
  FIG. 4 is a block diagram showing the virtual layout of one embodiment of a storage system according to the present invention.
  FIG. 5 is a schematic layout of an example of a system file according to an embodiment of the present invention.
  FIG. 6 is a diagram illustrating an example of a tape directory structure according to an embodiment of the present invention.
  FIG. 7 is a diagram illustrating an example of a method for generating a comprehensive full backup according to an embodiment of the present invention.
  FIG. 8 is a schematic drawing of an example of a series of backup data sets including a comprehensive full backup according to an embodiment of the present invention.
  FIG. 9 is a diagram illustrating an example of a metadata cache structure.
  FIG. 10 is a drawing showing an example of a virtual cartridge storing a comprehensive full backup data set.
  FIG. 11 is a drawing showing another example of a virtual cartridge storing a comprehensive full backup data set.
  FIG. 12 is a flowchart of one embodiment of a method for recovering data from a backup storage system according to an embodiment of the present invention.
  FIG. 13 is a block diagram of another embodiment of a networked computing environment including a backup storage system according to an embodiment of the present invention.
  FIG. 14 is a diagram illustrating an example of a file descriptor structure according to an embodiment of the present invention.
  FIG. 15 is a diagram illustrating an example of how file data can be stored in tape format.
  FIG. 16 is a diagram illustrating a file descriptor for the file illustrated in FIG. 15.
  FIG. 17 is a flowchart of a method for writing data to a mounted data volume according to an embodiment of the present invention.
  FIG. 18 is a drawing showing an example of a newly written file.
  FIG. 19 is a diagram illustrating an example of the relationship between an original file, a newly written file, and the resulting modified file according to an embodiment of the present invention.
  FIG. 20 is a diagram illustrating an example of a file descriptor representing the modified file illustrated in FIG. 19.

Claims (11)

  1. A method comprising: mounting, on a host computer, a data volume that includes one or more data files corresponding to the most recently backed-up version of one or more data files stored in a backup storage system; and storing, in the backup storage system, data corresponding to a second version of the one or more data files that is more recent than the most recently backed-up version stored in the backup storage system, while maintaining the most recently backed-up version of the one or more data files.
  2.   The method of claim 1, further comprising linking the most recently backed up version of one or more data files with the second version of one or more data files.
  3.   The method of claim 1, further comprising generating a data structure that identifies the one or more data files of the most recently backed-up version and the one or more data files of the second version.
  4.   The method of claim 3, wherein the one or more data files of the second version are modified versions of the one or more data files of the most recently backed up version.
  5.   The method of claim 1, wherein mounting the data volume includes performing one of NFS mounting and CIFS mounting.
  6.   The method of claim 1, wherein mounting the data volume includes generating a file descriptor that includes metadata associated with the one or more data files of the most recently backed-up version, the metadata including an identifier that identifies a storage location, on a backup storage medium, of the one or more data files of the most recently backed-up version.
  7.   A backup storage system comprising: a backup storage medium for storing a backup data set; and a controller including one or more processors configured to execute a set of instructions embodying the method of claim 1.
  8.   The backup storage system according to claim 7, wherein the backup data set is a comprehensive full backup data set.
  9.   A computer-readable medium encoded with a plurality of instructions that, when executed on one or more processors, embody the method of claim 1.
  10.   The computer-readable medium of claim 9, wherein the one or more processors are included in a backup storage system.
  11. A computer-readable medium having stored thereon a data structure comprising: a first identifier that uniquely identifies a system file corresponding to a backup data set including one or more data files; and one or more second identifiers, each identifying a respective storage location, on a storage medium, of one of the one or more data files of the latest version of the backup data set.
JP2006534090A 2003-08-05 2004-09-30 Emulated storage system that supports instant volume recovery Granted JP2007527572A

Priority Applications (3)

Application Number Priority Date Filing Date Title
US50732903P 2003-09-30 2003-09-30
US10/911,987 US7146476B2 2003-08-05 2004-08-05 Emulated storage system
PCT/US2004/032122 WO2005033945A1 2003-09-30 2004-09-30 Emulated storage system supporting instant volume restore

Publications (1)

Publication Number Publication Date
JP2007527572A 2007-09-27

Family

ID=34426002

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2006534090A Granted JP2007527572A 2003-08-05 2004-09-30 Emulated storage system that supports instant volume recovery

Country Status (4)

Country Link
EP (1) EP1683028A4
JP (1) JP2007527572A
KR (1) KR20060080239A
WO (1) WO2005033945A1

Also Published As

Publication number Publication date
KR20060080239A (en) 2006-07-07
EP1683028A4 (en) 2009-12-02
WO2005033945A1 (en) 2005-04-14
EP1683028A1 (en) 2006-07-26

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20071001

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20071012

A524 Written submission of copy of amendment under section 19 (pct)

Free format text: JAPANESE INTERMEDIATE CODE: A524

Effective date: 20071025

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20100713

A601 Written request for extension of time

Free format text: JAPANESE INTERMEDIATE CODE: A601

Effective date: 20101012

A602 Written permission of extension of time

Free format text: JAPANESE INTERMEDIATE CODE: A602

Effective date: 20101019

A02 Decision of refusal

Free format text: JAPANESE INTERMEDIATE CODE: A02

Effective date: 20110125