US20070198690A1 - Data Management System - Google Patents

Data Management System

Info

Publication number
US20070198690A1
Authority
US
United States
Prior art keywords
data
server
information
storage
stored
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/733,305
Inventor
Shoji Kodama
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Priority to US11/733,305
Publication of US20070198690A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9032 Query formulation
    • G06F16/90324 Query formulation using system suggestions
    • G06F16/90328 Query formulation using system suggestions using search space presentation or visualization, e.g. category or range presentation and selection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99942 Manipulating data structure, e.g. compression, compaction, compilation
    • Y10S707/99943 Generating database or data structure, e.g. via user interface
    • Y10S707/99944 Object-oriented database structure
    • Y10S707/99945 Object-oriented database structure processing
    • Y10S707/99951 File or database maintenance
    • Y10S707/99952 Coherency, e.g. same view to multiple users
    • Y10S707/99953 Recoverability

Definitions

  • The components and systems of the network shown in FIG. 1, described further in the Detailed Description below, are interconnected using two techniques.
  • First, a network 100 is provided, for example based on TCP/IP over Ethernet, to provide "out of band" communications.
  • The main data handling for the storage systems, however, is provided by switches 114, which allow interconnection of the desired components as necessitated by the particular operations to be performed.
  • The system of this invention adds an additional component 111, referred to herein as a data manager, to the overall system of FIG. 1.
  • The data manager communicates with the other components via the local area network 100 and the switches 114.
  • The data manager collects data handling process information from the applications and the data applications and presents the results to a user.
  • The results are typically presented through a graphical user interface running on a console 113.
  • The data manager maintains a data catalog, which enables the data manager to present to the user various "views" of the storage system.
  • The data manager 111 and data catalog together enable a user to view information about the physical locations where various files are stored, the path by which the information was stored, and other relationships among the data stored in the storage systems 115, 116, and 117.
  • The data manager 111 creates and manages data descriptors, relationship descriptors, a discovered data table (discussed below), and a discovered relationship table (also discussed below). These tables are typically stored in local storage or network storage attached to the data manager.
  • The data manager also uses a discovery configuration table, as discussed below.
  • The data manager itself may be configured from the console 113.
  • The data manager relies upon catalogs created and stored throughout the system as designated in FIG. 1. These catalogs are discussed next.
  • FIG. 2 is a diagram illustrating an archive catalog for the archive profile. This catalog is included within the catalog 108 shown in FIG. 1.
  • The catalog 200 shown in FIG. 2 describes which data is to be archived, at what time, and to which storage. In the example shown in FIG. 2, the data is to be archived if it has not been accessed within 30 days.
  • The data to be archived is set forth as the Folder, and the media to which it is to be archived is listed under Archive Media.
  • FIG. 3 illustrates an archive catalog for media information. This catalog is also included within catalog 108 shown in FIG. 1. The example in FIG. 3 illustrates that the Archive Media is actually an Archive Folder having a specified address associated with the specific server. FIG. 3 also indicates that the Folder has a maximum capacity as shown.
  • FIG. 4 is a diagram illustrating an archive catalog for archived data. This catalog is included within catalog 108 shown in FIG. 1. The indicated Source Data is shown as being archived at the designated media location as an Archive Stream at the Archive Time shown in FIG. 4.
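  • To make the catalog structure concrete, the sketch below models the three archive catalogs as plain records. The field names (folder, archive_media, inactivity_days, and so on) follow the figures, but the exact layout is an assumption, since the patent defines these catalogs only at the level of the figures.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ArchiveProfile:        # FIG. 2: what to archive, when, and to where
    folder: str              # source folder whose files are candidates
    archive_media: str       # symbolic name of the target media
    inactivity_days: int     # archive files not accessed within this many days

@dataclass
class ArchiveMedia:          # FIG. 3: where the media actually lives
    name: str                # media name referenced by the profile
    server: str              # server hosting the archive folder
    archive_folder: str      # folder address on that server
    max_capacity_gb: int     # maximum capacity of the folder

@dataclass
class ArchivedData:          # FIG. 4: what was archived, where, and when
    source_data: str         # original file that was archived
    media: str               # media (a FIG. 3 entry) holding the stream
    archive_stream: str      # name of the archive stream written out
    archive_time: datetime   # when the archive operation ran

# Hypothetical entries mirroring the 30-day rule described for FIG. 2.
profile = ArchiveProfile(r"\\serverA\mail", "MediaM", 30)
media = ArchiveMedia("MediaM", "serverB", r"\\serverB\archive", 500)
record = ArchivedData(r"\\serverA\mail\fileA", "MediaM", "stream-0001",
                      datetime(2004, 7, 13, 2, 0))
```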
  • FIGS. 5-7 illustrate backup catalogs stored as catalog 107 in FIG. 1.
  • In FIG. 5, an exemplary backup catalog for a backup profile is illustrated. This catalog describes how and when data is to be backed up: files under the folder designated by Source are to be backed up to the Backup Media at the Backup Time stated.
  • The Backup Type indicates that all files are to be backed up, while the Next Backup Time indicates the time and date of the next backup operation.
  • FIG. 6 is a diagram illustrating a backup catalog for media information. In a similar manner to FIG. 3, it illustrates the physical location of the particular media designated, as well as its capacity.
  • FIG. 7 illustrates a backup catalog for backup data. This catalog describes when and where data is backed up. In the example shown, two files designated by Data Source have been backed up to the Backup Media at the time shown.
  • FIG. 8 is a diagram illustrating a replication relationship between two devices in the storage system, and is referred to as a replication catalog. This diagram provides additional information with regard to the replication catalog 106 in FIG. 1.
  • The replication catalog describes the relationship between two data storage locations, commonly known as LDEVs, in the storage system. As shown by FIG. 8, the data in the Primary Storage is replicated to the Secondary Storage location. The Mode indicates whether the replication is to be synchronous or asynchronous.
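  • As a minimal sketch, one row of the replication catalog might be modeled as below; the "Storage:LDEV" naming and the field names are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum

class Mode(Enum):
    SYNCHRONOUS = "sync"    # write acknowledged only after the copy completes
    ASYNCHRONOUS = "async"  # write acknowledged first, copied afterwards

@dataclass
class ReplicationPair:      # FIG. 8: one row of the replication catalog
    primary: str            # e.g. "StorageA:LDEV1" (illustrative naming)
    secondary: str          # e.g. "StorageA:LDEV3"
    mode: Mode

pair = ReplicationPair("StorageA:LDEV1", "StorageA:LDEV3", Mode.SYNCHRONOUS)
```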
  • FIG. 9 is a diagram illustrating a device catalog for a volume, with FIGS. 10-13 illustrating other device catalogs, all incorporated within catalog 110 in FIG. 1.
  • The volume catalog 207 shown in FIG. 9 includes the volume identification, name, address, port, logical unit number, etc.
  • FIG. 10 illustrates a device catalog 208 for storage. This catalog provides information about a storage system, including an identification, name, address, capacity, information about ports coupled to the storage, etc.
  • FIG. 11 illustrates a catalog 220 for a file system. As shown there, the catalog includes information about identification, physical volume location, file system type, free space, etc. Similarly, FIG. 12 illustrates a device catalog for a path 221, which includes identification information and worldwide name identification.
  • FIG. 13 is a device catalog 222 for an application. As shown by FIG. 13, the catalog includes identification, application type, host name, and associated data files.
  • FIGS. 14 and 15 illustrate archive catalogs for message-based archiving, whereas FIGS. 2-4 illustrated archive catalogs for file-based archiving.
  • In message-based archiving, the archiving is performed at an application level. For example, an e-mail server may store messages into data files, and an archive server then communicates with the e-mail server to archive the messages themselves, instead of the data files. In this case the archive profile also indicates the name of a server and the name of an application.
  • FIG. 14 illustrates an archive catalog 223 for an archive profile for the case just described. As shown, the application is indicated with A, as well as the media name MN, and the media and timing information. The media information itself may be archived in the same manner as described in conjunction with FIG. 3.
  • FIG. 15 illustrates an archive catalog 224 for archive data, in which the Source Data designates particular messages instead of files. The Server Name and information about the media, data, and time are also provided.
  • FIG. 16 depicts an exemplary system configuration which is used in the remainder of this description as an example to clarify the explanation.
  • Several servers 230 are represented across the upper portion of the diagram, including an application server, an archive server, a backup server, and a replication server. Two of the servers are connected with an Ethernet link. In the middle portion of the diagram, two switches 231 couple the various servers to various storage systems 232.
  • The replication server is coupled to the Enterprise Storage A to allow replication in that storage system.
  • In the example, the application server 230 stores data into LDEV1, while the archive server archives some of that data into LDEV2. The replication server asks storage unit A to replicate LDEV1 to LDEV3, and in response that event occurs. Finally, the backup server backs up data from LDEV3 to LDEV4.
  • FIG. 17 illustrates a sample data descriptor table 240. This table illustrates information collected by the data manager 111 (see FIG. 1) about the data being handled by the storage system and the servers.
  • The data descriptor table includes considerable information for the particular unit of data discovered. It includes logical information about the data, for example the host name associated with that data, the path name, the "owner" of the data, any restrictions on access or rewriting of the data, the size, time of creation, time of modification, time of last access, and a count of the number of accesses.
  • The data descriptor also includes information about the mount point (where the data is located), the type of file system associated with the data, and the maximum size of that file system.
  • In addition, the data descriptor includes physical information about the data, including the storage system brand name (Lightning 9900), its IP address, its LDEV, etc. The physical information can also include information about the maximum volume size, the level of RAID protection, etc.
  • In summary, the logical information includes which server has the data, its logical location within that server, and access control information, as well as size and other parameters about the stored data; the file system information describes the type of file system in which the data is stored; and the physical information describes the storage system and the LDEVs on which a particular file system has been created.
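  • The grouping in FIG. 17 can be sketched as three nested records, as below. The field names are illustrative assumptions; only the Lightning 9900 brand name and the three groupings (logical, file system, physical) come from the description.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class LogicalInfo:           # which server has the data and how it is used
    host: str
    path: str
    owner: str
    read_only: bool          # restriction on rewriting
    size_bytes: int
    created: datetime
    modified: datetime
    last_accessed: datetime
    access_count: int

@dataclass
class FileSystemInfo:        # the file system holding the data
    mount_point: str
    fs_type: str
    max_size_gb: int

@dataclass
class PhysicalInfo:          # the hardware beneath the file system
    storage_name: str        # e.g. "Lightning 9900"
    ip_address: str
    ldev: str
    max_volume_size_gb: int
    raid_level: str          # level of RAID protection, e.g. "RAID5"

@dataclass
class DataDescriptor:        # one discovered unit of data (FIG. 17)
    gid: str                 # global ID assigned by the data manager
    logical: LogicalInfo
    filesystem: FileSystemInfo
    physical: PhysicalInfo
```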
  • FIGS. 18-22 illustrate relationship descriptor tables that help establish the relationships among the data stored in the storage system.
  • FIG. 18 is an example of a relationship descriptor table 241 for archives. The table includes information about a descriptor identification, its relationship to the original data, the original data descriptor, the archive data descriptor, the archive time, and the retention period thus far.
  • The relationship descriptor shows how the discovered data are related and assigns a unique ID (RID).
  • FIG. 19 provides a relationship descriptor table 242 for backup, illustrating that the original data at the specified addresses has been backed up as the data specified at the destination address. The backup date, time, speed, and other parameters are also maintained.
  • FIG. 20 is a relationship descriptor table 243 for replication. This table, in addition to the other information provided, maintains the relationship between the original and the replicated data based on their global identification.
  • FIG. 21 is a relationship descriptor table 244 for an application. As shown by this table, the e-mail server in the Trinity server has data sources specified by the designated global identification numbers.
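  • Since the four relationship descriptors differ mainly in type and timing fields, a single record shape can cover them all. The sketch below is an assumption in that spirit; the RID format and the extra field are illustrative, not the patent's column headings.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum

class RelType(Enum):
    ARCHIVE = "archive"          # FIGS. 18 and 22
    BACKUP = "backup"            # FIG. 19
    REPLICATION = "replication"  # FIG. 20
    APPLICATION = "application"  # FIG. 21: source data only

@dataclass
class RelationshipDescriptor:
    rid: str                     # unique relationship ID
    rel_type: RelType
    source_gids: list[str]       # GIDs of the original data
    dest_gids: list[str]         # GIDs of the copies; empty for APPLICATION
    when: datetime | None = None # archive/backup time, where applicable
    extra: dict = field(default_factory=dict)  # retention, speed, mode, ...

# Hypothetical archive relationship: file GID0001 archived as GID0005.
rel = RelationshipDescriptor("RID0001", RelType.ARCHIVE, ["GID0001"],
                             ["GID0005"], datetime(2004, 7, 13, 2, 0),
                             {"retention_days": 365})
```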
  • The data manager 111 also creates a number of tables based upon its interactions with the servers. These tables consist of a discovery configuration table 280 shown in FIG. 23, a discovered data table 420 shown in FIG. 24, and a discovered relationship table 430 shown in FIG. 25. These tables are discussed next.
  • The discovery configuration table 280 shown in FIG. 23 shows from which applications and data applications the data manager has gathered information. Each entry in the table, consisting of a row, specifies a type of discovered data, a server from which the information is gathered, an application or data application name, and ID and password information to gain access as needed. For example, in the first row of table 280, an application program has collected information from server E using the application SAMSoft, and this can be accessed using the ID and password shown at the end of the row.
  • FIG. 24 illustrates a discovered data table 420. This table provides management information for the discovered data. Each unit of data is uniquely identified by the combination of a storage system, an LDEV, and a relative path name.
  • Files stored in the storage system are stored using a file system. The relative path name provides a path name inside the file system, rather than the path name seen when the file system is mounted on a folder in the server. For example, assume LDEV1 is mounted on \folder1 at a server, and a file is visible at the server as \folder1\folder2\fileA. The relative path name for that file is then \folder2\fileA.
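  • The uniqueness rule and the relative path computation can be sketched as follows; DataKey and relative_path are hypothetical helpers, not names from the patent.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath

@dataclass(frozen=True)
class DataKey:
    """Unique identity of discovered data per FIG. 24:
    storage system + LDEV + relative path inside the file system."""
    storage: str
    ldev: str
    relative_path: str

def relative_path(server_path: str, mount_point: str) -> str:
    """Strip the mount point from a server-side path to recover the
    path inside the file system."""
    p = PureWindowsPath(server_path)
    return "\\" + str(p.relative_to(PureWindowsPath(mount_point)))

# LDEV1 mounted on \folder1; the server sees \folder1\folder2\fileA.
key = DataKey("StorageA", "LDEV1",
              relative_path(r"\folder1\folder2\fileA", r"\folder1"))
assert key.relative_path == r"\folder2\fileA"
```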
  • FIG. 25 illustrates a discovered relationship table 430. This table manages the identifications of discovered relationships.
  • For example, the relationship identified by RID0002 is a backup relationship indicating that the files having the GIDs shown in the column "Source" were backed up as the data identified by the "Destination" column. While backup, archive, and replication actions associate data at two locations, an application itself only has source data, so for application relationships "Destination" is not applicable.
  • FIG. 26 illustrates a "data view" graphical user interface (GUI) 250, in which the data manager presents a view related to the data itself.
  • The GUI has two parts: a data specification panel on the left-hand side and an information panel on the right-hand side of the figure.
  • The data specification panel shows all of the applications and all of the data in the system being used by those applications.
  • In the example, the specification panel lists e-mail applications and, within those applications, an e-mail server A. That e-mail server has a number of files, shown in the example as A, B, and C. The user has chosen file A, so the GUI illustrates information about that file in the right-hand panel shown in FIG. 26.
  • This panel illustrates the relationship information about the data associated with file A. The server and file location are shown, as well as all archived, replicated, and backed-up copies of that file.
  • As depicted, file A has been archived by server B at the designated location, has been replicated by server C at the designated location, and has been backed up by server D at the designated location.
  • By clicking on the "Details" designation, the user causes the system to retrieve "deeper" information about that data, for example its size, the time of the event, or other information provided in the descriptor tables discussed above, and that data will be presented on the GUI.
  • FIG. 27 illustrates the GUI for a "storage view" of the data.
  • The left-hand panel shown in FIG. 27 corresponds to that discussed in FIG. 26, enabling the user to select a particular file. In the example, the user has selected file A, and thus the right-hand panel of the storage view 260 illustrates information about file A.
  • That panel shows the LDEV and storage system where the original data is stored, the LDEVs and storage systems in which all of the data related to the original data are stored, and the relationships among those locations. For example, as shown in the upper portion of the right-hand panel, the replica, archive, and backup relationships are illustrated.
  • FIG. 28 illustrates the third GUI, the "path view," which enables the user to more easily understand the location of various data in the storage system and the path by which that data is handled.
  • The left-hand side of the GUI 270 enables the user to select the particular file, while the right-hand side depicts the topology map of the servers, switches, storage systems, and LDEVs for the original data and for data related to the original data. This diagram also illustrates how data is transferred in the topology.
  • Across the upper portion of the right-hand panel in FIG. 28 are a series of "buttons." By clicking on one of these buttons, the user causes the screen to show the path through which data is transferred by the specified relationship.
  • FIG. 29 is a flowchart illustrating a preferred embodiment of the data discovery process performed by the data manager shown in FIG. 1. The process is initiated by a user at the console 113 shown in FIG. 1.
  • At step 290, the data manager retrieves an entry from the discovery configuration table shown in FIG. 23, unless that entry is a replication entry; if there is a non-replication entry, the flow proceeds downward as shown in FIG. 29. Replication entries are deferred and retrieved later from the discovery configuration table, as shown by step 296.
  • For each non-replication entry, the data manager checks the type of server and executes one of three procedures 293, 294, or 295, depending upon the type of server, as shown by the loop in FIG. 29. After an entry is processed, the process reverts to step 290 and repeats as many times as necessary to retrieve all of the entries from all of the servers. The details of the "get data" procedures 293, 294, and 295 are discussed below. Once these procedures are completed, the system turns to the replication entries, as shown by step 296; for each replication entry, the procedure follows step 298, which is also discussed below. Once all of the entries have been retrieved, as shown at step 297, the data discovery process ends.
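  • The two-phase loop of FIG. 29 (non-replication entries first, replication entries second) can be sketched as below; the table layout and handler signatures are assumptions for illustration.

```python
# Discovery configuration entries are modeled as dicts with the columns of
# FIG. 23: type, server, application, ID, and password.

def discover(config_table, handlers):
    # Steps 290-295: handle every non-replication entry first, dispatching
    # on the server type to Get Data From App / Backup / Archive.
    for entry in config_table:
        if entry["type"] != "replication":
            handlers[entry["type"]](entry)
    # Steps 296-298: then handle the replication entries.
    for entry in config_table:
        if entry["type"] == "replication":
            handlers["replication"](entry)

handlers = {
    "app":         lambda e: print("get data from app on", e["server"]),
    "backup":      lambda e: print("get data from backup on", e["server"]),
    "archive":     lambda e: print("get data from archive on", e["server"]),
    "replication": lambda e: print("get data from replica on", e["server"]),
}
discover([{"type": "app", "server": "E", "application": "SAMSoft",
           "id": "admin", "password": "secret"},
          {"type": "replication", "server": "R", "application": "-",
           "id": "admin", "password": "secret"}], handlers)
```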
  • FIG. 30 illustrates in more detail the process flow for getting data from an application, shown as block 293 in FIG. 29.
  • The data manager first connects to the SAM (storage area manager) server via the network, using an identification and password from the discovery configuration table for the connection (step 300). It then retrieves a list of applications from the SAM server (step 301) and, for each application, a list of data files from that server (step 302). As shown by step 303, for each data file on that list, the data manager gets the name of the file system in which the data file is stored from the SAM server. Then, as shown by step 304, for each file system, a storage name and an LDEV on which the file system is created are also retrieved from the SAM server.
  • Next, for each unique set of a storage name, an LDEV, and a relative path name, the data manager creates a new entry in the discovered data table and allocates a new global identification (GID) to it if there is not already an entry for that set. A data descriptor is created for each such GID, and the data manager retrieves logical information, file system information, and physical information from the SAM server and fills that information into the data descriptor table.
  • At step 308, for each application, a new entry in the discovered relationship table is created and a new RID is provided if there is not already an entry for that application. The relationship descriptor for the application and the file information is then created. Once these steps are completed, the process flow returns to the diagram shown in FIG. 29.
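  • The FIG. 30 walk from applications to data files to file systems to storage/LDEV, with GIDs allocated only for newly seen data, might look like the following; the sam object and its query methods are hypothetical stand-ins for the SAM server interactions the patent describes.

```python
import itertools

class AppDiscovery:
    """Sketch of FIG. 30; `sam` is a hypothetical client for the SAM
    server, with query methods named after the steps in the text."""

    def __init__(self, sam):
        self.sam = sam
        self.discovered_data = {}        # (storage, ldev, rel_path) -> GID
        self._counter = itertools.count(1)

    def gid_for(self, storage, ldev, rel_path):
        # Allocate a new GID only for a not-yet-seen unique set.
        key = (storage, ldev, rel_path)
        if key not in self.discovered_data:
            self.discovered_data[key] = f"GID{next(self._counter):04d}"
        return self.discovered_data[key]

    def get_data_from_app(self):
        for app in self.sam.list_applications():           # step 301
            gids = []
            for rel_path in self.sam.list_data_files(app):  # step 302
                fs = self.sam.file_system_of(app, rel_path)  # step 303
                storage, ldev = self.sam.storage_and_ldev_of(fs)  # step 304
                gids.append(self.gid_for(storage, ldev, rel_path))
            # Step 308: one application relationship per application,
            # with source data only (no destination).
            yield {"rid": f"RID-{app}", "type": "application",
                   "source": gids, "destination": None}
```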
  • FIG. 31 illustrates the process of retrieving data from the backup server, shown in FIG. 29 as step 294.
  • The data manager first connects to the backup server via the network, using the ID and password information from the discovery configuration table for the connection, as shown in step 320. It also connects to the SAM server in the same manner, as shown in step 321.
  • The data manager then retrieves a list of backup profiles from the backup server. As shown by step 323, for each such backup profile the data manager obtains a list of backup data from the backup server.
  • For each backup data, the data manager retrieves from the backup server the file system in which the backup stream is stored, and a storage name and an LDEV on which the file system is created are retrieved from the SAM server.
  • For each unique set of a storage name, an LDEV, and a relative path name, a new entry is created in the discovered data table and a new GID is allocated if there is not already an entry for that set. A data descriptor is then created for each GID, and logical information, file system information, and physical information retrieved from the SAM server are provided to the data descriptor table.
  • FIG. 32 illustrates the process following step 328.
  • For each backup data, the data manager obtains a list of the data sources from the backup server at step 329. For each unique data source, the file system in which the data source is stored is also retrieved from the backup server at step 330. The data manager then retrieves from the SAM server a storage name and an LDEV on which the file system is created.
  • Next, for each unique set of a storage name, an LDEV, and a data source relative path name, a new entry is created in the discovered data table, and a new GID is allocated if there is not already an entry for that set. A data descriptor is created for each GID, and logical information, file system information, and physical information are retrieved from the SAM server and filled into the data descriptor table.
  • At step 336, for each backup data, a new entry is created in the discovered relationship table and a new RID is allocated if there is not already an entry for that backup data. At step 337, for each RID, a relationship descriptor for the backup information is created and filled into the discovered relationship table. That step concludes the get-data-from-backup operation shown generally as step 294 in FIG. 29.
  • FIG. 33 illustrates the details behind the step of getting data from the archive, represented by step 295 in FIG. 29. As described above, these operations are similar to the other get-data operations discussed in the previous few figures.
  • The process begins with step 340, in which the data manager connects to the archive server using ID and password information. It also connects to the SAM server with ID and password information, as shown by step 341.
  • It then obtains a list of archive profiles and, at step 344, for each archive profile, it obtains a list of archive data from the archive server. At step 345, for each archive data, it retrieves from the archive server the file system in which the archive stream is stored.
  • For each unique set of a storage name, an LDEV, and a relative path name, a new entry is created in the discovered data table and a new GID is allocated if there is not already one for that set. A data descriptor is created and, finally, at step 349, for each such data descriptor, logical information, file system information, and physical information from the SAM server are filled into the data descriptor table. The process then continues with FIG. 34.
  • At step 350, for each archived data, a list of data sources is retrieved from the archive server. Then, for each unique data source, the file system for that data source is retrieved from the archive server, as shown by step 351. Then, for each unique file system, the storage name and LDEV on which the file system is created are retrieved from the SAM server. Next, at step 353, for each unique set of a storage name, an LDEV, and a data source relative path name, a new entry is created in the discovered data table and a new GID is allocated if there is not already one for that set.
  • A new data descriptor is created for each GID and, for each such data descriptor, logical information, file system information, and physical information are retrieved from the SAM server and filled into the data descriptor table, as shown by step 355. Then, for each archived data, a new entry is created in the discovered relationship table and a new RID is allocated if there is not already one for that data. Finally, a relationship descriptor is created for that RID and filled into the discovered relationship table.
  • The process for getting data from the replication server is similar to that described above and is illustrated in FIG. 35.
  • The process follows a flow of connecting to the replication server with an ID and password (step 360), connecting to the SAM server (step 361), and obtaining a list of replication profiles from the replication server (step 362). Selected information is retrieved at step 363, and for each such replication set, the data stored in those volumes is located at step 364.
  • Then a new entry is created in the discovered relationship table, and for each such new RID a relationship descriptor is created and the information filled into the table at step 366.
  • The techniques for showing the data, storage, and path views are illustrated in FIGS. 36-38, beginning with the data view in FIG. 36.
  • First, the data manager receives a server name, an application name, and a data file from the GUI, as shown by step 370. As discussed above, this selection will typically be made by the user choosing an appropriate entry in the left-hand panel of the GUI. Then, as shown by step 371, the GID for the specified data is retrieved from the discovered data table, and at step 372 a list of all RIDs that contain that GID is retrieved from the discovered relationship table. If there are none, the found GIDs may be displayed, as shown by step 376. If there are RIDs, then for each such RID the GIDs and the destination are also retrieved from the discovered relationship table, as shown by step 374. Once this is completed, the display is produced, as shown by step 376.
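  • In terms of the table sketches above, the FIG. 36 lookup reduces to one key lookup plus a scan of the relationship table, roughly as follows.

```python
def data_view(discovered_data, relationships, storage, ldev, rel_path):
    """Steps 370-376: resolve the selected file to its GID, then collect
    every relationship that mentions that GID."""
    gid = discovered_data[(storage, ldev, rel_path)]
    related = [r for r in relationships
               if gid in r["source"]
               or (r["destination"] and gid in r["destination"])]
    return gid, related

discovered_data = {("StorageA", "LDEV1", r"\fileA"): "GID0001"}
relationships = [{"rid": "RID0002", "type": "backup",
                  "source": ["GID0001"], "destination": ["GID0007"]}]
print(data_view(discovered_data, relationships, "StorageA", "LDEV1", r"\fileA"))
```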
  • FIG. 37 illustrates the steps for showing a storage view in the GUI.
  • First, the user selects the desired information, as shown in step 380, and the GID for the specified data is retrieved from the discovered data table. The flow of operations through steps 382, 383, 384, and 385 matches that of FIG. 36.
  • The data manager then finds the storage systems and LDEVs in which the data specified by the GIDs is stored, and shows each storage system as a storage icon and each LDEV as an LDEV icon on the screen. The LDEV icons are interconnected by relationship indicators for each found RID.
  • FIG. 38 is a flow chart illustrating the manner in which the path view GUI is created. Steps 390-395 are the same as those described above for the data and storage views.
  • At step 396, for all of the found GIDs and RIDs, the data manager finds the servers, switches, storage systems, and LDEVs that are related to the data or data applications specified by those GIDs and RIDs.
  • The physical topology map for all of the found hardware components is displayed at step 397, and relationship buttons are added at step 398.
  • At step 399, if a button is pushed, the system shows the data path by which the designated data is transferred, which information is provided by the SAM server.
  • FIG. 39 is a flow chart illustrating another feature provided by the system of this invention: a technique for detecting a misconfiguration of a data backup by comparing the size of the backup data with the size of the original data. The process shown in FIG. 39 may be invoked by the user through the console 113 shown in FIG. 1.
  • First, the system receives a server name, an application, and a data file from the GUI, as shown by step 400. The GID for the specified data is retrieved from the discovered data table, and the list of RIDs that contain that GID is retrieved from the discovered relationship table. This process is repeated until all RIDs and GIDs are retrieved, as shown by steps 403-405.
  • The size of the data files for the application and the size of the backup data are then computed and compared at step 407. If the amounts match, a successfully-completed message is displayed at step 409, while if the amounts do not match, an error is displayed at step 410. The user can then either reperform the backup or investigate the error and resolve it in some other manner.
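  • The FIG. 39 check reduces to comparing two sums over the data descriptors, roughly as below; the dict-based size lookup is an assumed simplification of the descriptor tables.

```python
def check_backup(sizes, relationships, app_gids):
    """Compare the total size of an application's data files with the
    total size of their backup copies; sizes maps GID -> bytes."""
    original = sum(sizes[g] for g in app_gids)
    backed_up = sum(sizes[g]
                    for r in relationships
                    if r["type"] == "backup" and set(r["source"]) & set(app_gids)
                    for g in r["destination"])
    if original == backed_up:
        print("backup check completed successfully")        # step 409
    else:
        print(f"error: {original} bytes original vs "
              f"{backed_up} bytes backed up")               # step 410
    return original == backed_up

sizes = {"GID0001": 1024, "GID0007": 512}
relationships = [{"rid": "RID0002", "type": "backup",
                  "source": ["GID0001"], "destination": ["GID0007"]}]
check_backup(sizes, relationships, ["GID0001"])   # reports a mismatch
```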
  • The technology described has numerous applications, which are not restricted to backup, archive, and replication; the invention can be applied to other applications or custom applications in which data is to be analyzed and relationships determined.
  • The invention is also not limited to files or data in the local file system or local server. It can be applied to volumes in storage systems, to objects in object-based storage devices, and to files in network attached storage systems. In particular, it can be applied to storage systems which replicate volumes by themselves: the data manager in such an application can determine from the storage system or the replication server how the volumes are replicated, create a data descriptor for each volume without path information, and create a relationship descriptor using the replication relationship.
  • In the network attached storage case, the data is uniquely identified by an IP address, an exported file system, and a relative path name.
  • In addition, the data manager may calculate a hash value for each data. The data manager can then retrieve the logical and physical locations of such data from a SAM server, and if identical data (duplicate hash values) are found at different locations, the data manager can create a relationship descriptor indicating that the data are identical. This enables the user to see how many replications of data are present on the storage system and to determine which data can be deleted.
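  • The hash-based duplicate detection might be sketched as follows; SHA-256 is an assumed choice of hash, and the content bytes stand in for real file contents retrieved from storage.

```python
import hashlib
from collections import defaultdict

def find_identical(contents_by_gid):
    """Group GIDs by content hash; any group with more than one member
    becomes an 'identical' relationship descriptor."""
    groups = defaultdict(list)
    for gid, content in contents_by_gid.items():
        groups[hashlib.sha256(content).hexdigest()].append(gid)
    return [{"type": "identical", "gids": gids}
            for gids in groups.values() if len(gids) > 1]

dupes = find_identical({"GID0001": b"mailbox", "GID0003": b"mailbox",
                        "GID0004": b"other"})
print(dupes)   # [{'type': 'identical', 'gids': ['GID0001', 'GID0003']}]
```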
  • The data manager can also detect at what location in the hierarchy a performance bottleneck exists. In such a case, the data manager retrieves performance information for each relationship and determines whether those numbers are restricted by physical resources or by disturbances caused by other data processing or application software.
  • The data manager also provides users a way to search for data and for relationships among data by specifying some portion of the data. When the data manager receives such a request, it can find the data descriptors and relationship descriptors that include the specified information and provide them, for example, on a graphical user interface.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method of collecting information about data and data handling processes from different types of applications in the context of a storage system is described. The retrieved information is presented to the user to illustrate the relationships among the data, for example in the form of a data view illustrating the relationships among files, a storage view illustrating where the stored data physically resides, or a path view illustrating a particular path through the topology of the overall computing system and storage system. Also described are techniques for assuring the accuracy of backed-up files.

Description

    CROSS-REFERENCES TO RELATED APPLICATIONS
  • The present application is a Continuation Application of U.S. application Ser. No. 10/890,652, filed Jul. 13, 2004, which is incorporated by reference herein in its entirety for all purposes.
  • BACKGROUND OF THE INVENTION
  • This invention relates to systems for storing data, and in particular to storage systems in which data is distributed among large numbers of hard disk drives or other storage media.
  • In a typical data storage network, data from many different applications is stored and retrieved, and it is difficult to track the relationships among all of the data stored. For example, in an e-mail system, an e-mail server generates original data and provides it to a storage system. An archive server may archive some parts of the data to different parts of the storage system or to different storage systems. At the same time, a replication server may replicate the original data to different storage, and the data may be backed up by a backup server to yet further storage. While each of these data handling processes operates on its associated data in an appropriate manner, the archive server, the replication server, and the backup server each operate independently. Each has its own catalog or other mechanism for managing how the data is stored and retrieved. Because of the distributed nature of the system and the lack of consolidated catalogs, a user of a storage system typically cannot reliably understand where data is situated in that storage system.
  • Furthermore, the complexity of storage systems increases the probability of mistakes. In the example just described, some parts of the original data are not stored in the original storage, but instead have been stored in the archive storage. As a result, a replication of the original data will not contain the archive data. Thus the backup data will also not contain the archive data. Therefore, when a user restores data from the backup, because the backup data is not a complete backup of the original data, not all of the original data will be restored. All of this complexity makes managing the data in a coherent manner difficult and error-prone.
  • There are a few tools that help manage data in storage systems. These tools, however, do not address the issues mentioned above. One commercially available tool for use in management of a data storage system is provided by Veritas™ and referred to as SANPoint Control. This system enables keeping track of the hardware devices and their relationships in a storage area network. Another commercially available tool is provided by AppIQ and known as the Storage Authority Suite. This system provides information about the hardware in the storage system, including hosts, bus adapters, switches, disk subsystems, etc. It also provides capabilities for management of particular applications running on the storage system, for example, Oracle databases, file servers, etc.
  • Another commercially available tool for use in storage systems is the Aptare StorageConsole. This application software provides increased reliability for backup and restore operations in a storage system. The Storage Resource Broker from Nirvana is software that enables users of systems to share and manage files stored in various locations. It provides various searching and presentation functions to enable users to find particular files or information stored in various portions of large data storage units.
  • Therefore, a system is needed which enables a user of the system to have a complete view of the data handling processes and the relationships among processes for management of the data to reduce the chance of error and improve the efficiency with which the data is managed.
  • BRIEF SUMMARY OF THE INVENTION
  • A system according to this invention provides a method for collecting information about data and data handling processes from different types of data applications. This invention enables a user of the system to appreciate relationships among the data. It shows the data in a system view and can illustrate the relationships among the data stored in the system with a graphical user interface. Preferably, in a storage system having arrays of storage devices for storing information, a data manager according to this invention collects information about the relationships among data and files stored therein and presents them to a user.
  • In a preferred embodiment, the graphical user interface provides the user with the option of choosing from among three different views of data handling processes. These include a data view which illustrates how data are related to each other, for example, by showing where a particular file has been archived, replicated, or backed up. Preferably the system also provides a storage view which illustrates how the data volumes are related, for example, indicating which volumes in the storage system have the original data, the archived data, replica data, and backed up data.
  • A third view for information in the storage system is referred to as the path view. The path view illustrates how data is transferred through the system by various data handling processes, for example indicating which ports, switches, and storage handle particular files or other data. Furthermore, a system according to this invention provides a way to detect erroneous configurations of backup data by comparison of the amount of backup data with the amount of original data.
  • In one embodiment, a storage system having a replication server, a backup server, and an archive server further includes a data manager which tracks the stored data in at least two of three approaches. In one approach the stored data is tracked by presenting file name relationships among the replicated, backup, or archived copies of the stored data. In the second approach, the physical locations within the storage system, for example, in terms of volumes, are presented. In the third approach, path information depicting the processes by which the data arrived at its storage location are provided for the replicated, backup, or archived copies of the stored data.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram illustrating a system configuration for a typical storage area network including a data manager according to this invention;
  • FIG. 2 illustrates an archive catalog for an archive profile;
  • FIG. 3 illustrates an archive catalog for media information;
  • FIG. 4 illustrates an archive catalog for archived data;
  • FIG. 5 illustrates a backup catalog for a backup profile;
  • FIG. 6 illustrates a backup catalog for media information;
  • FIG. 7 illustrates a backup catalog for backup data;
  • FIG. 8 illustrates a replication catalog;
  • FIG. 9 illustrates a device catalog for a volume;
  • FIG. 10 illustrates a device catalog for storage;
  • FIG. 11 illustrates a device catalog for a file system;
  • FIG. 12 illustrates a device catalog for a path;
  • FIG. 13 illustrates a device catalog for an application;
  • FIG. 14 illustrates an archive catalog for an archive profile;
  • FIG. 15 illustrates an archive catalog for archived data;
  • FIG. 16 is a block diagram of one example of interconnections in a storage system;
  • FIG. 17 illustrates a data descriptor;
  • FIG. 18 illustrates a relationship descriptor for archived data;
  • FIG. 19 illustrates a relationship descriptor for backup data;
  • FIG. 20 illustrates a relationship descriptor for replication data;
  • FIG. 21 illustrates a relationship descriptor for application data;
  • FIG. 22 illustrates another relationship descriptor for archived data;
  • FIG. 23 illustrates a discovered configuration table;
  • FIG. 24 is an example of a discovered data table;
  • FIG. 25 is an example of a discovered relationship table;
  • FIG. 26 is an example of a GUI for a view of the data;
  • FIG. 27 is an illustration of a GUI for a view of the storage system;
  • FIG. 28 is an example of a GUI for a view of the path information;
  • FIG. 29 illustrates a process for data discovery;
  • FIG. 30 illustrates details of the Get Data From App process shown in FIG. 29;
  • FIG. 31 illustrates details of the Get Data From Backup process shown in FIG. 29;
  • FIG. 32 illustrates further details of the Get Data From Backup process shown in FIG. 29;
  • FIG. 33 illustrates details of the Get Data From Archive process shown in FIG. 29;
  • FIG. 34 illustrates further details of the Get Data From Archive process shown in FIG. 29;
  • FIG. 35 illustrates details of the Get Data from Replica process shown in FIG. 29;
  • FIG. 36 is a flow chart illustrating the steps for depicting the data view;
  • FIG. 37 is a flow chart illustrating the steps for depicting the storage view;
  • FIG. 38 is a flow chart illustrating the steps for depicting the path view; and
FIG. 39 is a flow chart illustrating the steps for checking backup operations.
  • DETAILED DESCRIPTION OF THE INVENTION
  • FIG. 1 is a block diagram illustrating a hypothetical storage system typical of a complex computing environment. Most of the components of the system shown in FIG. 1 are well known and thus are discussed only briefly herein. The data manager 111, however, is not well known and is explained in detail below.
  • The system shown in FIG. 1 includes two application servers 101 and 102. These servers run computer programs 101 a and 102 a to provide computing resources to users of the overall system. By execution of a stored program, the applications 101 a and 102 a generate data which is stored in the system illustrated in FIG. 1.
  • A replication server 103 replicates data to different storage systems or volumes within the storage system to provide well known mirroring functionality. The replication server maintains a replication catalog 106 as will be discussed below. Similarly, a backup server 104 provides data backup functionality to enable restoration of data at a later date should there be hardware, software, or facilities failures. A backup catalog 107 maintains a record of the backup operations, as also discussed below.
  • Many large storage systems also include a hierarchical storage manager or archive server 105. Server 105 archives little used data from primary storage areas to secondary storage areas to provide improved system performance and to reduce costs by maintaining the data on lower cost media. As with the other servers, archive server 105 maintains an archive catalog 108, also explained further below. Although servers 101-105 have been discussed as though each were a standalone hardware implementation, this is not necessary. The servers may be implemented as separate processes running on a single large computer, or as separate processes running on separate processors within a connected array of computers.
• The system shown in FIG. 1 also includes a storage area manager 109. The storage area manager is preferably a management server that manages the entire network depicted in FIG. 1, including the servers and the storage systems 115, 116, and 117. The storage area manager maintains a device catalog 110, which is also discussed below. In essence, the storage area manager can retrieve information from the switches 114, servers 101-105, storage systems 115-117, and the applications 101 a and 102 a. Storage area managers such as the one depicted in FIG. 1 are often implemented using a standard protocol such as DMTF's CIM. Another way to implement the storage area manager is to install an agent on each server and have the agent collect information about that server locally and provide it to the storage area manager.
• Although a variety of techniques are commonly used to interconnect systems such as that depicted in FIG. 1, switches 114 have become an increasingly popular connection technique. These are typically switches based on Fibre Channel, Ethernet, or broadband technology.
• The data received by the system, or generated by the system as the result of its server operations, is stored in storage systems such as 115, 116, and 117. Each such storage system includes a disk controller 118, 119, and 120, respectively, as well as hard disk drives 118 a-120 b for storing data. For simplicity, FIG. 1 illustrates only two disk drives per storage system. In conventional implementations, however, hundreds of disk drives may be employed in a storage system. The disk controllers 118, 119, and 120 control input and output requests issued from the servers to store and retrieve data from the hard disk drives.
• For illustration, three different types of storage systems are shown in FIG. 1. Storage system 115 is an enterprise Fibre Channel storage system. Such systems typically support SCSI as the data protocol between the servers and the storage systems. The nearline storage system 116 operates in a similar manner but uses ATA format hard disk drives. Finally, the Network Attached Storage system 117 supports NFS and CIFS as file protocols. Thus, as depicted in FIG. 1, the system of this invention is applicable to any type of storage system.
• The components and systems shown in FIG. 1 are interconnected using two techniques. A network 100, for example based on TCP/IP over Ethernet, provides "out of band" communications. The main data handling for the storage systems, however, is provided by switches 114, which allow interconnection of the desired components as necessitated by the particular operations to be performed.
• The system of this invention adds an additional component 111, referred to herein as a data manager, to the overall system of FIG. 1. This data manager communicates with the other components via the local area network 100 and the switches 114. The data manager collects data handling process information from the applications and the data applications (the replication, backup, and archive servers) and presents the results to a user. The results are typically presented through a graphical user interface running on a console 113. The data manager maintains a data catalog, which enables the data manager to present to the user various "views" of the storage system. For example, the data manager 111 and data catalog together enable a user to view information about the physical locations where various files are stored, the path by which the information was stored, and other relationships among the data stored in the storage systems 115, 116, and 117. The data manager 111 creates and manages data descriptors, relationship descriptors, a discovered data table (discussed below), and a discovered relationship table (also discussed below). These tables are typically stored in local storage or network storage attached to the data manager. The data manager also uses a discovery configuration table, as discussed below. The data manager itself may be configured from the console 113. The data manager relies upon catalogs created and stored throughout the system as designated in FIG. 1. These catalogs are discussed next.
  • FIG. 2 is a diagram illustrating an archive catalog for the archive profile. This catalog is included within the catalog 108 shown in FIG. 1. The catalog 200 shown in FIG. 2 describes which data is to be archived, at what time, and to which storage. In the example shown in FIG. 2 the data is to be archived if it is not accessed within 30 days. The data to be archived is set forth as the Folder, and the media to which it is to be archived is listed under Archive Media.
  • FIG. 3 illustrates an archive catalog for media information. This catalog is also included within catalog 108 shown in FIG. 1. The example in FIG. 3 illustrates that the Archive Media is actually an Archive Folder having a specified address associated with the specific server. FIG. 3 also indicates that the Folder has a maximum capacity as shown.
• FIG. 4 is a diagram illustrating an archive catalog for archived data. This catalog is included within catalog 108 shown in FIG. 1. In the example of FIG. 4, the indicated Source Data is shown as being archived at the designated media location as an Archive Stream at the Archive Time shown in FIG. 4.
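• The three archive catalog records of FIGS. 2-4 can be modeled, for illustration only, as simple data structures. The following Python sketch assumes field names (folder, archive_media, idle_period, and so on) that paraphrase the figures; they are not the literal schema of catalog 108.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ArchiveProfile:        # FIG. 2: what to archive, when, and to where
    folder: str              # data to be archived
    archive_media: str       # destination media name
    idle_period: timedelta   # archive if not accessed within this period

@dataclass
class ArchiveMedia:          # FIG. 3: where the archive media physically lives
    media_name: str
    server: str              # server hosting the archive folder
    archive_folder: str      # address of the folder on that server
    max_capacity_gb: int     # maximum capacity of the folder

@dataclass
class ArchivedData:          # FIG. 4: what was archived, where, and when
    source_data: str         # original file
    media_name: str          # archive media that received it
    archive_stream: str      # stream holding the archived copy
    archive_time: datetime

# Hypothetical entry matching the 30-day example of FIG. 2.
profile = ArchiveProfile(folder=r"\folder1", archive_media="MediaA",
                         idle_period=timedelta(days=30))
```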
  • FIGS. 5-7 illustrate backup catalogs stored as catalog 107 in FIG. 1. In FIG. 5, an exemplary backup catalog for a backup profile is illustrated. This catalog describes how and when data is to be backed up. In the example depicted, files under the folder designated by Source are to be backed up to the Backup Media at the Backup Time stated. The Backup Type indicates that all files are to be backed up, while the Next Backup Time indicates the time and date of the next backup operation.
  • FIG. 6 is a diagram illustrating a backup catalog for media information. In a similar manner to FIG. 3, it illustrates the physical location of the particular media designated, as well as its capacity.
  • FIG. 7 illustrates a backup catalog for backup data. This catalog describes when and where data is backed up. In the example shown, two files as designated by Data Source have been backed up to the Backup Media at the time shown.
• FIG. 8 is a diagram illustrating a replication relationship between two devices in the storage system, and is referred to as a replication catalog. This diagram provides additional information with regard to the replication catalog 106 in FIG. 1. The replication catalog describes the relationship between two data storage locations, commonly known as LDEVs, in the storage system. As shown by FIG. 8, the data in the Primary Storage is replicated to the Secondary Storage location. The Mode indicates whether the replication is to be synchronous or asynchronous.
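• As a minimal sketch of a replication catalog entry, the record below pairs a primary and a secondary LDEV with a mode flag. The field and value names are assumptions for illustration, not the catalog's actual layout.

```python
from dataclasses import dataclass
from enum import Enum

class ReplicationMode(Enum):
    SYNCHRONOUS = "sync"     # writes complete only after both copies are updated
    ASYNCHRONOUS = "async"   # the secondary copy may lag the primary

@dataclass
class ReplicationPair:       # FIG. 8: relationship between two LDEVs
    primary_storage: str     # e.g. "EnterpriseStorageA:LDEV1"
    secondary_storage: str   # e.g. "EnterpriseStorageA:LDEV3"
    mode: ReplicationMode

pair = ReplicationPair("EnterpriseStorageA:LDEV1",
                       "EnterpriseStorageA:LDEV3",
                       ReplicationMode.SYNCHRONOUS)
```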
  • FIG. 9 is a diagram illustrating a device catalog for a volume, with FIGS. 10-13 illustrating other device catalogs, all incorporated within catalog 110 in FIG. 1. The volume catalog 207 shown in FIG. 9 includes the volume identification, name, address, port, logical unit number, etc.
  • FIG. 10 illustrates a device catalog 208 for storage. This catalog provides information about a storage system. As shown, the catalog includes an identification, name, address, capacity, information about ports coupled to the storage, etc.
  • FIG. 11 illustrates a catalog 220 for a file system. As shown there, the catalog includes information about identification, physical volume location, file system type, free space, etc. Similarly, FIG. 12 illustrates a device catalog for a path 221. This catalog includes identification information and worldwide name identification.
  • FIG. 13 is a device catalog 222 for an application. As shown by FIG. 13, the catalog includes identification, application type, host name, and associated data files.
• FIGS. 14 and 15 illustrate an archive catalog for message-based archiving. (FIGS. 2-4 illustrated archive catalogs for file-based archiving.) In message-based archiving, the archiving is performed at the application level. For example, an e-mail server may store messages into data files, and an archive server then communicates with the e-mail server to archive the messages themselves instead of the data files. In these circumstances, the archive profile also indicates the name of a server and the name of an application.
• FIG. 14 illustrates an archive catalog 223 for an archive profile for the case just described. As shown, the catalog indicates the application and the media name, together with the media and timing information. The media information itself may be maintained in the same manner as described in conjunction with FIG. 3.
  • FIG. 15 illustrates an archive catalog 224 for archive data. As mentioned above, the Source Data designates particular messages instead of files. The Server Name and information about the media, data, and time are also provided.
• FIG. 16 depicts an exemplary system configuration which is used in the remainder of this application as an example to clarify the explanation. As shown in FIG. 16, several servers 230 are represented across the upper portion of the diagram, including an application server, an archive server, a backup server, and a replication server. Two of the servers are connected with an Ethernet link. In the middle portion of the diagram, two switches 231 couple the various servers to various storage systems 232. The replication server is coupled to Enterprise Storage A to allow replication in that storage system. The application server 230 stores data into LDEV1, while the archive server archives some of that data into LDEV2. The replication server asks storage system A to replicate LDEV1 to LDEV3, and the storage system performs the replication. The backup server backs up data from LDEV3 to LDEV4.
  • In a conventional system without the data manager described in conjunction with FIG. 1, the various catalogs described above are all separated and the user is not able to see the total relationships of the data and files being managed by the storage system. The addition of the data manager, however, allows communication among the various servers and the data manager, for example using scripts or other well known interfaces. By communication between the data manager and the various servers these relationships may be discovered and presented to the user as discussed next.
• FIG. 17 illustrates a sample data descriptor table 240. This table illustrates information collected by the data manager 111 (see FIG. 1) about the data being handled by the storage system and the servers. As shown in FIG. 17, the data descriptor table includes considerable information for the particular unit of data discovered. It includes logical information about the data, for example the host name associated with that data, the path name, the "owner" of the data, any restrictions on access or rewriting of the data, the size, time of creation, time of modification, time of last access, and a count of the number of accesses. The data descriptor also includes information about the mount point (where the data is located), the type of file system associated with the data, and the maximum size of that file system. Finally, the data descriptor includes physical information about the data, including the storage system brand name (Lightning 9900), its IP address, its LDEV, etc. The physical information can also include information about the maximum volume size, the level of RAID protection, etc.
• Generally speaking, the logical information includes which server has the data, its logical location within that server, and access control information, as well as size and other parameters of the stored data. The file system information describes the type of file system in which the data is stored. The physical information describes the storage system and the LDEVs on which a particular file system has been created.
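• A data descriptor of the kind shown in FIG. 17 can be sketched as a record with three sections, following the logical/file system/physical division just described. The individual field names below are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class LogicalInfo:           # which server has the data and where it sits logically
    host: str
    path: str
    owner: str
    size_bytes: int
    created: str
    modified: str
    last_accessed: str
    access_count: int

@dataclass
class FileSystemInfo:        # the file system in which the data is stored
    mount_point: str
    fs_type: str
    max_size_gb: int

@dataclass
class PhysicalInfo:          # the storage system and LDEV behind the file system
    storage_name: str        # e.g. "Lightning 9900", as in FIG. 17
    ip_address: str
    ldev: str
    raid_level: str

@dataclass
class DataDescriptor:        # FIG. 17: one record per discovered unit of data
    gid: str                 # global identification assigned by the data manager
    logical: LogicalInfo
    filesystem: FileSystemInfo
    physical: PhysicalInfo
```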
• FIGS. 18-22 illustrate relationship descriptor tables which help establish the relationships among the data stored in the storage system. FIG. 18 is an example of a relationship descriptor table 241 for archived data. The table includes information about a descriptor identification, its relationship to the original data, the original data descriptor, the archive data descriptor, the archive time, and the retention period. The relationship descriptor shows how the discovered data are related and assigns a unique ID (RID) to each relationship.
• FIG. 19 provides a relationship descriptor table 242 for backup. The table illustrates that the original data at the specified addresses has been backed up as the data specified at the destination address. The backup date, time, speed, and other parameters are also maintained.
  • FIG. 20 is a relationship descriptor table 243 for replication. This table, in addition to the other information provided, maintains the relationship between the original and the replicated data based on their global identification.
  • FIG. 21 is a relationship descriptor table 244 for an application. As shown by this table, the e-mail server in the Trinity server has data sources specified by the designated global identification numbers.
• As shown by table 245 in FIG. 22, there is also a relationship descriptor for the archive in a message-based system. Because it would be resource-consuming to create a data descriptor and a relationship descriptor for each message, only the relationship between the original data and the archived data is identified in the case of message-based archiving. Of course, if desired, a data descriptor could be created.
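• The relationship descriptors of FIGS. 18-22 share a common shape: a unique RID, the kind of relationship, the GIDs of the source data, and (except for application relationships) the GIDs of the copies. A hedged sketch, with assumed field names:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RelationshipDescriptor:         # FIGS. 18-22
    rid: str                          # unique relationship identification
    kind: str                         # "archive", "backup", "replication", or "application"
    source_gids: list                 # GIDs of the original data
    destination_gids: Optional[list]  # GIDs of the copies; None for application data
    attributes: dict = field(default_factory=dict)  # e.g. archive time, retention

# Hypothetical archive relationship: the file with GID "0001" was archived
# as the stream with GID "0005" and must be retained for a year.
rel = RelationshipDescriptor(rid="0002", kind="archive",
                             source_gids=["0001"], destination_gids=["0005"],
                             attributes={"retention_days": 365})
```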
• The data manager 111 also creates a number of tables based upon its interactions with the servers. These tables consist of a discovery configuration table 280 shown in FIG. 23, a discovered data table 420 shown in FIG. 24, and a discovered relationship table 430 shown in FIG. 25. These tables are discussed next.
• The discovery configuration table 280 shown in FIG. 23 shows from which applications and data applications the data manager has gathered information. Each entry in the table, consisting of a row, specifies a type of discovered data, a server from which the information is gathered, an application or data application name, and ID and password information to gain access as needed. For example, the first row of table 280 specifies that application information is collected from server E using the application SAMSoft, which can be accessed using the ID and password shown at the end of the row.
• FIG. 24 illustrates a discovered data table 420. This table provides management information for the discovered data. As shown by the table, the data is uniquely identified by the combination of storage system, LDEV, and a relative path name. Files stored in the storage system are stored using a file system. The relative path name is the path name inside the file system, rather than the path name seen when the file system is mounted on a folder in the server. For example, assume LDEV1 is mounted on \folder1 at a server, and that a file has the full path name \folder1\folder2\fileA. The relative path name is then \folder2\fileA.
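• The uniqueness rule and the relative path convention just described can be made concrete with a short sketch. The class and function names below are hypothetical; the point is that one GID is allocated per unique (storage system, LDEV, relative path name) triple.

```python
import itertools

class DiscoveredDataTable:
    """Sketch of FIG. 24: one GID per unique (storage, LDEV, relative path)."""

    def __init__(self):
        self._gid_by_key = {}
        self._counter = itertools.count(1)

    def gid_for(self, storage: str, ldev: str, relative_path: str) -> str:
        key = (storage, ldev, relative_path)
        if key not in self._gid_by_key:          # allocate only if no entry exists
            self._gid_by_key[key] = f"{next(self._counter):04d}"
        return self._gid_by_key[key]

def relative_path(full_path: str, mount_point: str) -> str:
    """\\folder1\\folder2\\fileA mounted at \\folder1 -> \\folder2\\fileA."""
    return full_path[len(mount_point):] if full_path.startswith(mount_point) else full_path

table = DiscoveredDataTable()
gid = table.gid_for("EnterpriseStorageA", "LDEV1",
                    relative_path(r"\folder1\folder2\fileA", r"\folder1"))
# gid == "0001"; asking again for the same triple returns the same GID
```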
  • FIG. 25 illustrates a discovered relationship table 430. This table manages the identifications of discovered relationships. In the example depicted, the relationship identified by RID 0002 is a backup relationship indicating that the files having GIDs shown in the column “Source” were backed up as data identified by the “Destination” column. While backup, archive, and replication actions are associated with data at two locations, the application itself only has source data. Thus “destination” is not applicable.
  • Using all of the tables discussed above and the various relationships created, in a manner which will be discussed in detail below, the system is capable of providing a comprehensive view of the relationships among the data stored in the affiliated storage systems. Exemplary graphical user interfaces for presenting these relationships to the user of the storage system are shown in FIGS. 26, 27, and 28. As should be understood, other graphical user interfaces (GUI) can also be created for presentation to the user to enable a better understanding of the data in the storage system. These interfaces will typically be of most benefit to an administrator of the data management system. Typically these interfaces will be presented on the console 113 shown in FIG. 1. Typical GUIs are discussed next.
• FIG. 26 illustrates a "data view" GUI 250. In this exemplary GUI, the data manager presents a view related to the data itself. In the embodiment depicted, the GUI has two parts, a data specification panel on the left hand side and an information panel on the right hand side of the figure. The data specification panel shows all of the applications and all of the data in the system that is being used by those applications. For example, in FIG. 26, the specification panel lists e-mail applications and, within those applications, an e-mail server A. That e-mail server has a number of files, shown in the example as A, B, and C. The user has chosen file A. In response, the GUI displays information about that file in the right hand panel shown in FIG. 26. This panel illustrates the relationship information about the data associated with file A. As shown at the top of the panel, the server and file location are shown, as well as all archived, replicated, and backed up copies of that file. As illustrated, file A has been archived by server B at the designated location, has been replicated by server C at the designated location, and has been backed up by server D at the designated location. By clicking on the "Details" designation, the user causes the system to retrieve "deeper" information about that data, for example its size, the time of the event, or other information provided in the descriptor tables discussed above, and that data will be presented on the GUI.
• FIG. 27 illustrates the GUI for a "storage view" of the data. The left hand panel shown in FIG. 27 corresponds to that discussed for FIG. 26, enabling the user to select a particular file. In the same manner as described there, the user has selected file A, and thus the right hand panel of the storage view 260 displays information about file A. That panel shows the LDEV and storage system where the original data is stored, the LDEVs and storage systems in which all of the data related to the original data are stored, and the relationships among those locations. For example, as shown in the upper portion of the right hand panel, the replica, archive, and backup relationships are illustrated.
  • FIG. 28 is a third GUI enabling the user to more easily understand the location of various data in the storage system and the path by which that data is being handled. FIG. 28 illustrates the “path view” GUI. As with the above FIGS. 26 and 27, the left hand side of the GUI 270 enables the user to select the particular file, while the right hand side depicts the topology map of the servers, switches, storage systems, and LDEVs for the original data, and for data related to the original data. This diagram also illustrates how data is transferred in the topology. To simplify the diagram, across the upper portion of the right hand panel in FIG. 28 are a series of “buttons.” By clicking on one of these buttons, the screen will show a path through which data is transferred by the specified relationship.
• The preceding discussion has described the various tables created and used by the data manager 111, and the graphical user interfaces for presentation of that data to a user of the system. The remaining portion of this specification discusses the manner in which the system operates to establish those tables and present the graphical user interfaces.
• FIG. 29 is a flowchart illustrating a preferred embodiment of the data discovery process performed by the data manager shown in FIG. 1. The process is initiated by a user at the console 113 shown in FIG. 1. At a first step 290, the data manager retrieves a non-replication entry from the discovery configuration table shown in FIG. 23. If such a new entry exists, the flow proceeds downward as shown in FIG. 29: the data manager checks the type of server and executes one of three procedures 293, 294, or 295, depending upon the type of server, as shown by the loop in FIG. 29. After that entry has been processed, the process reverts to step 290 and repeats as many times as necessary to retrieve all of the non-replication entries. The details of the particular "get data" procedures 293, 294, and 295 are discussed below. Once there are no new entries, the data discovery process retrieves the replication entries from the discovery configuration table, as shown by step 296. For each replication entry, the procedure follows step 298, which is also discussed below. Once all of the entries have been retrieved, as shown at step 297, the data discovery process ends.
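• The overall loop of FIG. 29 might be sketched as follows. The entry dictionaries and handler functions are stand-ins for the data manager's internal interfaces, which the patent does not spell out.

```python
def discover(config_entries, handlers):
    """Sketch of FIG. 29: dispatch non-replication entries by server type
    first, then handle replication entries once all data is discovered."""
    for entry in config_entries:                 # steps 290-295
        if entry["type"] != "replication":
            handlers[entry["type"]](entry)       # procedures 293, 294, or 295
    for entry in config_entries:                 # steps 296-298
        if entry["type"] == "replication":
            handlers["replication"](entry)

# Hypothetical usage with stub handlers:
handlers = {t: (lambda entry, t=t: print("discovering", t, "on", entry["server"]))
            for t in ("application", "backup", "archive", "replication")}
discover([{"type": "application", "server": "ServerE"},
          {"type": "replication", "server": "ServerR"}], handlers)
```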
• FIG. 30 illustrates in more detail the process flow for getting data from an application, shown as block 293 in FIG. 29. The data manager first connects to the SAM server via the network, using the identification and password in the discovery configuration table for the connection 300. It then retrieves a list of applications from the SAM server 301 and, for each application, a list of data files from that server, as shown by step 302. As shown by step 303, for each data file on that list, the data manager gets the name of the file system in which the data file is stored on the SAM server. Then, as shown by step 304, for each file system, a storage name and the LDEV on which the file system is created are also retrieved from the SAM server. Next, for each unique set (a name of a storage system, an LDEV, a data file relative path name), the data manager creates a new entry in the discovered data table and allocates a new global identification (GID) if there is not already an entry for that set. As shown by step 306, for each such GID, a data descriptor is created. Then, as shown by step 307, for each data descriptor, the data manager retrieves logical information, file system information, and physical information from the SAM server and fills that information into the data descriptor table. Then, as shown by step 308, for each application, a new entry in the discovered relationship table is created and a new RID is allocated if there is not already an entry for that application. Finally, as shown by step 309, for each RID, the relationship descriptor for the application and the file information is created. Once these steps are completed, the process flow returns to the diagram shown in FIG. 29.
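• As a sketch of the FIG. 30 flow, the function below walks from applications down to physical locations and records each discovery. The sam, data_table, and rel_table objects and their methods are assumptions; the patent defines the steps, not a concrete API.

```python
def get_data_from_app(sam, entry, data_table, rel_table):
    """Sketch of FIG. 30 (steps 300-309); method names are hypothetical."""
    sam.connect(entry["id"], entry["password"])                  # step 300
    for app in sam.list_applications():                          # step 301
        source_gids = []
        for data_file in sam.list_data_files(app):               # step 302
            fs = sam.filesystem_of(data_file)                    # step 303
            storage, ldev = sam.storage_and_ldev_of(fs)          # step 304
            gid = data_table.gid_for(storage, ldev, data_file)   # steps 305-306
            data_table.fill_descriptor(gid, sam.describe(data_file))  # step 307
            source_gids.append(gid)
        rid = rel_table.rid_for(app)                             # step 308
        rel_table.fill_descriptor(rid, kind="application",       # step 309
                                  sources=source_gids, destinations=None)
```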
• FIG. 31 illustrates the process of retrieving data from the backup server, shown as step 294 in FIG. 29. Once this process is invoked, the operation is similar to that described in FIG. 30. In particular, the data manager first connects to a backup server via the network, using the ID and password information from the discovery configuration table for the connection, as shown in step 320. It also connects to the SAM server in the same manner, as shown in step 321. At step 322, the data manager retrieves a list of backup profiles from the backup server. As shown by step 323, for each such backup profile, the data manager obtains a list of backup data from the backup server. Then, at step 324, for each backup data, the data manager retrieves from the backup server the file system in which the backup stream is stored. Next, as shown by step 325, for each unique file system, a storage name and the LDEV on which the file system is created are retrieved from the SAM server. Then, at step 326, for each unique set (storage name, LDEV, and backup stream relative path name), a new entry is created in the discovered data table and a new GID is allocated if there is not already an entry for that set. Next, at step 327, for each GID, a data descriptor is created. Then, as shown at step 328, for each data descriptor, logical information, file system information, and physical information is retrieved from the SAM server and provided to the data descriptor table.
• FIG. 32 illustrates the process following step 328. As shown in FIG. 32, for each backup data, the data manager obtains a list of the data sources from the backup server at step 329. Then, for each unique data source, the file system in which the data source is stored is also retrieved from the backup server at step 330. At step 331, for each unique file system, the data manager retrieves from the SAM server a storage name and the LDEV on which the file system is created. Then, at step 332, for each unique set of storage name, LDEV, and data source relative path name, a new entry is created in the discovered data table, and a new GID is allocated if there is not already an entry for that set. Then, at step 333, a data descriptor is created for each GID. At step 334, for each data descriptor, logical information, file system information, and physical information is retrieved from the SAM server and filled into the data descriptor table. Then, at step 336, for each backup data, a new entry is created in the discovered relationship table and a new RID is allocated if there is not already an entry for that backup data. Finally, at step 337, for each RID, a relationship descriptor for the backup information is created and filled in. That step concludes the get data from backup operation shown generally as step 294 in FIG. 29.
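• The second half of the backup discovery (FIG. 32) is where the two-sided relationship is recorded: the data sources resolved here become the "Source" GIDs, and the backup streams registered earlier become the "Destination" GIDs. A sketch under the same assumed interfaces as above:

```python
def link_backup_relationship(backup, sam, data_table, rel_table,
                             backup_data, stream_gids):
    """Sketch of FIG. 32 (steps 329-337); interfaces are hypothetical."""
    source_gids = []
    for src in backup.list_data_sources(backup_data):            # step 329
        fs = backup.filesystem_of(src)                           # step 330
        storage, ldev = sam.storage_and_ldev_of(fs)              # step 331
        gid = data_table.gid_for(storage, ldev, src)             # steps 332-333
        data_table.fill_descriptor(gid, sam.describe(src))       # step 334
        source_gids.append(gid)
    rid = rel_table.rid_for(backup_data)                         # step 336
    rel_table.fill_descriptor(rid, kind="backup",                # step 337
                              sources=source_gids,
                              destinations=stream_gids)
```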
• FIG. 33 illustrates the details behind the step of getting data from the archive, represented by step 295 in FIG. 29. As described above, these operations are similar to the other get data operations discussed in the previous few figures. The process begins with step 340, in which the data manager connects to the archive server using the ID and password information. It also connects to the SAM server with the ID and password information, as shown by step 341. At step 343, it obtains a list of archive profiles, and at step 344, for each archive profile, it obtains a list of archive data from the archive server. At step 345, for each archive data, it retrieves from the archive server the file system in which the archive stream is stored. Then, for each unique set of a storage name, an LDEV, and an archive stream relative path name, a new entry is created in the discovered data table and a new GID is allocated if there is not already one for that set. Next, at step 348, for each GID, a data descriptor is created, and finally, at step 349, for each such data descriptor, logical information, file system information, and physical information from the SAM server is filled into the data descriptor table. The process then continues with FIG. 34.
• As shown by step 350, for each archived data, a list of data sources is retrieved from the archive server. Then, for each unique data source, the file system for that data source is retrieved from the archive server, as shown by step 351. Then, for each unique file system, the storage name and LDEV on which the file system is created are retrieved from the SAM server. Next, at step 353, for each unique set of a storage name, an LDEV, and a data source relative path name, a new entry is created in the discovered data table and a new GID is allocated if there is not already one for that set. Then a new data descriptor is created for each GID, and for each such data descriptor, logical information, file system information, and physical information is retrieved from the SAM server and filled into the data descriptor table, as shown by step 355. Then, for each archived data, a new entry is created in the discovered relationship table and a new RID is allocated if there is not already one for that data. Finally, a relationship descriptor is created for each such RID and filled in.
• The process for getting data from the replica servers is similar to that described above. It is illustrated in FIG. 35. The process follows a flow of connecting to the replication server with an ID and password 360, connecting to the SAM server 361, and obtaining a list of replication profiles from the replication server 362. Then, for each replication profile, selected information is retrieved at step 363, and for each such replication set, the data stored in those volumes is located at step 364. Then, for each found data set, a new entry is created in the discovered relationship table, and for each such new RID, a relationship descriptor is created and the information filled into the table at step 366. This completes the description of the processes initially shown in FIG. 29. Next, the techniques for showing the data, storage, and path views are described.
• The steps for showing a data view are illustrated by the flow chart of FIG. 36. To show the data view, the data manager receives a server name, an application name, and a data file from the GUI, as shown by step 370. As discussed above, this selection will typically be made by the user choosing an appropriate entry in the left hand panel of the GUI. Then, as shown by step 371, the GID for the specified data is retrieved from the discovered data table, and at step 372, a list of all RIDs that contain the GID is retrieved from the discovered relationship table. If there are none, then the found GIDs may be displayed, as shown by step 376. If there are RIDs, then for each such RID, the GIDs of the source and the destination are also retrieved from the discovered relationship table, as shown by step 374. Once this is completed, the display is produced as shown by step 376.
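• The data view lookup of FIG. 36 reduces to a GID lookup followed by a walk of the discovered relationship table. A sketch, again with hypothetical table interfaces:

```python
def data_view(server, app, data_file, data_table, rel_table):
    """Sketch of FIG. 36: collect every GID related to the chosen file."""
    gid = data_table.lookup_gid(server, app, data_file)      # steps 370-371
    found = {gid}
    for rid in rel_table.rids_containing(gid):               # steps 372-373
        sources, destinations = rel_table.gids_of(rid)       # step 374
        found.update(sources)
        found.update(destinations or [])                     # application data has none
    return found                                             # displayed at step 376
```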
  • FIG. 37 illustrates the steps for showing a storage view in the GUI. In a manner similar to that described with FIG. 36, the user selects various information as shown in step 380, and the GID for the specified data is retrieved from the discovered data table. The flow of operations through steps 382, 383, 384, and 385 matches that from FIG. 36. Then, at step 386, for each found GID the data manager finds the storage system and LDEVs in which the data specified by the GID is stored, and shows the storage as a storage icon on the screen and the LDEV as LDEV icons on the screen. Next, as shown by step 387, the LDEV icons are interconnected by relationship indicators for each found RID.
• FIG. 38 is a flow chart illustrating the manner in which the path view GUI is created. Steps 390-395 are the same as those described above for the data and storage views. At step 396, for all of the found GIDs and RIDs, the data manager finds the related servers, switches, storage systems, and LDEVs that are related to the data or data applications specified by those found GIDs and RIDs. Following this step, the physical topology map for all of the found hardware components is displayed at step 397, and relationship buttons are added at step 398. At step 399, if a button is pushed, the system shows the data path by which the designated data is transferred, which information is provided by the SAM server.
• FIG. 39 is a flow chart illustrating another feature provided by the system of this invention. FIG. 39 provides a technique for detecting a misconfiguration of a data backup by comparing the size of the backup data with the size of the original data. The process shown in FIG. 39 may be invoked by the user through the storage console 113 shown in FIG. 1. Upon invocation, the system receives a server name, an application, and a data file from the GUI, as shown by step 400. Then the GID for the specified data is retrieved from the discovered data table, and the list of RIDs that contain that GID is retrieved from the discovered relationship table. This process is repeated until all RIDs and GIDs are retrieved, as shown by steps 403-405. At step 406, a calculation is performed for each GID with a full backup to determine the size of the backup stream. The size of the data files for that application is then computed at step 407. At step 408, the amounts are compared: if they match, a successfully completed message is displayed at step 409, while if they do not match, an error is displayed at step 410. Upon receipt of the error, the user can then either re-perform the backup or investigate the error and resolve it in some other manner.
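• The misconfiguration check of FIG. 39 amounts to a size comparison between the full backup stream and the original data files. A hedged sketch of that comparison, with assumed table interfaces:

```python
def check_full_backup(gid, data_table, rel_table):
    """Sketch of FIG. 39 (steps 400-410); table interfaces are hypothetical."""
    for rid in rel_table.rids_containing(gid):                       # steps 401-405
        if rel_table.kind(rid) != "backup" or not rel_table.is_full_backup(rid):
            continue
        sources, destinations = rel_table.gids_of(rid)
        backup_size = sum(data_table.size_of(g) for g in destinations)  # step 406
        source_size = sum(data_table.size_of(g) for g in sources)       # step 407
        if backup_size == source_size:                               # step 408
            print("backup completed successfully")                   # step 409
        else:
            print("error: backup size differs from source size")     # step 410
```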
  • The technology described has numerous applications. These applications are not restricted to backup, archive, replication, etc. The invention can be applied to other applications or custom applications in which data is to be analyzed and relationships determined. The invention is also not limited to files or data in the local file system or local server. Instead, the invention can be applied to volumes in storage systems and objects in object based storage devices, or files in network attached storage systems. It can be applied to volumes, and to storage systems which replicate volumes by themselves. The data manager in such an application can determine from the storage system or the replication server how the volumes are replicated and create a data descriptor for each volume without path information, and also create a relationship descriptor by using the replication relationship. In the case of network attached storage, the data is uniquely identified by an IP address, an exported file system and a relative path name.
• While the LDEV has been used herein to identify the uniqueness of data, other approaches may be used. The data manager may calculate a hash value for each unit of data. The data manager can then retrieve the logical location and physical location of the data from the SAM server. If identical data (that is, data with duplicate hash values) are found at different locations, the data manager can create a relationship descriptor for these data indicating that they are identical. This enables the user to see how many replicas of the data are present in the storage system and to determine which data can be deleted.
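• A minimal sketch of this hash-based alternative: compute a digest per unit of data and group data whose digests collide. SHA-256 is used here as an arbitrary choice; the patent does not name a hash function.

```python
import hashlib
from collections import defaultdict

def find_identical_data(contents_by_path):
    """Group data with equal hash values; each group marks replicas that
    could be linked by an 'identical' relationship descriptor."""
    groups = defaultdict(list)
    for path, content in contents_by_path.items():
        groups[hashlib.sha256(content).hexdigest()].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]

# Hypothetical usage: two of the three files hold the same bytes.
print(find_identical_data({r"\a\file1": b"hello",
                           r"\b\file1": b"hello",
                           r"\c\file2": b"other"}))
# [['\\a\\file1', '\\b\\file1']]
```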
• By checking a hierarchy of relationships among data, together with performance information from the data processing, the data manager can also detect at what location in the hierarchy a performance bottleneck exists. In such a case, the data manager retrieves performance information for each relationship and determines whether those numbers are limited by physical resources or by disturbances caused by other data processing or application software. The data manager also provides users a way to search for data and for relationships among data by specifying some portion of the data. If the data manager receives such a request, it can find the data descriptors and relationship descriptors that include the specified information and present them, for example, on a graphical user interface.
  • Although the invention has been described in detail above with respect to a preferred embodiment, it will be appreciated that variations and alterations may be made in the implementation of the invention without departing from its scope as shown by the appended claims.

Claims (6)

1. In a data management system coupled to a first server which processes data to be stored in a first storage system, a second server which provides a copy of the stored data to be stored in a second storage system, and a third server which provides another copy of the stored data to be stored in a third storage system, a data management method comprising steps of:
collecting from the second server, information about the copied data stored in the second storage system;
collecting from the third server, information about the another copied data stored in the third storage system;
creating relationship information indicative of associations among the stored data in the first storage system, the copied data stored in the second storage system and the another copied data stored in the third storage system; and
presenting the relationship information associated with stored data identified in a user's request.
2. The data management method of claim 1, wherein the relationship information includes location information and/or path information.
3. The data management method of claim 2, wherein the path information includes port and switch information.
4. The data management method of claim 1, wherein the relationship information includes physical location information and/or virtual location information.
5. The data management method of claim 1, wherein the presenting the relationship information includes displaying the relationship information by graphical user interface.
6. The data management method of claim 1, wherein the first server includes an application server.