EP2126701A1 - Data management in a data storage system using data sets - Google Patents

Data management in a data storage system using data sets

Info

Publication number
EP2126701A1
Authority
EP
European Patent Office
Prior art keywords
data
storage
data set
management policy
data management
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP08725916A
Other languages
German (de)
English (en)
Inventor
Peter L. Smoot
Jim Holl
Sahn Lam
Colin Johnson
David E. La France
Brian Hackworth
Kostadis Roussos
Jim Voll
Anawat Chankhunthod
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NetApp Inc
Original Assignee
NetApp Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/710,190 (US7953928B2)
Priority claimed from US11/710,202 (US20080208926A1)
Application filed by NetApp Inc
Publication of EP2126701A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2071 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring using a plurality of controllers

Definitions

  • the present invention relates to networked data storage systems, and more particularly, to managing data storage using data sets.
  • a networked data storage system can be used for a variety of purposes, such as providing multiple users access to shared data, or facilitating backups or data mirroring.
  • a networked storage system may include a number of storage servers.
  • a storage server may provide services related to accessing and organizing data on mass storage devices, such as disks.
  • Some storage servers are commonly referred to as filers or file servers, as these storage servers provide file-level access to data. Some of these filers further provide clients with sub-file level access to data (e.g., block-level access).
  • An example of such a storage server is any of the Filer products made by Network Appliance, Inc. in Sunnyvale, California.
  • the storage server may be implemented with a special-purpose computer or a general-purpose computer programmed in a particular way.
  • Logical units of storage, such as files, directories, volumes, and logical unit numbers (LUNs), may be created and manipulated on storage servers. Such logical units are referred to as storage objects in this document. Creating a single storage object is typically fast and easy, but managing a storage object over time can be difficult. A storage administrator has to make numerous decisions, such as how to monitor the available space for the storage object, how to schedule data backups, how to configure backups, whether the data should be mirrored, where data should be mirrored, etc.
  • Answers to the above questions may be provided in a data management policy, and once this policy is decided, the administrator needs to ensure that the policy is correctly implemented on all relevant storage objects, that the required space is available, that the data protection operations succeed, and so forth. If the administrator decides to change the policy (for example, extending the amount of time that backups should be retained), the administrator has to find all the affected storage objects and then manually re-configure all the relevant settings.
  • the present invention includes an apparatus and a method to manage data using data sets.
  • the apparatus includes a conformance checker and a conformance engine to make data sets conform to data management policies.
  • the conformance checker may be operable to compare a state of a data set against a data management policy associated with the data set to determine if the data set currently conforms to the data management policy.
  • the conformance engine may then make the data set conform to the data management policy if the conformance checker determines that the data set currently violates the data management policy.
  • the method includes allowing an administrator of a data storage system to define a data set having a plurality of storage objects and to associate the data set with a data management policy.
  • Each of the storage objects includes a logical representation of a collection of data and replicas of the collection of data.
  • the collection of data is stored in storage containers.
  • the storage containers are managed by storage servers in the data storage system, wherein the storage containers are independent of the logical representation of the collection of data.
  • the method may further include using a storage manager to manage the data set as a single unit according to the data management policy.
  • Figure 1 illustrates an embodiment of a networked storage system
  • Figure 2 illustrates an embodiment of a storage manager
  • Figure 3A shows a tree graph of one embodiment of a data management policy
  • Figures 3B and 3C illustrate a flow diagram of one embodiment of a process performed by a conformance checker to determine if a data set is in conformance with a data management policy
  • Figure 4 illustrates a functional diagram of an embodiment of a storage manager
  • Figure 5 illustrates a flow diagram of an embodiment of a process to make a data set conform to a data management policy
  • Figures 6A-6C illustrate an embodiment of a series of GUI screens to create a new data set
  • Figure 6D illustrates an embodiment of a GUI screen for applying a data management policy to a data set
  • Figure 7 illustrates a flow diagram of an embodiment of a process to manage data using data sets.
  • the apparatus includes a conformance checker and a conformance engine.
  • the conformance checker is operable to compare a state of a data set against a data management policy associated with the data set to determine if the data set conforms to the data management policy.
  • the conformance engine then makes the data set conform to the data management policy if the conformance checker determines that the data set violates the data management policy.
  • The conformance checker and the conformance engine described herein provide great convenience to storage administrators, because the data sets can be checked automatically and frequently without imposing much burden on the administrators. More details about the data sets, storage objects, data management policy, conformance checker, and conformance engine are discussed below.
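  • The division of labor between the conformance checker and the conformance engine can be sketched as follows. This is a minimal illustration only: representing the data set state and the policy as flat dictionaries, and the names `check_conformance` and `make_conform`, are assumptions made for the example and are not taken from the application.

```python
def check_conformance(state: dict, policy: dict) -> dict:
    """Checker: return the policy settings that the current state violates."""
    return {key: wanted for key, wanted in policy.items()
            if state.get(key) != wanted}


def make_conform(state: dict, policy: dict) -> dict:
    """Engine: return a new state with every violated setting corrected."""
    violations = check_conformance(state, policy)
    return {**state, **violations}


# Hypothetical policy and data set state, for illustration only.
policy = {"backup_schedule": "daily", "mirrored": True}
state = {"backup_schedule": "weekly", "mirrored": True}

violations = check_conformance(state, policy)   # only the backup schedule differs
fixed = make_conform(state, policy)             # state after the engine has run
```

  • Keeping the check separate from the correction lets the check run frequently and cheaply, with the engine invoked only when a violation is found.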
  • the method includes allowing an administrator of a network data storage system to define a data set having a set of storage objects associated with a data management policy.
  • Each storage object may include a logical representation of a collection of data and replicas of the collection of data.
  • the collection of data is stored in one or more storage containers.
  • the storage containers are managed by one or more storage servers in the data storage system.
  • the storage containers are independent of the logical representation.
  • the method may further include managing the data set as a single unit according to the data management policy using a storage manager.
  • a single unit in the context of the following discussion is a group having one or more members, which may be manipulated by the administrator as a whole without referring to each individual member of the group.
  • the data management policy and any changes thereof are applied to all of the storage objects in the data set.
  • Using data sets and data management policies can vastly reduce the workload of storage administrators, as well as the risk of making errors in deploying changes in the data management policy. More details about the data sets, storage objects, and data management policy are discussed below.
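  • Managing a data set as a single unit can be pictured with a small sketch. The classes below and the single `backup_retention_days` attribute are hypothetical simplifications chosen for the example; an actual policy would carry many more attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical policy: only a backup retention period, in days."""
    backup_retention_days: int


@dataclass
class StorageObject:
    """A member of a data set, identified by its logical name."""
    name: str
    backup_retention_days: int = 0


@dataclass
class DataSet:
    """A group of storage objects managed as a single unit."""
    policy: Policy
    members: list = field(default_factory=list)

    def apply_policy(self) -> None:
        # One call updates every member; no member is named individually.
        for member in self.members:
            member.backup_retention_days = self.policy.backup_retention_days


home_dirs = DataSet(Policy(backup_retention_days=30),
                    [StorageObject("jsmith"), StorageObject("adoe")])
home_dirs.apply_policy()

# Changing the policy once and re-applying it updates all members.
home_dirs.policy.backup_retention_days = 90
home_dirs.apply_policy()
```

  • The administrator edits one policy object; the loop over members is the part that would otherwise be repeated by hand for each storage object.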
  • FIG. 1 shows a networked data storage system 100 according to some embodiments of the present invention.
  • the system 100 includes client machines 110, 112, and 114, a storage manager 120, a storage manager database 130, a storage server 160, a backup storage server 140, and a mirror storage server 150.
  • the above components can be coupled to each other through one or more networks of various types, such as local area network (LAN), wide area network (WAN), etc.
  • the network connections may be wireline, wireless, or a combination of both.
  • the above components may or may not be located at different geographical locations.
  • Data is stored and transferred in units of files in the data storage system 100. Therefore, the system 100 may be a file-based networked storage system.
  • the system 100 can be a network-attached storage (NAS) system that provides clients with access to data at the file level.
  • a NAS system uses file access protocols to retrieve data, such as, for example, Network File System (NFS), or Common Internet File System (CIFS).
  • the files are logically arranged into directories.
  • a volume of storage devices may be mapped to one or more directories.
  • the system 100 may include or be part of a storage area network (SAN), to provide clients with access to data at the block level of storage servers.
  • a block is the basic unit of data used to store data in the SAN.
  • the data storage system 100 may provide clients with access to data at both the block level and the file level.
  • any or all of the components of system 100 and associated hardware may be used in various embodiments of the present invention.
  • other configurations of the networked data storage system may include more or fewer devices than those discussed above.
  • the client machine 110 is used by a storage administrator, and thus, may be referred to as an administrative client.
  • the other client machines 112 and 114 are used by users of the network data storage system 100 to access data, and thus, may be referred to as storage clients.
  • a storage client and an administrative client may not be mutually exclusive, that is, both the administrator and users may use the same client machine in some embodiments.
  • the client machines 110, 112, and 114 may be implemented on personal computers (PCs), laptop computers, special purpose computing devices, etc.
  • the client machine 110 is coupled to the storage manager 120, which is further coupled to the storage manager database 130.
  • The storage manager 120 is a software application which may be implemented on one or more servers, personal computers (PCs), special-purpose computing machines, etc. Details of one embodiment of a machine usable to implement the storage manager 120 are shown in Figure 2.
  • the storage manager 120 may include an application programming interface (API) 124 to interface with the client machine 110. Further, the storage manager 120 manages storage using entities called data sets. Details of data sets are discussed below.
  • the storage manager 120 creates a user interface (e.g., graphical user interface (GUI), command line interface (CLI), etc.) and provides the user interface to the client machine 110 via the API 124.
  • the API 124 may be implemented on a separate server coupled between the storage manager 120 and the client machine 110.
  • the client machine 110 includes a display (e.g., a monitor) to present the user interface (e.g., the GUI 118) to a storage administrator of the data storage system 100 (also commonly referred to as the administrator).
  • Through the GUI 118, the administrator may input information about data sets and/or data management policies to the storage manager 120.
  • the GUI 118 is presented via a network access application, such as an internet browser, operable on the client machine 110.
  • the storage manager 120 may include a data set support module 122 to manage storage using data sets.
  • the storage manager 120 may further include a user interface module 126 to create the user interface (e.g., graphical user interface (GUI), command line interface (CLI), etc.) and to provide the user interface to the client machine 110 via the API 124.
  • Some exemplary embodiments of screen displays of the GUI 118 are illustrated in Figures 6A-6D.
  • the term "storage manager" is used herein to encompass all embodiments of storage manager 120.
  • Based on the administrator inputs, the storage manager 120 creates, removes, and/or updates data sets, where each data set is associated with a data management policy.
  • Objects representing the data sets and the data management policy are stored in the storage manager database 130.
  • the storage manager database 130 may be implemented using a storage device that stores data persistently, such as a disk, a read-only memory (ROM), etc.
  • the storage manager 120 manages data in the networked data storage system 100. More details of the data sets, data management policies, and data management using data sets are discussed below.
  • the storage manager 120 is further coupled to the storage server 160, the backup storage server 140, and the mirror storage server 150.
  • the storage servers 140, 150, and 160 are shown in Figure 1 as examples of storage servers for illustrative purpose only. Other embodiments of the data storage system may include more or fewer storage servers, each storage server managing a set of physical storage devices, such as magnetic disks, optical disks, tape drives, etc., in different configurations.
  • the storage server 160 manages two disks 162A and 162B.
  • The disks 162A and 162B may hold various storage containers, either in whole or in part.
  • a storage container is a logical unit for storing data, such as a file, a directory, a volume, a qtree (which is a subset of a volume, optionally associated with a space usage quota), a LUN, etc.
  • The disk 162A holds two qtrees 164A and 164B.
  • a disk may hold a part of a storage container.
  • a disk may hold part of a volume, where the volume spans multiple disks.
  • the client machines 112 and 114 may access data in the disks managed by the storage server 160.
  • the data may be stored in storage containers of different forms and/or structures, such as qtrees, directories, volumes, etc.
  • The client machine 112 stores data in the qtree 164A.
  • The client machine 114 stores data in the qtree 164B.
  • The storage server 160 may send the data in the qtrees 164A and 164B to the backup storage server 140, which creates a backup copy of the data in the qtrees 164A and 164B in the disk 142.
  • The backup storage server 140 may further mirror the disk 142 onto the disk 152 managed by the mirror storage server 150.
  • The client machine 112 may store data in an internal disk (not shown) and have the internal disk backed up in the disk 142 managed by the backup storage server 140. Note that the above is merely one example of data protection policy topologies. It should be appreciated that many different data protection policy topologies may be implemented in the system 100.
  • The storage manager 120 automatically uses data sets to manage data in the data storage system 100 according to data management policies from the administrator. Details of data sets and their use are discussed below.

Data Sets and Storage Objects
  • a data set includes a set of storage objects associated with a data management policy.
  • the data management policy is applied to the storage objects in the data set, directing how the administrator wishes the data in the storage objects to be managed as a single unit.
  • A data set is a collection of storage objects grouped so that they can be managed as a single unit.
  • a storage object may be defined to be a home directory of an employee in a company, which is a member of a data set of the home directories of all employees in the company.
  • the storage objects may be referred to as members of the data set.
  • a storage object may include a logical representation of a collection of data in one or more storage containers and replicas of the collection of data (e.g., a mirrored copy of the data and/or a backed up copy of the data).
  • A logical representation of the storage object of the employee's home directory may be the employee's identification (ID), such as "jsmith."
  • the collection of data may be created by users or the administrator of the data storage system 100.
  • the data of a storage object is stored in a storage container or a set of storage containers (e.g., the disk 162A) managed by one or more storage servers (such as the storage server 160) in the data storage system 100.
  • The content of the employee's home directory in the above example may be stored in the qtree 164A in the disk 162A.
  • Some examples of storage objects include data in qtrees, volumes, directories, etc. These examples may also be referred to as elementary storage objects because they are logical representations of data in basic units of storage in the networked data storage system 100 in the context of data sets. Further, a storage object may be a reference to a collection of elementary storage objects, such as a reference to all volumes on a storage server.
  • The physical implementation of the storage containers is independent of the logical representation of the data.
  • the data is not managed by where the data is stored or how the data is accessed. Rather, the data is managed by the logical representation, which may be associated with the content of the data.
  • The data may be a word processing document, "employee_review.doc", stored in the disk 162A.
  • the logical representation may be the name of the document (i.e., "employee_review.doc").
  • the storage manager 120 may manage the document by the name of the document (i.e., "employee_review.doc"), rather than by the storage container (i.e., the disk 162A in the current example) or the set of storage containers in which the document is stored.
  • The physical implementation of the disk 162A is independent of the name of the document (i.e., "employee_review.doc") stored in the disk 162A.
  • The storage object, as well as the data set containing it, is not bound to any actual physical location or storage container and may move to another location or another storage container over time.
  • the storage containers associated with a data set may become obsolete in performance over time and the storage manager 120 may move the data to a set of new storage containers, with or without alerting the administrator. Any movement of data sets may be substantially transparent from a client perspective in order to provide a separation of the logical representation from the physical location of the data.
  • The storage manager 120 may re-balance resources (e.g., the disks 162A, 162B, 142, and 152) in the data storage system 100 over time.
  • the data set provides the virtualization of the physical storage containers used to hold the data.
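  • The separation of the logical representation from the physical location can be sketched as a lookup table that only the storage manager sees. The `StorageManager` class, its methods, and the container names `disk-162A` and `disk-162B` are hypothetical stand-ins chosen for the example.

```python
class StorageManager:
    """Hypothetical manager that maps logical names to storage containers."""

    def __init__(self):
        self._location = {}   # logical name -> container currently holding it
        self._contents = {}   # container -> {logical name: data}

    def write(self, name: str, data: bytes, container: str) -> None:
        self._location[name] = container
        self._contents.setdefault(container, {})[name] = data

    def read(self, name: str) -> bytes:
        # Clients address data by logical name only.
        return self._contents[self._location[name]][name]

    def migrate(self, name: str, new_container: str) -> None:
        # Move the data and update the mapping; the logical name is unchanged.
        old = self._location[name]
        data = self._contents[old].pop(name)
        self.write(name, data, new_container)


mgr = StorageManager()
mgr.write("employee_review.doc", b"draft", container="disk-162A")
mgr.migrate("employee_review.doc", "disk-162B")
content = mgr.read("employee_review.doc")   # same call before and after the move
```

  • Because `read` takes only the logical name, a migration changes nothing from the client's point of view.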
  • a data set includes user created data as well as meta data.
  • Meta data may include information about the user created data. Examples of meta data include exported names, language settings, storage server association, LUN mappings, replication configuration, quotas, policies, consistency groups, etc. Meta data may be used to move or restore the corresponding data set. A complete data set backup is thus useful in handling disaster recovery scenarios. If the storage server (e.g., a filer) which hosts the primary storage set associated with the data set is destroyed, the data set may be reconstructed on another storage server using another storage set that is a replica of the primary storage set to provide client data access without manual configuration by the administrator.
  • a data set may have two types of membership of the storage objects which it contains, namely static and dynamic membership.
  • Static members are low level storage objects (volumes, directories, LUNs), which could be managed by themselves.
  • the elementary storage objects mentioned above are static members.
  • Dynamic members are references to storage objects which may contain other storage objects. For example, an administrator could add a user's home directory to a data set as a static member. Alternatively, the administrator could realize that a given storage server is only used to hold home directories and add the storage server itself to a data set as a dynamic member. This saves the administrator work later because, as directories are created and destroyed on that storage server, the directories may be dynamically added to or removed from the data set.
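  • The difference between static and dynamic membership can be sketched as follows. The classes and directory paths are hypothetical; the point is only that dynamic members are resolved when the member list is requested rather than when the reference is added.

```python
from dataclasses import dataclass, field


@dataclass
class StorageServer:
    """Hypothetical server that only tracks the directories it hosts."""
    directories: set = field(default_factory=set)


@dataclass
class DataSet:
    static_members: set = field(default_factory=set)
    dynamic_members: list = field(default_factory=list)  # references to servers

    def members(self) -> set:
        # Dynamic members are resolved at call time, so directories created
        # or destroyed on a referenced server appear or vanish automatically.
        resolved = set(self.static_members)
        for server in self.dynamic_members:
            resolved |= server.directories
        return resolved


server = StorageServer({"/home/jsmith"})
ds = DataSet(static_members={"/projects/alpha"}, dynamic_members=[server])
before = ds.members()
server.directories.add("/home/adoe")         # a new home directory is created
server.directories.discard("/home/jsmith")   # an old one is destroyed
after = ds.members()
```

  • The administrator never touches the data set after adding the server; membership follows the server's contents.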
  • a data set aggregates the status of its members according to some embodiments of the invention.
  • Examples of status parameters include a data availability status, a data protection status, and a data protection policy conformance status.
  • the data availability status indicates whether all components of the data set are available for use.
  • the data protection status indicates that all the data set members are being protected by a data protection policy.
  • the data protection policy conformance status indicates that the data protection mechanisms (e.g., snapshots, backups, and mirrors) have been configured in accordance with the data protection policy.
  • the storage manager 120 may roll up the corresponding statuses of members of the data set to derive or to generate a value of the corresponding status of the data set.
  • a status parameter may have a number of levels, each associated with a value.
  • the storage manager 120 may select the maximum value among all the corresponding statuses of the members.
  • A status can have six possible levels: normal, information, warning, error, critical, and emergency, where normal has a value of 1, information has a value of 2, warning has a value of 3, and so forth.
  • Suppose an exemplary data set has three members whose corresponding status parameter values are 2, 3, and 5. Then the storage manager 120 may determine the corresponding status parameter value of the entire data set to be 5, which is the maximum value among the three values.
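  • The roll-up of member statuses by maximum value can be written out directly. The values 1 to 3 come from the text; the values 4 to 6 are assumed from "and so forth", and the names `Status` and `roll_up` are chosen for this example.

```python
from enum import IntEnum


class Status(IntEnum):
    NORMAL = 1
    INFORMATION = 2
    WARNING = 3
    ERROR = 4       # assumed: continues the sequence given in the text
    CRITICAL = 5    # assumed
    EMERGENCY = 6   # assumed


def roll_up(member_statuses):
    """Return the most severe (maximum-valued) status among the members."""
    return max(member_statuses)


members = [Status.INFORMATION, Status.WARNING, Status.CRITICAL]  # 2, 3, 5
data_set_status = roll_up(members)
print(data_set_status.name, int(data_set_status))  # prints "CRITICAL 5"
```

  • Taking the maximum means a single badly affected member is enough to raise the status of the whole data set, which is what an administrator monitoring at the data set level would want to see.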
  • the storage manager 120 may perform various operations on a data set, such as in response to administrator requests. Some examples of such operations include changing or modifying an associated data management policy of a data set, provisioning new members in a data set, listing members in a data set, adding members to a data set, deleting or removing members from a data set, migrating a data set to a different set of storage containers, generating performance views specific to a data set, generating storage usage reports of a data set, setting quota on a data set or individual members within a data set.
  • a data management policy includes a description of the desired behavior of the associated data set.
  • a data management policy may describe how the storage should be used and configured.
  • One exemplary data management policy is a data protection policy, which describes how storage objects in a data set should be protected.
  • Other examples of data management policies include a performance management policy, a provisioning policy, etc. Attributes associated with a data management policy are abstracted at the highest level possible, allowing implementation of underlying technology to change over time without adversely impacting the administrator. In other words, a layer of abstraction is provided between the administrator and the physical implementation of the storage containers in which the data is stored.
  • the physical implementation may be modified without violating or impacting the data management policy.
  • the administrator may be shielded from the idiosyncrasies of various underlying implementations that allow the data set to use newer technology as it becomes available in an automated fashion.
  • the storage manager 120 may automatically start applying the data management policy associated with the data set to all members in the data set. For instance, the storage manager 120 may configure storage objects in the data set, schedule backup of the storage objects in the data set, etc., according to the data management policy.
  • the storage manager 120 may generate an error message to alert the administrator, who may respond by reassigning the subset of storage object(s) to another data set or by creating a new data set for the subset of storage object(s).
  • a data management policy may be represented by a tree graph having a number of nodes and branches.
  • Figure 3A shows a tree graph of one embodiment of a data management policy.
  • the tree graph 210 includes nodes 211-216 and branches 251-255.
  • Each node represents a storage object and is coupled to another node via a branch, which describes the relationship between the two corresponding storage objects.
  • branch 253 is marked as a "backup" connection between nodes 212 and 214.
  • Thus, the storage object represented by node 214 is a backup copy of the storage object represented by node 212.
  • the graph 210 represents how the administrator intends to manage data in the data storage system.
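  • One way to hold such a graph in memory is as a table of labeled branches. The sketch below encodes only branch 253, the one branch whose endpoints and label are stated in the text; the `Branch` type and the `copies_of` helper are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    source: int        # node holding the original data
    target: int        # node holding the derived copy
    relationship: str  # e.g. "backup"


# Only branch 253 is described in the text, so only it is encoded here.
branches = {
    253: Branch(source=212, target=214, relationship="backup"),
}


def copies_of(node: int, relationship: str) -> list:
    """Return the nodes that hold a copy of `node` via the given relationship."""
    return [b.target for b in branches.values()
            if b.source == node and b.relationship == relationship]


backups_of_212 = copies_of(212, "backup")
```

  • Storing the relationship on the branch rather than on the node lets the same node take part in several relationships.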
  • the data management policy describes attributes of the data in terms that the administrator is comfortable with, and leaves the configuration and choice of technologies to implement the policy to the storage manager 120.
  • the attributes in the policies generally focus on desired data protection behaviors and configuration settings rather than on software technology and hardware choices. Although the choice of hardware may have some impact on the performance and cost of the storage, the physical equipment choices may be driven by a simple label scheme described in more detail below. Examples of the above-mentioned attributes include cost, performance, availability, reliability, type of data protection, capacity related actions, security settings, capabilities, etc.
  • In some embodiments, the storage containers in the system 100 carry labels such as tier-1, tier-2, and tier-3, and labeled storage containers may be collectively referred to as a resource pool. Such labels may be specified as a part of a provisioning policy to limit physical storage resources to a select data set.
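  • How a label in a provisioning policy might restrict the storage containers available to a data set can be sketched as a simple filter. The container records, their names, and the `provision` function are hypothetical; only the tier labels appear in the text.

```python
# Hypothetical resource pool: each container carries a tier label.
containers = [
    {"name": "vol1", "label": "tier-1", "free_gb": 200},
    {"name": "vol2", "label": "tier-2", "free_gb": 900},
    {"name": "vol3", "label": "tier-1", "free_gb": 50},
]


def provision(label: str, size_gb: int):
    """Return the first container with the given label and enough free space."""
    for c in containers:
        if c["label"] == label and c["free_gb"] >= size_gb:
            c["free_gb"] -= size_gb   # reserve the space
            return c["name"]
    return None                       # no container with that label fits


first = provision("tier-1", 100)    # fits in vol1
second = provision("tier-1", 150)   # no tier-1 container has 150 GB left
third = provision("tier-2", 150)    # the tier-2 pool is untouched
```

  • A request labeled tier-1 is never satisfied from tier-2 storage, even when tier-2 has ample free space.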
  • a data access name may be specified in addition to a policy for the desired behavior of the resulting data set.
  • The data access name is used to configure the necessary export configurations (e.g., NFS, CIFS, iSCSI, FCP, etc.).
  • the data management policy associated with a data set may be explicitly changed by the administrator. For example, in a tiered storage system, as the data in tier-1 storage ages, the relevance or importance of the data may diminish, and thus, the data may be migrated to tier-2 storage from the tier-1 storage. In some embodiments, the administrator may determine which data sets are candidates for migration and associate such data sets with a policy created for data in tier-2 storage.
  • the storage manager 120 automatically starts applying the data management policy associated with the data set to all members in the data set. For instance, the storage manager 120 may configure storage objects in the data set, schedule backup of the storage objects in the data set, etc., according to the data management policy.
  • In response to a change in the data set and/or the data management policy, the storage manager 120 automatically checks the data set to determine if the data set still conforms to the policy; if not, the storage manager 120 may re-apply the policy to the data set to make the data set conform to the policy. For example, when the administrator adds a new member to a data set, the storage manager 120 automatically applies the data management policy associated with the data set to the new member.
  • When the administrator alters a data management policy associated with a data set, the storage manager 120 automatically identifies the data sets associated with the altered data management policy. To identify the affected data sets, the storage manager 120 may access the storage manager database 130 to find the data sets associated with the altered policy. Then the storage manager 120 automatically checks whether the storage objects in each affected data set still conform to the altered data management policy. If not, the storage manager 120 automatically applies the altered policy to the storage objects in the data set. For instance, the storage manager 120 may automatically re-configure the storage servers (e.g., storage servers 140, 150, 160) and/or the storage devices (e.g., disks 142, 152, 162A, 162B), as well as the relationships between the storage servers according to the altered policy.
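  • The sequence of looking up affected data sets, checking them, and re-applying the altered policy can be sketched as follows. The dictionary standing in for the storage manager database 130, the policy names "gold" and "bronze", and the `retention_days` setting are all hypothetical.

```python
# Hypothetical stand-in for the storage manager database 130:
# data set name -> (policy name, current state of the data set).
database = {
    "home_dirs": ("gold", {"retention_days": 30}),
    "scratch": ("bronze", {"retention_days": 7}),
    "mail": ("gold", {"retention_days": 90}),
}


def on_policy_changed(policy_name: str, new_policy: dict) -> list:
    """Re-apply an altered policy; return the data sets that were changed."""
    changed = []
    for name, (assigned, state) in database.items():
        if assigned != policy_name:
            continue                      # data set uses a different policy
        if any(state.get(k) != v for k, v in new_policy.items()):
            state.update(new_policy)      # bring the data set into conformance
            changed.append(name)
    return changed


changed = on_policy_changed("gold", {"retention_days": 90})
```

  • Only "home_dirs" is reported as changed: "scratch" uses a different policy, and "mail" already conformed to the altered one.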
  • the storage servers e.g., storage servers 140, 150, 160
  • the storage devices e.g., disks 142, 152, 162A, 162B
  • the storage manager 120 may also give the administrator a preview of what actions the storage manager 120 is configured to take to make the data set conform to the data management policy, so that the administrator can confirm the actions are correct before the actions are taken. In some embodiments, the storage manager 120 may find certain situations unresolvable and report these to the administrator for manual resolution.
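The lookup-and-preview behavior described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `DataSet` class, the per-member "applied policy version" field, and the function names are hypothetical, and an in-memory list stands in for the storage manager database 130.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    name: str
    policy: str  # name of the associated data management policy
    # Hypothetical bookkeeping: member name -> policy version last applied to it.
    members: dict = field(default_factory=dict)


def affected_data_sets(database, policy_name):
    """Return the data sets associated with the altered policy."""
    return [ds for ds in database if ds.policy == policy_name]


def preview_actions(database, policy_name, new_version):
    """List, without executing anything, the actions needed for conformance."""
    actions = []
    for ds in affected_data_sets(database, policy_name):
        for member, version in sorted(ds.members.items()):
            if version != new_version:
                actions.append(
                    f"re-apply {policy_name} v{new_version} to {member} in {ds.name}"
                )
    return actions


# Stand-in for the storage manager database (assumed contents).
database = [
    DataSet("home-dirs", "backup", {"vol1": 1, "vol2": 2}),
    DataSet("payroll", "mirror", {"vol3": 1}),
]
preview = preview_actions(database, "backup", 2)
print(preview)
```

Only the data set that uses the altered policy is inspected, and the returned list is a preview that an administrator could confirm before any action is taken.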
  • policies may be modified according to disaster recovery requirements or storage attributes, subject to permission allowed via the role based access control mechanism.
  • A new copy of a policy with identical attributes may be generated using the cloning operation.
  • Using data sets and data management policies as described herein can vastly reduce the workload of storage administrators. There are at least two ways in which using data sets as described herein helps reduce manual administrative work and ensure a more reliable policy implementation.
  • First, using data sets can reduce work by reducing the number of objects a storage administrator has to monitor. While a data center may have hundreds of thousands of directories, these may be classified into a much smaller number of collections and be managed by a smaller number of policies. For example, every user in a large enterprise may have a home directory, but these all need to be managed the same way. Thus, these home directories can be collected into a single data set associated with a data protection policy.
  • Second, a data set reduces work by automating implementation of and changes to data management policies. For instance, suppose a data center originally decided user home directories should be backed up, but the secondary storage holding the backups did not need further protection. Further, suppose the administrator subsequently decided this was not adequate and that home directory backups should be mirrored to off-site storage. In a conventional environment, this would be a huge task, including, for example, tracking down all the secondary volumes which have ever held home directory backups, provisioning appropriate mirrored storage, configuring all the mirror processes, and monitoring that the mirror operations have been succeeding. Using a data set associated with a data management policy, the administrator only has to modify the data management policy to add a mirroring stage.
  • the storage manager 120 may then perform the tedious task of finding all the volumes which now require mirrors, provision the mirrored storage, and establish the relationships, etc. On an ongoing basis, the storage manager 120 may monitor that the mirrors are working and report a data set wide error status if not.
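As a rough illustration of the home directory example, the following Python sketch expands a policy (modeled here, as an assumption, as an ordered list of stage names) into the relationships it requires, and then computes what must be created after a mirroring stage is added. The volume names and the destination naming scheme are hypothetical placeholders for storage that would be provisioned.

```python
def required_relationships(primaries, stages):
    """Expand policy stages into (kind, source, destination) relationships."""
    required = []
    for vol in primaries:
        source = vol
        for stage in stages:
            # Hypothetical naming scheme for the storage to be provisioned.
            dest = f"{source}-{stage}"
            required.append((stage, source, dest))
            source = dest  # the next stage protects the copy made by this one
    return required


home_dirs = ["home1", "home2"]
old = required_relationships(home_dirs, ["backup"])
new = required_relationships(home_dirs, ["backup", "mirror"])

# Relationships that must be established after the policy gains a mirror stage.
to_create = [rel for rel in new if rel not in old]
print(to_create)
```

The administrator changes one policy; the list of mirrors to provision is derived for every member of the data set.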
  • the storage manager 120 may be implemented on a server as illustrated in Figure 2.
  • the storage manager 200 includes a processor 222, a memory 224, a network interface 226, and a storage adaptor 228, which are coupled to each other via a bus system 230.
  • the bus system 230 may include one or more busses and/or interconnects.
  • the storage manager 200 communicates with a network (e.g., the Internet) via the network interface 226, which can be an Ethernet adaptor, fiber channel adaptor, etc.
  • the network interface 226 may be coupled to a public network, a private network, or a combination of both in order to communicate with a client machine (such as the client machine 110 in Figure 1) usable by an administrator of the data storage system.
  • the processor 222 reads instructions from the memory 224 and executes the instructions.
  • the memory 224 may include any of various types of memory devices, such as, for example, random access memory (RAM), read-only memory (ROM), flash memory, one or more mass storage devices (e.g., disks), etc.
  • the memory 224 stores instructions of an operating system 230.
  • the processor 222 may retrieve the instructions from the memory 224 to run the operating system 230.
  • the storage manager 200 interfaces with one or more storage servers (e.g., the storage servers 140, 150, 160 in Figure 1) via the storage adaptor 228, which may include a small computer system interface (SCSI) adaptor, fiber channel adaptor, etc.
  • Figure 4 illustrates a functional diagram of one embodiment of a storage manager 300, which can represent storage manager 120 in Figure 1.
  • the storage manager 300 includes an API 330, a conformance checker 310, and a conformance engine 320.
  • the API 330 is operatively coupled to the conformance checker 310, which is further operatively coupled to the conformance engine 320.
  • the conformance checker 310 includes a translator 312.
  • the conformance engine 320 includes a storage adaptor 322.
  • the API 330 is communicably coupled to a GUI 340, which may be provided by a client machine (e.g., the client machine 110 in Figure 1).
  • the storage manager 300 is coupled to a database 350, which stores representations of data sets and data management policies, which are also referred to as objects.
  • the conformance checker 310 checks whether the storage objects of a data set conform to a data management policy associated with the data set.
  • the storage objects of the data set must have relationships with each other as specified by the data management policy.
  • the data set violates the data management policy if the storage objects are not related to each other as specified in the data management policy.
  • a data set includes three volumes, namely, volume A, volume B, and volume C.
  • the data set is associated with a data management policy, which is a protection policy that specifies some backup and mirroring relationships between the volumes. Specifically, the protection policy specifies that volume A should be backed up on volume B and volume B should be mirrored to volume C.
  • the data management policy (which, in this case, is the protection policy) may specify more details on the relationships between the storage objects (e.g., the frequencies of backup and mirroring). More details on how the conformance checker 310 determines if a data set conforms or violates a data management policy are described below.
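The three-volume example can be sketched in Python as a set comparison. Representing relationships as `(kind, source, destination)` tuples is an assumption made for illustration; the source does not specify a data model.

```python
# Relationships required by the protection policy in the example:
# volume A is backed up on volume B, and volume B is mirrored to volume C.
policy = {("backup", "A", "B"), ("mirror", "B", "C")}


def violations(policy, actual):
    """Return the policy relationships that are missing from the actual state."""
    return sorted(policy - actual)


# Assumed current state: only the backup relationship exists.
actual = {("backup", "A", "B")}
missing = violations(policy, actual)
print(missing)
```

An empty result would mean the data set conforms as far as these relationships are concerned; details such as backup and mirroring frequencies are not modeled here.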
  • the GUI 340 receives administrator inputs 301 on data sets and/or data management policies.
  • the administrator inputs 301 may include, for example, a request to apply a data management policy to a data set, a change to an existing data management policy and/or an existing data set, a request to create a new data management policy and/or a new data set, etc.
  • the administrator inputs 301 are typically written in human readable terms, such as words, phrases, etc.
  • the API 330 receives the administrator inputs 301 from the GUI 340 and forwards the inputs 305 to the conformance checker 310.
  • the translator 312 in the conformance checker 310 translates the administrator inputs 301 into machine-readable terms.
  • a client program running on a client machine invokes the API 330, which may construct an in-memory representation of the data set and its associated data protection policy, which are commonly referred to as objects. Note that these objects are software entities distinct from the storage objects described above.
  • the API 330 invokes the main entry point in the conformance checker 310 with pointers to the objects corresponding to the data set and policy.
  • the conformance checker 310 compares the state of the data set against the data management policy to determine if the data set conforms to the data management policy. For example, the conformance checker 310 iterates through the connections of a tree graph representing the policy as applied to the data set.
  • the conformance checker 310 iterates through the connections represented by branches 251-255. For instance, the conformance checker 310 may iterate through the tree graph 210 from branch 251 to branch 252, then to branches 253, 254, and 255. Alternatively, the conformance checker 310 may iterate through the tree graph 210 from branch 251 to branches 253 and 254, and then to branch 252, and finally to branch 255. For each connection, the conformance checker 310 compares the states of the storage objects represented by the nodes 211-216 by making various determinations. A flowchart showing the determinations made by the conformance checker 310 according to one embodiment of the invention is shown in Figures 3B and 3C.
  • the conformance checker 310 determines if each member of the data set corresponding to a source node of the tree graph is protected by a relationship between storage servers according to the policy (block 261). For instance, the conformance checker 310 may look into a configuration file of the data storage system to find out the relationships between the storage servers managing the storage objects. For example, the policy may require volume A to be mirrored to volume B, where volume A is on storage server A and volume B is on storage server B. Then the conformance checker 310 may look into the configuration file of the data storage system to determine if storage server A mirrors volume A onto volume B on storage server B.
  • the conformance checker 310 may check whether there is a mirroring relationship between storage server A and storage server B. If not, then the data set is not in conformance. Otherwise, the conformance checker 310 continues at block 262 to determine if there are any source nodes protected by relationships that do not terminate at a destination node of the tree graph. If there are, then the data set is not in conformance. Otherwise, the conformance checker 310 determines if there are any destination nodes that are end points for relationships not corresponding to the source node (block 263). If there are, then the data set is not in conformance. Next, the conformance checker 310 determines if there is a missing physical relationship in the tree graph (block 264).
  • the conformance checker 310 determines if there is an existing storage object already in the data set to hold a copy of the source node (block 265). If not, the data set is not in conformance. Otherwise, the conformance checker 310 determines if the storage object is large enough to hold the data (block 266). If not, the data set is not in conformance. Next, the conformance checker 310 determines if there is an appropriate destination object (block 267). If so, the data set is in conformance. Otherwise, the conformance checker 310 determines if an appropriate destination object can be constructed (block 268).
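A simplified Python sketch of some of these determinations follows. It covers only a subset of the blocks (roughly blocks 261 to 263 and 266), and the inputs (`policy_edges`, `actual`, `capacity`, `used`) are hypothetical stand-ins for the tree graph and for the configuration data the checker would read from the data storage system.

```python
def conformance_problems(policy_edges, actual, capacity, used):
    """Collect reasons why a data set does not conform to its policy."""
    problems = []
    for src, dst in policy_edges:
        if (src, dst) not in actual:
            # cf. block 261: the source is not protected as the policy requires
            problems.append(f"{src} is not protected by a relationship to {dst}")
        elif capacity.get(dst, 0) < used.get(src, 0):
            # cf. block 266: the destination cannot hold the source data
            problems.append(f"{dst} is too small to hold {src}")
    for src, dst in sorted(actual):
        if (src, dst) not in policy_edges:
            # cf. blocks 262-263: a relationship the policy does not call for
            problems.append(f"unexpected relationship {src} -> {dst}")
    return problems


policy_edges = [("A", "B"), ("B", "C")]   # connections required by the policy
actual = {("A", "B"), ("B", "X")}         # relationships found in the system
capacity = {"B": 50, "C": 100}            # assumed sizes, arbitrary units
used = {"A": 80, "B": 80}

problems = conformance_problems(policy_edges, actual, capacity, used)
for line in problems:
    print(line)
```

An empty list would indicate conformance under these simplified checks; any entry could become the basis for a corrective task.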
  • If the conformance checker 310 determines that the data set is not in conformance with the data management policy, the conformance checker 310 generates a task or a set of tasks for making the data set conform to the data management policy.
  • a task includes one or more specific machine-executable or machine-readable instructions to cause a storage server to perform a specific function, such as to create a storage object, to create a relationship between a set of storage objects, to delete a storage object, etc.
  • the conformance checker 310 may generate a task including instructions to provision storage in order to create a new storage object.
  • the task may further include the parameters of what needs to be done in order to make the data set conform to its policy.
  • the translator 312 translates the task list into human readable description 303B, which is sent to the API 330 from the conformance checker 310.
  • each task is associated with a specific piece of code to translate the task into a corresponding human readable description.
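One way to picture tasks that carry parameters and have their own description code is the Python sketch below. The `Task` class, the task kinds, and the `DESCRIBERS` table are illustrative assumptions, not names taken from the source.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    kind: str                                   # what the storage server should do
    params: dict = field(default_factory=dict)  # parameters of what needs to be done


# Each task kind is paired with a piece of code that renders it for a human.
DESCRIBERS = {
    "create_volume": lambda p: f"Create a {p['size_gb']} GB volume named {p['name']}",
    "create_mirror": lambda p: f"Mirror {p['source']} to {p['destination']}",
}


def describe(task):
    """Translate one machine-readable task into a human readable description."""
    return DESCRIBERS[task.kind](task.params)


tasks = [
    Task("create_volume", {"name": "C", "size_gb": 100}),
    Task("create_mirror", {"source": "B", "destination": "C"}),
]
descriptions = [describe(t) for t in tasks]
print(descriptions)
```

Keeping the description code next to the task kind means the text shown to the administrator is derived from the same task list that would later be executed.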
  • the API 330 forwards the human readable description 303A of the task list to the GUI 340, which outputs the human readable description 303A of the task list to the administrator.
  • the GUI 340 may output the human readable description 303A of the task list in a screen display.
  • the administrator may verify the human readable description 303A of the task list and if correct, the administrator may confirm that the human readable description 303A of the task list is correct via the GUI 340.
  • the conformance checker 310 may forward the machine-readable task list 307 to the conformance engine 320.
  • the conformance engine 320 processes tasks in the list 307 in sequence, for example, first-in-first-out (FIFO), sending the appropriate commands or instructions 309 to storage servers (e.g., the storage servers 140, 150, 160 in Figure 1) to re-configure the storage system to comply with the policy.
  • the conformance engine 320 does not make decisions based on the policy; rather, the conformance engine 320 simply processes the tasks on behalf of the conformance checker 310. As such, the conformance engine 320 does not change the sequence of the tasks in the list 307.
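The first-in-first-out processing can be sketched with a queue in Python. The `send` callback, the server name, and the command strings are hypothetical; in a real system the callback would issue instructions to a storage server.

```python
from collections import deque


def run_tasks(task_list, send):
    """Process tasks strictly in the order given, without reordering them."""
    queue = deque(task_list)
    while queue:
        server, command = queue.popleft()  # oldest task first (FIFO)
        send(server, command)


sent = []  # records what would have been sent to the storage servers
run_tasks(
    [("server-b", "provision volume C"), ("server-b", "mirror B -> C")],
    lambda server, command: sent.append((server, command)),
)
print(sent)
```

The engine makes no policy decisions here: it only drains the list it was handed, which preserves any ordering the checker relied on, such as provisioning a volume before mirroring to it.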
  • the conformance engine 320 includes a network adaptor 322 to interface with one or more storage servers in the data storage system, such as storage servers 140, 150, and 160 in Figure 1.
  • the storage servers are managed by the storage manager 300.
  • the network adaptor 322 may issue the instructions in the task list 307 to the relevant storage servers to cause the storage servers to perform functions according to the instructions in order to make the data set conform to the data management policy.
  • the instructions from the storage manager 300 may cause a storage server to schedule data backup according to the data management policy, to reconfigure some storage objects managed by the storage server according to the data management policy, to re-allocate storage devices according to the data management policy, etc.
  • the storage manager 300 has made the data set conform to the data management policy.
  • the storage manager 300 described above provides great convenience to the administrator. Since the conformance checker 310 of the storage manager 300 automatically checks for conformance, the data sets in the data storage system can be checked faster and more frequently. This may help to detect issues much sooner than having the administrator manually check the data sets. Further, if the conformance checker 310 detects an issue, the administrator does not have to manually enter commands to resolve the situation. Rather, the conformance checker 310 generates a list of tasks, which may be executed to make the data sets conform to the data management policy. Manually entering commands or instructions is tedious and error prone. Thus, the storage manager 300 helps to reduce the risk of errors when changing a data management policy.
  • splitting the conformance process into two stages, supported by the conformance checker 310 and the conformance engine 320 respectively, allows the administrator to find out what tasks would be executed before any expensive or irreversible actions are taken.
  • the administrator may cancel tasks before they are executed, should the administrator decide not to accept the tasks recommended by the conformance checker 310.
  • When a data management policy is changed, the conformance checker 310 may identify all the data sets using that policy and deduce the operations needed to bring the data sets into conformance. As mentioned above, this may be considered as a dry run of the change in the data management policy. Previewing the dry run allows the administrator to decide whether such a change is too disruptive or expensive before implementing the change in the data management policy.
  • Figure 5 illustrates a flow diagram of one embodiment of a process to manage data in a data storage system using data sets.
  • the process is performed by processing logic that comprises hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general-purpose computer system or a dedicated machine, such as the storage manager 120 in Figure 1), firmware, or any combination of these.
  • processing logic receives input from an administrator of the data storage system (processing block 410). Processing logic then checks if there is a request to apply a data management policy to a data set or change to at least one of the data management policy and the data set (processing block 420). If there is no request or change, then processing logic transitions to processing block 490 and the process ends. Otherwise, processing logic transitions to processing block 430.
  • processing logic compares a state of the data set against the data management policy (processing block 430). Then processing logic checks if the data set conforms to the policy (processing block 440). If so, then processing logic transitions to processing block 490 and the process ends. Otherwise, processing logic transitions to processing block 450.
  • Processing logic then translates the input from the administrator into instructions executable by storage servers (processing block 450). In some embodiments, these instructions are translated back into a human readable task list and output to the administrator for verification. The administrator may accept or reject the task list.
  • processing logic may issue the instructions to the storage servers to cause the storage servers to take actions that would bring the storage objects into conformance with the policy (processing block 460). Then processing logic transitions to processing block 490 and the process ends. Details of some embodiments of the above operations have been described above.
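The flow of processing blocks 410 through 490 can be condensed into a single Python function, shown below as a sketch under simplifying assumptions: the data set state and the policy are both modeled as sets of `(kind, source, destination)` relationships, `confirm` is a hypothetical callback standing in for administrator verification, and updating the state set stands in for issuing instructions to storage servers.

```python
def handle_input(request, state, policy, confirm):
    """Follow the described flow for one administrator input."""
    if request is None:                  # block 420: no request or change
        return "no request"
    missing = sorted(policy - state)     # block 430: compare state against policy
    if not missing:                      # block 440: data set already conforms
        return "conforms"
    # block 450: produce instructions (rendered here as readable strings)
    instructions = [f"create {kind} {src} -> {dst}" for kind, src, dst in missing]
    if not confirm(instructions):        # administrator may reject the task list
        return "rejected"
    state.update(missing)                # block 460: stand-in for issuing instructions
    return "applied"


state = {("backup", "A", "B")}
policy = {("backup", "A", "B"), ("mirror", "B", "C")}

first = handle_input("apply policy", state, policy, lambda tasks: True)
second = handle_input("apply policy", state, policy, lambda tasks: True)
print(first, second)
```

The second call returns early because the first call brought the state into conformance, mirroring the transition from block 440 directly to block 490.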
  • Figures 6A-6C illustrate one embodiment of a series of displays of a GUI to enable an administrator to create a new data set.
  • a GUI 610 for creating new data sets is shown.
  • the GUI 610 may be displayed via a window created by the client machine 110 in Figure 1.
  • the GUI 610 includes a field 612 for entry of a name of the data set and a field 614 for entry of the description of the data set.
  • an administrator has input "Accounting Data" as the name of a new data set.
  • the GUI 610 includes additional fields for entry of other attributes or information of the new data set, such as owner, contact, and time zone.
  • Figure 6B illustrates one embodiment of a display of a GUI used in creating the new data set.
  • the GUI 620 includes a list 622 of available physical resources (e.g., available directories) to be added into the new data set. The administrator may select from the list 622 of physical resources by clicking onto the particular resource.
  • the GUI 620 further includes a set of user interface controls 624 to allow the administrator to add the selected physical resources to the new data set.
  • the GUI 620 includes a field 626 to display the selected physical resources in the data set.
  • Figure 6C illustrates one embodiment of a display of a GUI used in creating the new data set.
  • the GUI 630 shows a summary of the new data set created using the GUI 610 and 620 in Figures 6A and 6B.
  • the administrator may verify the newly created data set using the GUI 630 and if desired, may return to the GUI 610 and/or 620, to make changes using the user interface control 632.
  • the administrator may confirm the creation of the new data set by actuating the user interface control 634.
  • the administrator may cancel the creation of the new data set by actuating the user interface control 636.
  • FIG. 6D illustrates one embodiment of a display of a GUI for applying a data management policy to a data set.
  • the GUI 640 includes a field 642 displaying data sets created, a field 644 displaying data management policies, and a field 646 to display details of a data management policy selected.
  • An administrator may click on a data set in the field 642 to select the data set. For instance, the data set "NY Payroll" is selected in the example shown in Figure 6D.
  • the GUI 640 further includes user interface controls 648 to allow the administrator to add, edit, or delete a data set.
  • the administrator may select one of the data management policies in the field 644 to apply to the selected data set by first clicking on the desired data management policy and the desired data set to select them, and then actuating the "Apply" button 649 to apply the selected policy to the selected data set.
  • the administrator has selected the policy of "Backed up, then mirrored" in the field 644, and details of this policy are displayed in the field 646 in graphics, text, or a combination of both.
  • Figure 7 illustrates a flow diagram of one embodiment of a process to manage data in a data storage system using data sets.
  • the process is performed by processing logic that comprises hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general-purpose computer system or a dedicated machine, such as the storage manager 120 in Figure 1), or a combination of both.
  • processing logic creates a GUI to receive inputs from an administrator of the data storage system (processing block 710).
  • Processing logic receives administrator inputs on data sets and/or data management policies (processing block 720).
  • the administrator may provide information via the GUI to define data sets (e.g., names and description of the data set, storage objects to be included in the data set, etc.) and to define data management policies (e.g., a data protection policy).
  • the processing logic organizes storage objects specified by the administrator into data sets based on administrator inputs (processing block 730).
  • Processing logic may store a list of the storage objects in each data set in a persistent store (processing block 740).
  • processing logic manages each data set as a single unit by applying a corresponding data management policy to the data set (processing block 750). For example, processing logic may apply the data management policy by configuring the storage objects, scheduling backups of the storage objects, etc., according to the data management policy.
  • Further, processing logic may determine a value of a status of each data set based on the corresponding status of each storage object in the respective data set (processing block 770). Details of data sets, data management policies, and management of data using such have been described in detail above.
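Deriving a data set status from the statuses of its storage objects could look like the Python sketch below. The source does not say how the statuses are combined; taking the most severe member status, and the three status names used here, are assumptions made for illustration.

```python
# Assumed severity ordering; the status names are hypothetical.
SEVERITY = {"ok": 0, "warning": 1, "error": 2}


def data_set_status(object_statuses):
    """Roll member statuses up into one status: the most severe one wins."""
    return max(object_statuses.values(), key=SEVERITY.__getitem__, default="ok")


accounting = {"vol1": "ok", "vol2": "error", "vol3": "warning"}
status = data_set_status(accounting)
print(status)
```

With this rule a single failing storage object is enough to flag the whole data set, so the administrator can watch one status per data set rather than one per object.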
  • the present invention also relates to an apparatus for performing the operations described herein.
  • This apparatus may be specially constructed for the required purpose, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer.
  • a computer program may be stored in a machine-accessible medium, also referred to as a computer-readable medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.

Abstract

The present invention relates to an apparatus and a method for managing data using data sets. In one aspect of the invention, the apparatus includes a conformance checker and a conformance engine that are adapted to make data sets conform to data management policies. The conformance checker is operable to compare a state of a data set against a data management policy associated with the data set. The conformance engine then makes the data set conform to the data management policy if the conformance checker determines that the data set currently violates the data management policy. In another aspect, the method includes allowing an administrator of a data storage system to define a data set comprising a plurality of storage objects and to associate the data set with a data management policy. Further, the method may include using a storage manager to manage the data set as a single unit according to the data management policy.
EP08725916A 2007-02-22 2008-02-22 Data management in a data storage system using data sets Withdrawn EP2126701A1 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US11/710,190 US7953928B2 (en) 2007-02-22 2007-02-22 Apparatus and a method to make data sets conform to data management policies
US11/710,202 US20080208926A1 (en) 2007-02-22 2007-02-22 Data management in a data storage system using data sets
PCT/US2008/002326 WO2008103429A1 (fr) 2007-02-22 2008-02-22 Data management in a data storage system using data sets

Publications (1)

Publication Number Publication Date
EP2126701A1 true EP2126701A1 (fr) 2009-12-02

Family

ID=39540362

Family Applications (1)

Application Number Title Priority Date Filing Date
EP08725916A Withdrawn EP2126701A1 (fr) 2007-02-22 2008-02-22 Data management in a data storage system using data sets

Country Status (3)

Country Link
EP (1) EP2126701A1 (fr)
JP (1) JP2010519646A (fr)
WO (1) WO2008103429A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8438247B1 (en) * 2010-12-21 2013-05-07 Amazon Technologies, Inc. Techniques for capturing data sets
US20120254118A1 (en) 2011-03-31 2012-10-04 Microsoft Corporation Recovery of tenant data across tenant moves
US8775774B2 (en) * 2011-08-26 2014-07-08 Vmware, Inc. Management system and methods for object storage system

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020188592A1 (en) * 2001-06-11 2002-12-12 Storage Technology Corporation Outboard data storage management system and method
EP1430399A1 (fr) * 2001-08-31 2004-06-23 Arkivio, Inc. Data storage techniques based on storage policies
AU2002365580A1 (en) * 2001-11-23 2003-06-10 Commvault Systems, Inc. Selective data replication system and method
JP4196579B2 (ja) * 2002-04-10 2008-12-17 株式会社日立製作所 Storage operation management method and system
JP2004303190A (ja) * 2003-03-20 2004-10-28 Hitachi Ltd Program, information processing apparatus, control method of information processing apparatus, and recording medium
WO2005001646A2 (fr) * 2003-06-25 2005-01-06 Arkivio, Inc. Techniques for performing policy-automated operations
US7581224B2 (en) * 2003-07-10 2009-08-25 Hewlett-Packard Development Company, L.P. Systems and methods for monitoring resource utilization and application performance
US7734561B2 (en) * 2003-12-15 2010-06-08 International Business Machines Corporation System and method for providing autonomic management of a networked system using an action-centric approach
US7397770B2 (en) * 2004-02-20 2008-07-08 International Business Machines Corporation Checking and repairing a network configuration
US7818608B2 (en) * 2005-02-18 2010-10-19 Microsoft Corporation System and method for using a file system to automatically backup a file as a generational file

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2008103429A1 *

Also Published As

Publication number Publication date
WO2008103429A1 (fr) 2008-08-28
JP2010519646A (ja) 2010-06-03

Similar Documents

Publication Publication Date Title
US7953928B2 (en) Apparatus and a method to make data sets conform to data management policies
US11474896B2 (en) Monitoring, diagnosing, and repairing a management database in a data storage management system
US11238173B2 (en) Automated intelligent provisioning of data storage resources in response to user requests in a data storage management system
US10942894B2 (en) Operation readiness checking and reporting
US20200356443A1 (en) Single snapshot for multiple applications
US20080208926A1 (en) Data management in a data storage system using data sets
US10628267B2 (en) Client managed data backup process within an enterprise information management system
US20180373597A1 (en) Live browsing of backed up data residing on cloned disks
US9632874B2 (en) Database application backup in single snapshot for multiple applications
US20150212894A1 (en) Restoring application data from a single snapshot for multiple applications
US20150212895A1 (en) Generating mapping information for single snapshot for multiple applications
US8364640B1 (en) System and method for restore of backup data
US11249863B2 (en) Backup-based media agent configuration
US8321867B1 (en) Request processing for stateless conformance engine
US11615147B2 (en) Mobile storage manager control application for managing a storage manager of an information management system
WO2019213058A1 (fr) Client managed data backup process within an enterprise information management system
EP2126701A1 (fr) Data management in a data storage system using data sets
Holl et al. Policy-Driven Management of Data Sets.

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20090918

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR

DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20161207

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20170419