WO2009064720A2 - Load sharing, file migration, network configuration and file deduplication using file virtualization - Google Patents

Load sharing, file migration, network configuration and file deduplication using file virtualization

Info

Publication number
WO2009064720A2
WO2009064720A2 (PCT/US2008/083117)
Authority
WO
WIPO (PCT)
Prior art keywords
file
copy
server
mirror
storage tier
Prior art date
Application number
PCT/US2008/083117
Other languages
English (en)
Other versions
WO2009064720A3 (fr)
Inventor
Suma Suresh
Borislav Marinov
Chitra Makkar
Saravanan Coimbatore
Ron S. Vogel
Vladan Z. Marinkovic
Thomas K. Wong
Original Assignee
Attune Systems, Inc.
Priority date
Filing date
Publication date
Application filed by Attune Systems, Inc.
Publication of WO2009064720A2
Publication of WO2009064720A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10 File systems; File servers
    • G06F 16/17 Details of further file system functions
    • G06F 16/174 Redundancy elimination performed by the file system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10 File systems; File servers
    • G06F 16/18 File system types
    • G06F 16/182 Distributed file systems

Definitions

  • the inventions described herein relate generally to storage networks and, more particularly, to load sharing, file migration, network configuration, and file deduplication using file virtualization in storage networks.
  • a "load balancing" cluster file system different nodes in the cluster access the same portion or the entirety of the shared file system. Clients of the file system are either randomly connected to a node, or a group of clients are designated to connect to a specific node. Each node may receive a different load of client requests for file services. If a node is experiencing more requests than other nodes, the node may forward the request to a node with a lower load. Ideally, each node should get similar number of file requests from clients.
  • Since every node participating in the cluster can contain the authoritative state of any given file system object, every node can be a synchronization point for a file. Since two or more nodes may access the same file at the same time, complex distributed concurrency algorithms are needed to resolve any access conflict. These algorithms are hard to write and take years to become reliable enough to function properly in a production environment.
  • the GPFS file system developed by IBM is an example of a Load Balancing Cluster File System.
  • each cluster node is responsible for serving one or more non-overlapping portions of the cluster file system namespace. If a node receives client requests for data outside the scope of the namespace it is serving, it may forward the request to the node that does service the requested region of the namespace. Since the server nodes do not share overlapped regions of the file system, only a single server will contain the authoritative state of the portion of the file system it serves, and a single synchronization point exists. This removes the need for implementing complex distributed concurrency algorithms.
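The own-or-forward rule just described can be pictured with a minimal sketch (illustrative only): the partition table `OWNER_MAP`, the node names, and the helper functions are all assumptions invented for this example, not terminology from the patent.

```python
# Hypothetical sketch: routing client requests in a load sharing cluster.
# Each node owns non-overlapping namespace prefixes; requests for other
# prefixes are forwarded to the owning node (a single synchronization point).

OWNER_MAP = {           # assumed partition of the cluster namespace
    "/global/engineering": "node1",
    "/global/marketing":   "node2",
}

def owner_of(path: str) -> str:
    """Return the node that owns the longest matching namespace prefix."""
    matches = [p for p in OWNER_MAP if path.startswith(p)]
    if not matches:
        raise KeyError(f"no owner for {path}")
    return OWNER_MAP[max(matches, key=len)]

def handle_request(local_node: str, path: str) -> str:
    owner = owner_of(path)
    if owner == local_node:
        return f"{local_node}: served {path} locally"
    # Not our partition: forward to the authoritative node instead of
    # serving it here, so no distributed concurrency control is needed.
    return f"{local_node}: forwarded {path} to {owner}"

if __name__ == "__main__":
    print(handle_request("node1", "/global/engineering/specs/a.txt"))
    print(handle_request("node1", "/global/marketing/plan.doc"))
```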
  • Load sharing cluster file systems generally provide such things as: 1) High Availability and Redundancy: Because the file system is configured within a cluster, cluster protection and availability are extended to the file system. 2) Reduced complexity: Since each node has exclusive ownership of the filesystem it serves, implementing a load sharing cluster filesystem becomes much simpler compared to a load balancing cluster file system, where complex concurrency algorithms are required.
  • Namespace partitioning allows capacity and performance to be expanded as needed and in the area where the need is greatest, rather than globally in the cluster.
  • the node that exports the entire namespace of the cluster file system will bear the full burden and will get all of the request traffic for the cluster file system. That node must then direct each request to the node that is responsible for the partitioned namespace. This extra hop adds additional latency and introduces a scalability problem.
  • the workload of a Load Sharing Cluster is distributed among the nodes based on how the cluster namespace is partitioned. Certain namespaces may experience more workload than others, creating hotspots in the cluster file system.
  • Microsoft DFS allows administrators to create a virtual folder consisting of a group of shared folders located on different servers by transparently connecting them to one or more DFS namespaces.
  • a DFS namespace is a virtual view of shared folders in an organization.
  • Each virtual folder in a DFS namespace may be a DFS link that specifies a file server that is responsible for the namespace identified by the virtual folder, or it may be another virtual folder containing other DFS links and virtual folders.
  • a file server that exports shared folders may be a member of many DFS namespaces.
  • Each file server in a DFS namespace is not aware that it is a member of a DFS namespace.
  • DFS creates a loosely coupled distributed file system consisting of one or more file servers that operate independently of each other in the namespace.
  • DFS uses a client-side name resolution scheme to locate the file server that is destined to process file requests for a virtual folder in a DFS namespace.
  • the server that exports the DFS namespace in a root virtual folder, the DFS root server, will receive all the name resolution request traffic destined for the virtual folder.
  • the clients of a DFS namespace ask the DFS root server for the target file server and the shared folder in that file server that correspond to a DFS virtual folder.
  • the DFS client is responsible for redirecting file requests to the target file server, using a new path name constructed from the information obtained from the DFS root server.
  • the DFS root server does not keep track of which clients use the exported DFS namespace.
  • clients keep a cache of the association of a virtual folder and its target server and the actual pathname in the target server.
  • the DFS server no longer participates in the network I/O.
  • the client will not contact the DFS root server again for the same name until the cache is stale, usually for about 15 minutes. This methodology allows DFS great efficiency and optimal performance since the client is rapidly connected directly to the target file server.
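A rough illustration of the client-side caching behavior just described; the `ReferralCache` class, the resolver callback, and the share names are hypothetical, and the 15-minute TTL is taken from the typical staleness window mentioned above.

```python
# Hypothetical sketch of a DFS-style client referral cache: the client asks
# the DFS root server once per virtual folder, caches the target server and
# path, and only re-queries after the cache entry goes stale (~15 minutes).
import time

CACHE_TTL_SECONDS = 15 * 60          # typical staleness window noted above

class ReferralCache:
    def __init__(self, resolve_fn):
        self._resolve = resolve_fn   # callback that queries the DFS root server
        self._cache = {}             # virtual folder -> (target path, expiry)

    def lookup(self, virtual_folder: str) -> str:
        entry = self._cache.get(virtual_folder)
        if entry and entry[1] > time.time():
            return entry[0]          # fresh: no traffic to the DFS root server
        target = self._resolve(virtual_folder)
        self._cache[virtual_folder] = (target, time.time() + CACHE_TTL_SECONDS)
        return target

# Example: a stand-in resolver mapping a DFS virtual folder to a share.
def fake_dfs_root(virtual_folder: str) -> str:
    return {r"\\Cluster\Share\B": r"\\Node2\Share\B"}[virtual_folder]

cache = ReferralCache(fake_dfs_root)
print(cache.lookup(r"\\Cluster\Share\B"))   # asks the root server once
print(cache.lookup(r"\\Cluster\Share\B"))   # served from the client cache
```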
  • file virtualization is a method for a computer node to proxy client filesystem requests to a secondary storage server that has been virtually represented in the local portion of the file system namespace as a mounted folder.
  • a traditional file system manages the storage space by providing a hierarchical namespace.
  • the hierarchical namespace starts from the root directory, which contains files and subdirectories. Each directory may also contain files and subdirectories identifying other files or subdirectories. Data is stored in files. Every file and directory is identified by a name. The full name of a file or directory is constructed by concatenating the name of the root directory and the names of each subdirectory that finally leads to the subdirectory containing the identified file or directory, together with the name of the file or the directory.
  • the full name of a file thus carries with it two pieces of information: (1) the identification of the file and (2) the physical storage location where the file is stored. If the physical storage location of a file is changed (for example, moved from one partition mounted on a system to another), the identification of the file changes as well. For ease of management, as well as for a variety of other reasons, the administrator would like to control the physical storage location of a file. For example, important files might be stored on expensive, high-performance file servers, while less important files could be stored on less expensive and less capable file servers.
  • File virtualization is a technology that separates the full name of a file from its physical storage location. File virtualization is usually implemented as a hardware appliance, such as a file switch, positioned in the data path between clients and file servers.
  • FIG. A-1 is a schematic diagram showing an exemplary switched file system including a file switch (MFM).
  • file virtualization provides the following capabilities:
  • file virtualization also allows multiple full filenames to refer to a single file. This is important as it allows existing users to use the old filename while allowing new users to use a new name to access the same file. 3) Allows having one full name to refer to many files: a single filename may refer to many files. Files that are identified by a single filename need not contain identical contents. If the files do contain identical contents, then one file is usually designated as the authoritative copy, while the other copies are called the mirror copies.
  • Cluster file systems may be used to meet strong growth of end user unstructured data needs.
  • Load sharing cluster file system is generally simpler to implement than load balancing cluster file system.
  • a cluster file system that uses a partitioned namespace to divide workload among the nodes in a cluster is a better match for the business environment. This is because each organization in a business environment usually has its own designated namespace. For example, the engineering department may own the namespace /global/engineering, while the marketing department owns the namespace /global/marketing.
  • DFS is a good match for a load sharing namespace. Unfortunately, it is hard to maintain configuration consistency among all clients. It also is not a true cluster and does not provide protection from failure.
  • Section A relates to load sharing cluster file systems.
  • Section B relates to non-disruptive file migration.
  • Section C relates to on demand file virtualization for server configuration management with limited interruption.
  • Section D relates to file deduplication using storage tiers.
  • Section E relates to file deduplication using copy-on- write storage tiers.
  • Embodiments of the present invention relate generally to load sharing clusters in which each node is responsible for one or more non-overlapping subset(s) of the cluster namespace and will process only those requests that access file or directory objects in the partitioned namespace that the node controls, while redirecting requests designated for other nodes.
  • Specific embodiments of the present invention are based on using DFS in conjunction with File Virtualization to overcome DFS configuration consistency deficiency as well as to provide cluster protection and availability.
  • Exemplary embodiments use DFS to enable clients to communicate directly with the node in the load sharing cluster that is destined to process the request according to the partitioned namespace that the request is for. Once the namespace for the node is resolved, DFS is essentially out of the picture. DFS resolution is essentially used as a hint. If the DFS configuration is changed and a node receives a request not destined for the node, the node will forward the request to the correct owner, thus overcoming the DFS configuration consistency problem.
  • a method for load sharing in an aggregated file system having a cluster of file storage nodes and a distributed filesystem server (DFS) node, the file storage nodes collectively maintaining a shared storage including a plurality of non-overlapping portions, each file storage node owning at least one of the non-overlapping portions and including for each non- overlapping portion not owned by the file storage node a file virtualization link identifying another file storage node for the non-overlapping portion, the DFS node mapping each non-overlapping portion to a file storage node.
  • the method involves generating client requests by a number of client nodes, each client request identifying a non- overlapping portion and directed to a specific file storage node based on an access to the DFS server or information in a client cache; and for each client request received by a file storage node, servicing the client request by the receiving file storage node if the receiving file storage node owns the identified non-overlapping portion and otherwise forwarding the client request by the receiving file storage node to another file storage node identified using the file virtualization links.
  • the method may further involve migrating a specified non-overlapping portion from a source file storage node to a destination file server node, for example, due to reconfiguration of the cluster or based on loading of the source file storage node.
  • Migrating the specified non-overlapping portion may involve establishing a file virtualization link on the destination file server node, the file virtualization link identifying the file storage node that owns the non-overlapping portion; updating the cluster resource to map the non-overlapping portion to the destination file storage node; building metadata for the non-overlapping portion on the destination file storage node using sparse files such that all file and directory attributes of the non-overlapping portion are replicated on the destination file storage node without any data, and during such building, forwarding client requests received for the non-overlapping portion by the destination file storage node to the file storage node that owns the non-overlapping portion based on the file virtualization link; and after building the metadata for the non-overlapping portion on the destination file storage node, copying data for the non-overlapping portion from the source file storage node to the destination file storage node.
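The migration steps listed above can be condensed into a small orchestration sketch. Everything here is hypothetical scaffolding (the helper functions only print what a real implementation would do); the point is the ordering: establish the link, repoint the cluster resource, build metadata, copy data, then break the link.

```python
# Hypothetical orchestration of the migration steps described above; every
# helper (establish_link, update of cluster_map, build_sparse_metadata,
# copy_data, break_link) is a stand-in, not an API from the patent.

def migrate_portion(portion: str, source: str, destination: str,
                    cluster_map: dict) -> None:
    establish_link(destination, portion, source)   # virtualization link -> owner
    cluster_map[portion] = destination             # cluster resource (DFS) now
                                                   # points clients at destination
    build_sparse_metadata(portion, source, destination)
    # While metadata is being built, requests arriving at the destination are
    # forwarded to the source over the virtualization link (not shown here).
    copy_data(portion, source, destination)
    break_link(destination, portion)               # destination now owns the data

def establish_link(dst, portion, src): print(f"{dst}: link {portion} -> {src}")
def build_sparse_metadata(portion, src, dst): print(f"{dst}: metadata of {portion} from {src}")
def copy_data(portion, src, dst): print(f"{dst}: data of {portion} from {src}")
def break_link(dst, portion): print(f"{dst}: {portion} served locally")

cluster_map = {"/Share/D": "node2"}
migrate_portion("/Share/D", "node2", "node3", cluster_map)
```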
  • a method for load sharing by a file storage node in an aggregated file system having a plurality of file storage nodes and a distributed filesystem server (DFS) node, the file storage nodes collectively maintaining a shared storage including a plurality of non- overlapping portions, each file storage node owning at least one of the non-overlapping portions and including for each non-overlapping portion not owned by the file storage node a file virtualization link identifying another file storage node for the non- overlapping portion, the DFS node mapping each non-overlapping portion to a file storage node.
  • the method involves receiving, by the file storage node, a client request identifying a non-overlapping portion; when the file storage node owns the identified non-overlapping portion, servicing the client request by the file storage node; and when the file storage node does not own the identified non-overlapping portion, forwarding the client request by the file storage node to another file storage node identified using the file virtualization links.
  • the method may further involve migrating a specified non-overlapping portion from another file storage node.
  • Migrating the specified non-overlapping portion may involve maintaining a file virtualization link to the specified non-overlapping portion on the other file storage node; migrating metadata for the specified non-overlapping portion from the other file storage node; after migrating the metadata, migrating data for the specified non-overlapping portion from the other file storage node; and after migrating the data, breaking the file virtualization link.
  • While migrating the metadata, the file storage node typically redirects requests for the specified non-overlapping portion to the other file storage node.
  • While migrating the data, the file storage node typically services metadata requests for the specified non-overlapping portion from the migrated metadata and forwards data requests for the specified non-overlapping portion to the other file storage node. After breaking the file virtualization link, the file storage node typically services requests for the specified non-overlapping portion from the migrated metadata and data. Migrating may be done for at least one of load sharing and hotspot mitigation.
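One way to picture the behavior described in the last few bullets is as a phase-based dispatch on the destination node. The phase names and request kinds below are assumptions made for this sketch rather than terminology from the patent.

```python
# Hypothetical sketch of how a destination node might dispatch requests in
# the phases described above; the phase names and request kinds are assumed.
from enum import Enum, auto

class Phase(Enum):
    MIGRATING_METADATA = auto()   # everything still forwarded to the old owner
    MIGRATING_DATA = auto()       # metadata served locally, data forwarded
    COMPLETE = auto()             # virtualization link broken, all served locally

def dispatch(phase: Phase, kind: str) -> str:
    if phase is Phase.MIGRATING_METADATA:
        return "forward to previous owner"
    if phase is Phase.MIGRATING_DATA:
        return "serve locally" if kind == "metadata" else "forward to previous owner"
    return "serve locally"

assert dispatch(Phase.MIGRATING_DATA, "metadata") == "serve locally"
assert dispatch(Phase.MIGRATING_DATA, "data") == "forward to previous owner"
assert dispatch(Phase.COMPLETE, "data") == "serve locally"
```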
  • a file storage node for use in an aggregated filesystem having a plurality of file storage nodes and a cluster resource, the file storage nodes collectively maintaining a shared storage including a plurality of non-overlapping portions, each file storage node owning at least one of the non-overlapping portions and including for each non-overlapping portion owned by another file storage node a file virtualization link identifying the other file storage node, the cluster resource including for each non-overlapping portion a link mapping the non-overlapping portion to a target file storage node.
  • the file storage node includes a network interface for receiving a client request identifying a non-overlapping portion; and a processor configured to service the client request if the file storage node owns the identified non-overlapping portion and to forward the client request to another file storage node identified using the file virtualization links if the file storage node does not own the identified non-overlapping portion.
  • the processor may be further configured to migrate a specified non-overlapping portion from another file storage node.
  • Migrating the specified non-overlapping portion may involve maintaining a file virtualization link to the specified non-overlapping portion on the other file storage node; migrating metadata for the specified non-overlapping portion from the other file storage node; after migrating the metadata, migrating data for the specified non-overlapping portion from the other file storage node; and after migrating the data, breaking the file virtualization link.
  • While migrating the metadata, the file storage node typically redirects requests for the specified non-overlapping portion to the other file storage node.
  • While migrating the data, the file storage node typically services metadata requests for the specified non-overlapping portion from the migrated metadata and forwards data requests for the specified non-overlapping portion to the other file storage node. After breaking the file virtualization link, the file storage node typically services requests for the specified non-overlapping portion from the migrated metadata and data. Migrating may be done for at least one of load sharing and hotspot mitigation.
  • FIG. A-1 is a schematic diagram showing an exemplary switched file system including a file switch (MFM) as known in the art;
  • FIG. A-2 is a schematic diagram showing a cluster with shared storage as known in the art
  • FIG. A-3 is a schematic diagram showing a file virtualization based aggregated file system as known in the art
  • FIG. A-4 is a schematic diagram showing a clustered DFS namespace as known in the art.
  • FIG. A-5 is a schematic diagram showing a load sharing cluster file system in accordance with an exemplary embodiment of the present invention.
  • FIG. A-6 is a schematic diagram showing client interaction with a load sharing cluster file system in accordance with an exemplary embodiment of the present invention
  • FIG. A-7 is a schematic diagram showing direct client access with forwarding of the request using file virtualization, in accordance with an exemplary embodiment of the present invention.
  • FIG. A-8 is a schematic diagram showing a situation in which file virtualization is used to forward requests that are misdirected due to stale cache information, in accordance with an exemplary embodiment of the present invention
  • FIG. A-9 is a schematic diagram showing a load sharing cluster file system in accordance with another exemplary embodiment of the present invention.
  • FIG. A-10 is a schematic diagram showing DFS redirection to an available node in accordance with another exemplary embodiment of the present invention.
  • FIG. A-11 is a schematic diagram showing client I/O redirection with file virtualization in accordance with another exemplary embodiment of the present invention.
  • FIG. A-12 is a schematic diagram showing metadata migration in accordance with another exemplary embodiment of the present invention.
  • FIG. A-13 is a schematic diagram showing data migration in accordance with another exemplary embodiment of the present invention.
  • FIG. A-14 is a schematic diagram showing migration completion in accordance with another exemplary embodiment of the present invention.
  • a “cluster” is a group of networked computer servers that all work together to provide high performance services to their client computers.
  • a “node,” “computer node” or “cluster node” is a server computer system that is participating in providing cluster services within a cluster.
  • a “cluster file system” is a distributed file system that is not a single server with a set of clients, but instead a cluster of servers that all work together to provide high performance file services to their clients. To the clients, the cluster is transparent - it is just “the file system", but the file system software deals with distributing requests to elements of the storage cluster.
  • An "active-active file system cluster” is a group of network connected computers in which each computer (node) participates in serving a cluster file system.
  • Embodiments of the present invention relate generally to load sharing clusters in which each node is responsible for one or more non-overlapping subset(s) of the cluster namespace and will process only those requests that access file or directory objects in the partitioned namespace that the node controls, while redirecting requests designated for other nodes.
  • Specific embodiments of the present invention are based on using DFS in conjunction with File Virtualization to overcome DFS configuration consistency deficiency as well as to provide cluster protection and availability.
  • Exemplary embodiments use DFS to enable clients to communicate directly with the node in the load sharing cluster that is destined to process the request according to the partitioned namespace that the request is for. Once the namespace for the node is resolved, DFS is essentially out of the picture. DFS resolution is essentially used as a hint. If the DFS configuration is changed and a node receives a request not destined for the node, the node will forward the request to the correct owner, thus overcoming the DFS configuration consistency problem.
  • each computer node participating in the cluster is the owner of one or more non-overlapped regions of the shared storage.
  • the storage is located on a shared bus such that any node in the cluster can access any storage region, as needed for maintaining cluster file system availability.
  • Each non-overlapped storage region contains a hierarchical filesystem containing a root and a shared folder. The folder is shared using the SMB protocol.
  • FIG. A-2 is a schematic diagram showing a cluster with shared storage.
  • FIG. A-3 is a schematic diagram showing a file virtualization based aggregated file system.
  • FIG. A-4 is a schematic diagram showing a clustered DFS namespace.
  • FIG. A-5 is a schematic diagram showing a load sharing cluster file system in accordance with an exemplary embodiment of the present invention.
  • each file server owns one or more non-overlapping portions (e.g., folders) of the aggregated filesystem and includes file virtualization links to the folders owned by other file servers.
  • Node1 owns folder A and includes file virtualization links to folder B in Node2 and to folder X in NodeX; Node2 owns folder B and includes file virtualization links to folder A in Node1 and to folder X in NodeX; and NodeX owns folder X and includes file virtualization links to folder A in Node1 and to folder B in Node2.
  • FIG. A-6 is a schematic diagram showing client interaction with a load sharing cluster file system in accordance with an exemplary embodiment of the present invention.
  • the client first sends an I/O request to the cluster resource (e.g., DFS server) including a file pathname (\\Cluster\Share\B\file.txt in this example).
  • the cluster resource maps the file pathname to a file server that owns the file according to its own view of the aggregated filesystem (which may differ from the views maintained by one or more of the file servers for various reasons) and responds with a DFS reparse message that redirects the client to the file server selected by the cluster resource (the file pathname maps to Node2 in this example).
  • the client updates its local MUP Cache to redirect all I/O destined for that particular pathname (i.e., \\Cluster\Share\B) to the specified location (i.e., \\Node2\Share\B) and then performs I/O directly to \\Node2\Share\B.
  • FIG. A-7 is a schematic diagram showing direct client access with forwarding of the request using file virtualization, in accordance with an exemplary embodiment of the present invention.
  • FIG. A-8 is a schematic diagram showing a situation in which file virtualization is used to forward requests that are misdirected due to stale cache information, in accordance with an exemplary embodiment of the present invention.
  • folder B has been moved to Node1 but the client continues to direct requests for the folder to Node2 based on its cached information.
  • Node2 proxies the requests to Node1 using file virtualization.
  • clients consult the DFS node to identify the target file storage node that owns an unknown file object (e.g., a file object that has never been accessed before by the client).
  • the client sends file accesses to the file object directly to the identified target file storage node.
  • the client may choose not to consult the DFS node again to identify the target node of the known file object until the client deems it necessary to consult the DFS node for the known file object again (e.g., when the cache entry expires). Over time, the information in the DFS node, the client caches, and/or the file storage nodes may become mismatched.
  • a client may send a request to a file storage node (e.g., the node that the client thinks still owns the file object) that does not own the file object. Therefore, in embodiments of the present invention, the file storage nodes employ file virtualization techniques to direct misdirected file requests to the proper file storage node. It should be noted that it is possible for the view of the global namespace maintained by a particular file storage node to be incorrect and so one file storage node may misdirect the request to another file storage node that does not own the file object, but each file storage node will use file virtualization to forward the request as appropriate.
  • each participating node of the cluster owns exclusive access to a non-overlapped portion of the shared file system namespace. If a node is experiencing a high number of client requests, it generally cannot distribute any portion of those requests to other nodes in the cluster. This may cause hotspots in the file system, where certain portions of the namespace experience high client request volume. If the high volume of requests causes the node to reach its performance limits, clients may experience degraded performance.
  • This hotspot problem may be mitigated, for example, by moving a portion of the aggregated filesystem from one node to another and/or by repartitioning or dividing the original namespace that experiences the hotspot problem into one or more smaller, non-overlapping sub-namespaces. Additional new nodes may be added to the cluster, or existing under-utilized nodes may be designated to take over the newly created namespaces. Before the reconfiguration of the namespace is complete, the metadata and data must be migrated from the old node to the newly created or existing newly responsible nodes.
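As a purely illustrative aid, a hotspot-relief decision might look like picking the busiest portion on the most loaded node and a lightly loaded node to receive it. The load figures, node names, and selection policy below are invented for the example.

```python
# Hypothetical illustration of picking a hotspot to relieve: find the busiest
# portion on the most loaded node and a lightly loaded node to take it over.
# The request-rate numbers and node names are made up for the example.

requests_per_portion = {           # observed load per namespace portion
    ("node2", "/Share/C"): 120,
    ("node2", "/Share/D"): 900,    # hotspot
    ("node3", "/Share/E"): 60,
}

def pick_migration(load: dict):
    node_load = {}
    for (node, _), rate in load.items():
        node_load[node] = node_load.get(node, 0) + rate
    hot_node = max(node_load, key=node_load.get)
    cool_node = min(node_load, key=node_load.get)
    hot_portion = max((k for k in load if k[0] == hot_node), key=load.get)[1]
    return hot_portion, hot_node, cool_node

portion, src, dst = pick_migration(requests_per_portion)
print(f"move {portion} from {src} to {dst}")   # move /Share/D from node2 to node3
```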
  • certain embodiments of the present invention use file virtualization and DFS redirection as discussed above in combination with a non-disruptive server-side data mirroring/migration technique to permit portions of the aggregated filesystem to be moved among the nodes.
  • These embodiments generally maintain data availability during the migration, so clients in general will not be aware of the server side operations. Since the migration is performed behind the cluster, the path to the data does not change.
  • a load sharing cluster file system is created that supports non-disruptive configuration changes.
  • the clients of this cluster are able to talk to the correct node that is destined to handle their requests, bypassing most of the need for server-side namespace switching and concurrency control, resulting in a very efficient and scalable cluster file system.
  • each cluster node is responsible for serving one or more non-overlapping portions of the cluster file system namespace. If a node receives client requests for data outside the scope of the namespace it is serving, it may forward the request to the node that does service the requested region of the namespace.
  • Each portion of the namespace that a node exclusively serves to clients is an actual folder in the node's local file system. Portions of the namespace served by other nodes will appear as virtual folders, internally identifying the file server and the full path name where the virtual folder is located via File Virtualization methods.
  • node 1 is responsible for the namespaces \\Cluster\Share\A and \\Cluster\Share\B. They appear as real folders \A and \B in node 1, respectively.
  • node 1 is not responsible for the namespaces \\Cluster\Share\C and \\Cluster\Share\D. These namespaces appear as virtual folders \C and \D on node 1, respectively.
  • node 1 only receives requests targeted for \A and \B. However, if there is an inadvertent request directed at the virtual folder \C or \D, possibly because of a delay in propagating a namespace configuration change, node 1 forwards the request to the node that is responsible for \C (node 2) or \D (node 3), respectively.
  • the cluster namespace is to be reconfigured so that node 2 is no longer responsible for the partitioned namespace \\Cluster\Share\D. Instead, node 3 will be the node to take on the additional responsibility of managing \\Cluster\Share\D.
  • the local filesystem folder D on Node2 will need to be migrated to Node3.
  • Node3 will have a file virtualization link to the folder on Node2, as shown in FIG. A-10.
  • the first step is to use DFS to redirect client requests to the target node that will receive the migrated data. Even though Node3 will use File Virtualization to redirect the data requests to Node2, this will ensure that the namespace remains available during the migration of the data files later in the process. This is illustrated in FIG. A-11.
  • client I/O requests will all go through Node3 and utilize the server-side file virtualization link from Node3 to Node2.
  • the next step is to build metadata in the local file system folder on Node3 using sparse files such that all file and directory attributes of the original data in Node2 are replicated in Node3 without any of the data, as illustrated in FIG. A-12.
  • the share D on Node2 becomes a "Native with Metadata" Virtualized folder on Node3.
  • This allows Node3 to start serving all metadata requests from the local metadata, such as locking, date and time, user authentication etc. Data requests remain proxied to Node2.
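A minimal sketch of the sparse-file metadata building step, assuming POSIX-style paths and using only size, timestamps, and permission bits as stand-ins for the full attribute set (security descriptors, EAs, and other Windows-specific attributes are omitted).

```python
# Hypothetical sketch of building "native with metadata" state with sparse
# files: replicate each file's size, timestamps and permissions without
# copying any data, so metadata requests can be answered locally.
import os, shutil

def replicate_tree_as_sparse(source_root: str, dest_root: str) -> None:
    for dirpath, _, filenames in os.walk(source_root):
        rel = os.path.relpath(dirpath, source_root)
        dest_dir = os.path.join(dest_root, rel)
        os.makedirs(dest_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dest_dir, name)
            size = os.path.getsize(src)
            with open(dst, "wb") as f:
                f.truncate(size)              # sparse file: size without data blocks
            shutil.copystat(src, dst)         # times and permission bits
        shutil.copystat(dirpath, dest_dir)    # directory attributes (approximate)

# replicate_tree_as_sparse("/mnt/node2/share/D", "/mnt/node3/share/D")  # usage idea
```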
  • Once the complete metadata is created on Node3, the data can be mirrored from Node2 to Node3, as illustrated in FIG. A-13.
  • When the data is fully mirrored, the mirror between Node2 and Node3 is broken, as illustrated in FIG. A-14. Migration is now complete, and the client load that was completely on Node2 is now distributed between Node2 and Node3.
  • NAS (Network Attached Storage) file servers provide file services for clients connected in a computer network using networking protocols like CIFS or any other stateful protocol (e.g., NFS-v4).
  • a common approach to migrating files is to start migrating files while the source server continues to be accessed and gradually copy all files to the destination server. On subsequent passes, only the files and directories modified since the last pass are copied, and so on. This process is repeated until all files are migrated to the destination server. At that point, the source server is taken offline and replaced with the destination server, thus lowering the amount of time needed to migrate from one server to another. Although this solution lowers the down time, it does not completely solve the problem for files that are constantly accessed or held open in exclusive mode. For those files, the user still suffers a visible access interruption and will have to invalidate all open handles and suffer service interruption during the migration of those files.
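The multi-pass copy loop described above might be sketched as follows; the modification-time comparison and the `small_enough` cutover threshold are simplifying assumptions for the sketch, not details from the text.

```python
# Hypothetical sketch of the multi-pass copy described above: copy everything,
# then keep re-copying only files modified since the previous pass until the
# remaining changed set is small enough to finish during a short cutover.
import os, shutil

def changed_since(src_root: str, dst_root: str):
    for dirpath, _, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_root, os.path.relpath(src, src_root))
            if (not os.path.exists(dst)
                    or os.path.getmtime(src) > os.path.getmtime(dst)):
                yield src, dst

def migrate(src_root: str, dst_root: str, small_enough: int = 10) -> list:
    while True:
        pending = list(changed_since(src_root, dst_root))
        if len(pending) <= small_enough:
            return pending          # copy these during the brief offline cutover
        for src, dst in pending:    # online pass: source stays accessible
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 preserves mtime, so passes converge
```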
  • File Virtualization is a very powerful server management tool that normally is used for mirroring and load balancing for virtualized systems.
  • Native Volume with Metadata is the only known way to bring File Virtualization to places where preserving the user's native directory structure is a must.
  • Using File mirroring over Native Volume with Metadata is an excellent way to provide non-disruptive migration for storage servers.
  • a method and file switch for non-disruptive migration of a native mode volume from a source server to a destination server involves converting, by the file switch, the source native volume to a native with metadata volume using a local file system managed by the file switch; converting, by the file switch, the native with metadata volume to a mirrored native with metadata volume spanning the source server and the destination server, the destination server including a mirror copy of the native with metadata volume; removing, by the file switch, the source server from the mirrored native with metadata volume; and converting, by the file switch, the mirror copy of the native with metadata volume on the destination server to a destination native volume on the destination server.
  • converting the source native volume to the native with metadata volume may involve for each source directory in the source native volume, creating a corresponding local directory in the local file system including metadata associated with the source directory copied from the source native volume; and for each source file in the source native volume, creating a corresponding local sparse file in the local file system including file attributes copied from the source native volume but excluding the file contents associated with the source file.
  • the metadata associated with the source directory copied from the source native volume may include directory security descriptors.
  • Creating a local directory for a source directory may involve opening the source directory in the source native volume; placing a lock on the source directory; and creating the local directory and its metadata.
  • Converting the native with metadata volume to the mirrored native with metadata volume may involve for each local directory, creating a corresponding destination directory in the destination server and maintaining a mapping of the local directory to a source directory pathname for the corresponding source directory in the source server and to a destination directory pathname for the corresponding destination directory in the destination server; and for each local file, creating a corresponding destination file in the destination server including file data copied from the source native volume and maintaining a mapping of the local file to a source file pathname for the corresponding source file in the source server and to a destination file pathname for the corresponding destination file in the destination server.
  • Each mapping may include an indicator of the number of servers associated with the mirrored native with metadata volume.
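A hypothetical shape for such a mapping entry, with the mirror count derived from the list of servers holding a copy; the field names and example paths are invented for illustration.

```python
# Hypothetical per-entry mapping record for a mirrored native-with-metadata
# volume, as described above: a local metadata entry that remembers where its
# source and destination copies live and how many servers hold a mirror.
from dataclasses import dataclass, field

@dataclass
class MirrorMapping:
    local_path: str                               # entry in the file switch's local volume
    mirrors: dict = field(default_factory=dict)   # server name -> remote pathname

    @property
    def mirror_count(self) -> int:                # the "number of servers" indicator
        return len(self.mirrors)

entry = MirrorMapping(
    local_path=r"C:\mfm_meta\Share\docs\report.doc",
    mirrors={
        "source-server":      r"\\source-server\Share\docs\report.doc",
        "destination-server": r"\\destination-server\Share\docs\report.doc",
    },
)
print(entry.mirror_count)   # 2 while both servers are part of the mirror
```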
  • Removing the source server from the mirrored native with metadata volume may involve disabling usage of the source directory pathnames and the source file pathnames.
  • Converting the mirror copy of the native with metadata volume on the destination server to a destination native volume may involve replicating state information for the destination directories and the destination files.
  • Converting the mirror copy of the native with metadata volume on the destination server to a destination native volume further may involve deleting unneeded metadata associated with the mirror copy of the native with metadata volume from the destination server.
  • FIG. B-I is a schematic block diagram of a two server system demonstrating file access from multiple clients
  • FIG. B-2 is a schematic block diagram of a two server system where one of the servers is taken off the grid for migration;
  • FIG. B-3 is a schematic block diagram of a two server system where one of the servers was replaced by the new server after all files were copied from the old one;
  • FIG. B-4 depicts the process sequence of server migration with minimal interruption
  • FIG. B-5 depicts the process sequence of non-disruptive server migration
  • FIG. B-6 is a practical example of a sample global namespace including the metadata information and how the global name-space is used to calculate the target path;
  • FIG. B-7 is a practical example of a sample global namespace including the metadata information and how the global name-space is used to calculate the target paths;
  • FIG. B-8 is a logic flow diagram for non-disruptive file migration by a file switch in accordance with an exemplary embodiment of the present invention.
  • An “aggregator” is a file switch that performs the function of directory, data, or namespace aggregation of a client data file over a file array.
  • a “file switch” is a device (or group of devices) that performs file aggregation, transaction aggregation, and directory aggregation functions, and is physically or logically positioned between a client and a set of file servers.
  • To client devices, the file switch appears to be a file server having enormous storage capabilities and high throughput.
  • To the file servers, the file switch appears to be a client.
  • the file switch directs the storage of individual user files over multiple file servers, using mirroring to improve fault tolerance as well as throughput.
  • the aggregation functions of the file switch are done in a manner that is transparent to client devices.
  • the file switch preferably communicates with the clients and with the file servers using standard file protocols, such as CIFS or NFS.
  • the file switch preferably provides full virtualization of the file system such that data can be moved without changing path names and preferably also allows expansion/contraction/replacement without affecting clients or changing pathnames.
  • Attune Systems' Maestro File Manager (MFM), which is represented in FIG. B-5, is an example of a file switch.
  • Switched File System. A "switched file system" is defined as a network including one or more file switches and one or more file servers.
  • the switched file system is a file system since it exposes files as a method for sharing disk storage.
  • the switched file system is a network file system, since it provides network file system services through a network file protocol—the file switches act as network file servers and the group of file switches may appear to the client computers as a single file server.
  • Native File System. A "native file system" is defined as the native file system exposed by the back-end servers. Native Mode. A "native mode" of operation is a mode of operation where the backend file system is exposed to the clients through the file switch such that the file switch completely preserves the directory structure and other metadata of the back-end server. Each file server (share) represents a single mount point in the global namespace exposed by the file switch.
  • File. A file is the main component of a file system.
  • a file is a collection of information that is used by a computer. There are many different types of files that are used for many different purposes, mostly for storing vast amounts of data (e.g., database files, music files, MPEGs, videos).
  • There are also types of files that contain applications and programs used by computer operators, as well as specific file formats used by different applications. Files range in size from a few bytes to many gigabytes and may contain any type of data. Formally, a file is called a stream of bytes (or a data stream) residing on a file system. A file is always referred to by its name within a file system.
  • a "user file” is the file or file object that a client computer works with (e.g., read, write, etc.), and in some contexts may also be referred to as an "aggregated file.”
  • a user file may be mirrored and stored in multiple file servers and/or data files within a switched file system.
  • File/Directory Metadata. The "file/directory metadata," also referred to as "the metadata," is a data structure that contains information about the position of a specific file or directory including, but not limited to, the position and placement of the file/directory mirrors and their rank.
  • Although ordinary clients are typically not permitted to directly read or write the content of "the metadata", the clients still have indirect access to ordinary directory information and other metadata, such as file layout information, file length, etc.
  • the existence of "the metadata” is transparent to the clients, who need not have any knowledge of "the metadata” and its storage.
  • a "mirror” is a copy of a file. When a file is configured to have two mirrors, that means there are two copies of the file.
  • Oplock. An oplock, also called an "opportunistic lock," is a mechanism for allowing the data in a file to be cached, typically by the user (or client) of the file. Unlike a regular lock on a file, an oplock on behalf of a first client is automatically broken whenever a second client attempts to access the file in a manner inconsistent with the oplock obtained by the first client. Thus, an oplock does not actually provide exclusive access to a file; rather it provides a mechanism for detecting when access to a file changes from exclusive to shared, and for writing cached data back to the file (if necessary) before enabling shared access to the file.
  • This section relates generally to migrating file data from one storage server to another in a non-disruptive manner using a stateful network file protocol such as CIFS.
  • FIGs. B-1, B-2, and B-3 demonstrate how the standard (non-optimized) file migration is done.
  • FIG. B-1 is a schematic block diagram of the network file system before the beginning of the migration.
  • Client11 to Client1m are regular clients that connect to the two back-end servers (Server11 and Server12) through a regular IP switch, using a stateful network file protocol such as CIFS.
  • FIG. B-4 depicts the minimal-disruption migration. All accessible files are migrated from Server41 to Server43. Since the process can take a long time, some of the files may get changed during migration. In the second step, those files are migrated again. Step two is repeated until all files are migrated or until the amount of data remaining to be migrated falls under a predetermined amount. Finally, the migration is completed in a way similar to the regular migration: in step n+1, Server41 and Server43 are taken offline. In step n+2, the remaining files are copied to the destination. In the final steps, the destination server is renamed to the name of the source server (n+3) and brought on-line (n+4).
  • Embodiments of the present invention described below utilize file virtualization in order to provide non-disruptive file/server migration.
  • non-disruptive file migration can be summarized in four general steps: (1) converting the source native volume to a native volume with metadata; (2) converting the native with metadata volume to a mirrored native with metadata volume that includes the destination server; (3) removing the source server from the mirrored volume; and (4) converting the mirror copy on the destination server to a native volume.
  • a native volume is a basic virtualized representation of a share from the back-end server. Its content (directories and files) are completely managed by the hosting file server. Clients can access the virtualized volume through the global namespace or directly by accessing the back-end server.
  • a native volume with metadata is a natural extension of the native volume mode with the ability to keep additional metadata information for each file/directory. "The metadata" will keep at least the following information: the number of mirrors and a list of the destinations where the file/directory mirror is placed.
  • a local NTFS directory is used for storing all information about the native volume.
  • the whole remote namespace (without the file data) is replicated inside this directory.
  • All file attributes (including security, EAs, file size, etc.) are preserved on all mirrors as well as in the file switch namespace.
  • FIG. B-6 is a practical example of a sample global namespace including the metadata information and how the global name-space is used to calculate the target path.
  • FIG. B-7 is a practical example of a sample global namespace including the metadata information and how the global name-space is used to calculate the target paths.
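Since the figures themselves are not reproduced here, the following toy example only illustrates the general idea of deriving per-mirror target paths from a global-namespace path; the mount point and share names are assumptions made for the example.

```python
# Hypothetical illustration of calculating back-end target paths from a
# global namespace path: strip the global mount prefix and append the rest
# to each mirror's share path listed in "the metadata".
from posixpath import relpath

GLOBAL_MOUNT = "/global/docs"                      # assumed global mount point
MIRROR_SHARES = [r"\\source-server\docs", r"\\destination-server\docs"]

def target_paths(global_path: str) -> list:
    tail = relpath(global_path, GLOBAL_MOUNT)      # e.g. "reports/q3.doc"
    return [share + "\\" + tail.replace("/", "\\") for share in MIRROR_SHARES]

print(target_paths("/global/docs/reports/q3.doc"))
# ['\\\\source-server\\docs\\reports\\q3.doc',
#  '\\\\destination-server\\docs\\reports\\q3.doc']
```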
  • OPEN EXISTING FILE/DIRECTORY - When an open operation arrives, the operation is performed initially over the local NTFS file. This allows the file security permissions to be evaluated locally and forces evaluation of the sharing mode. If it succeeds, the metadata is read to get the file placement and mirrors, after which the open operation is forwarded simultaneously to all mirrors. When all mirrors complete the open, the open operation is completed back to the client.
  • READ/WRITE OPERATIONS - Data operations are submitted simultaneously to all mirrors, with the operation sent to the mirrors in their rank order. When all of them complete, the operation is acknowledged to the client. No read/write data is stored on the local disk, so there is no need to send data operations to it.
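A simplified sketch of the mirrored write path; for clarity the mirrors are written one after another in rank order rather than truly in parallel, and the mirror objects are stand-ins rather than real back-end connections.

```python
# Hypothetical sketch of the mirrored write path described above: the same
# operation is sent to every mirror in rank order, and the client is only
# acknowledged once all mirrors have completed it.

def write_to_mirrors(mirrors, offset: int, data: bytes) -> str:
    results = []
    for mirror in sorted(mirrors, key=lambda m: m["rank"]):   # rank order
        results.append(mirror["write"](offset, data))
    if all(results):
        return "acknowledge client"          # every mirror applied the write
    return "fail the operation"              # surface the error, keep mirrors in sync

mirrors = [
    {"rank": 0, "write": lambda off, d: True},   # stand-in back-end calls
    {"rank": 1, "write": lambda off, d: True},
]
print(write_to_mirrors(mirrors, 0, b"hello"))    # acknowledge client
```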
  • RANGE-LOCK OPERATIONS - Advisory range-locks or mandatory range-locks may be implemented.
  • If advisory range-locks are implemented, range-lock requests are sent only to the local NTFS volume.
  • If mandatory range-locks are implemented, range-lock requests are sent to the local file first and, after the local request succeeds, to all mirrors.
  • the local file acts as an arbiter for resolving range-lock conflicts and deadlocks.
  • OPPORTUNISTIC LOCK (OP-LOCK) OPERATIONS - Oplock operations are submitted to local file and all mirrors in parallel.
  • oplock break operations are the only operations that treat status pending as an acknowledgement that the operation completed successfully (i.e., processing it in a work item or from a different thread is unacceptable).
  • DIRECTORY CHANGE NOTIFICATIONS - Directory change notification operations are submitted to all mirrors, and the response is passed back to the client when it arrives. If there is no request to be completed, the MFM saves the responses in their arrival order. When a new dir-change-notification request comes, it will pick the first pending response and complete it to the client; the next one will pick the next pending response, and so on. It is possible for the client to receive more than one notification for the same change - one for the local metadata and one for each of the mirrors. This behavior is acceptable since the directory notifications are advisory and not time sensitive. The worst that can happen is that the client will have to reread the state of the affected files. If there is no pending completion, then a directory change notification request is submitted to all mirrors that have no pending directory notification.
  • In order to convert the Native Volume to a Native Volume with Metadata, all access to the back-end server that is being converted must go through the file switch, i.e., the file switch is an in-band device. There should be no file access that does not go through it. Data corruption is possible if files are modified or accessed other than through the file switch, and the file switch cannot enforce that access to the backend servers is done only through it.
  • Conversion from native to extended native is done by walking down the source directory tree and converting the volume directory by directory.
  • Each directory operation usually is run by a single execution thread.
  • the execution thread opens the source directory and places a batch oplock on the source directory, so it can be notified in case someone changes it. If the batch oplock is broken, the directory is processed again.
  • the directory is enumerated and for each of the files found a sparse file is created in the local file system.
  • the sparse file size corresponds to the actual file size. All other file attributes (time, attributes, security descriptors and EAs) are copied as well.
  • the creation of "the metadata" for the file completes the conversion of the file.
  • the directory oplock break status is checked after processing each directory entity (file and/or directory). The status of the oplock break is not checked during the batch adding of the sub-directories to the directory processing queue, since this operation is entirely local and is executed almost instantaneously. All security descriptors are copied verbatim (without looking into them) except for the top level directory. The root directory security descriptor is converted to an effective security descriptor and then set in the local NTFS directory. This allows the sub-entities to properly inherit their security attributes from their parents.
  • the number of simultaneously processed directories can be limited to a predefined number to avoid slowing the system down due to over-parallelism.
  • the in-memory structures of the currently opened files and directories maintained by the file switch (FIG. B-5) need to be modified to comply with the requirements of the native with metadata volume structure.
  • some operations may require a temporary suspension of all operations over the affected entity (file or directory). In this case, access to the file/directory is suspended and the system waits for all outstanding operations to complete (except long-lived requests such as range-locks), after which access is restored.
  • the temporary access suspension is at most several hundred milliseconds long, which is comparable to the network latency, and thus would not affect the applications using those files even if they are actively using the opened file.
  • the range-lock requests can be handled in one of two possible ways: as advisory locks or as mandatory locks (the Windows default). If advisory range-locks are supported, access to the file is suspended temporarily, all range-lock requests are submitted to the local NTFS volume on the File Switch, and then all pending requests on the source file are cancelled. Once cancelled, access to the file is restored. If mandatory range-locks are supported, access to the file is suspended, all range-lock requests are submitted to the local NTFS volume first, and then the range-lock requests are submitted to the other file mirrors. After the range-locks are granted, access to the file is restored.
  • Converting opportunistic lock operations from Native to Native Volume with Metadata involves submitting an oplock to the local NTFS volume in order to make it compliant with the expected model. CONVERTING ACTIVE DIRECTORY ENUMERATION - Since a directory enumeration is a relatively short operation, nothing special needs to be done here. The operation will be completed eventually and then served the proper way.
  • RENAME OPERATIONS - There are four different rename operation combinations based on the file conversion state and the destination directory conversion state: both are converted; both are not converted; only the source is converted; and only the destination is converted. Nothing special is needed if both are converted. If the source is converted but the destination directory does not exist in the local NTFS volume, the destination directory is created in the local volume and the rename/move operation is performed on the native volume and on the NTFS volume. If the destination directory is converted but the local file is not, the file is converted after the rename operation completes. If the destination directory is converted but the local directory is not, the directory name is added to the list of directories that require conversion.
  • If both are not converted, the rename operation is executed over the native volume only. After the operation completes, the destination directory is checked one more time; if the destination directory has become converted in the meantime and the entity is a file, metadata is created for it, and if the entity is a directory, it is added to the list of directories that require conversion. This is done to ensure that an entity conversion will not be missed.
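The rename cases above can be condensed into a small decision table. The boolean flags and action strings below are illustrative only; they paraphrase the handling described in the text rather than reproduce an actual implementation.

```python
# Hypothetical decision table for the four rename cases described above;
# the booleans stand for "has this entity / directory been converted to the
# local NTFS metadata yet", and the actions are condensed from the text.

def rename_plan(source_converted: bool, dest_dir_converted: bool,
                entity_is_file: bool) -> list:
    if source_converted and dest_dir_converted:
        return ["rename on native volume and on local NTFS volume"]
    if source_converted and not dest_dir_converted:
        return ["create destination directory in local NTFS volume",
                "rename on native volume and on local NTFS volume"]
    if dest_dir_converted and not source_converted:
        follow_up = ("convert file after rename" if entity_is_file
                     else "queue directory for conversion")
        return ["rename on native volume", follow_up]
    # neither side converted: rename on the native volume only and re-check
    # the destination afterwards so a conversion is not missed
    return ["rename on native volume", "re-check destination after rename"]

print(rename_plan(False, True, entity_is_file=True))
```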
  • CONVERTING DIRECTORY CHANGE NOTIFICATIONS - Converting the directory change notifications from Native to Native Volume with Metadata involves submitting a directory change notification to the local NTFS volume in order to make it compliant with the expected model.
  • the directory operations and walking the tree are very similar to converting the volume to extended-native mode. For each directory found, a new destination directory is created and all directory attributes are copied there as well.
  • filter oplocks are not supported across the network. If this filter oplock gets broken because someone opened the file, the mirroring process is stopped, the uncompleted mirrors are deleted, and the file is put on a list for later attempts to mirror.
  • an open file mirroring is performed.
  • the process starts by creating an empty file where the new mirrors are placed and begins to copy file data.
  • the file data is read sequentially from the beginning of the file until the end of the file and is written to all of the mirrors (please note that no file size increase is allowed during this phase).
  • all client write (and file size change) requests are replicated and sent to all mirrors.
  • reading the data from the source and writing it to the mirror(s) is performed while user access to this file is suspended. The suspension is once again performed for a relatively small interval so as not to be noticed by the user (or application).
  • the file handle state is propagated to the new mirror as well.
  • This state includes but is not limited to: mirror file handle, range-locks and oplocks. Range-locks are replicated to all mirrors only if mandatory range-locks are supported; otherwise, there is nothing more that needs to be done if only advisory locks are supported.
  • any directory change notifications request needs to be resubmitted to the new mirror as well.
  • Converting back to a native with metadata volume (removing the source server from the mirror) is done atomically by programmatically setting the source server state to "force-removed", changing a global state to removing a mirror, and logging off from the server. All operations pending on this server will be completed by the backend server, and the file switch will silently "eat" them without sending any of them to the client.
  • the source server references can then be removed from "the metadata": the directory operations and walking the tree are very similar to the way the data mirrors are rebuilt, described under "Creating/rebuilding data mirrors for Native mode with Metadata Volume". Only the metadata structure is updated, by removing the source server references from "the metadata". Finally, the in-memory data handle structures are updated to remove any references to the source server. All of those operations can be performed with no client and/or application disruption.
  • Converting starts by going through all currently opened handles and replicating the opened state (e.g., range locks, directory notifications, oplocks, etc.) over the native volume.
  • ALL access to the specified server set is temporarily suspended and all open files/directories on the local NTFS directory are closed (any operations failed/completed due to the close are ignored).
  • the global state of the volume is set to a pure native volume so all new open/creates should go to the native volume only. Finally, access to the volume is restored.
  • the metadata directory can be moved to a separate NTFS directory where all files and directories containing "the metadata" can be deleted and associated resources can be freed.
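Taken together, the conversion back to a pure native volume amounts to a short, suspend-protected sequence followed by lazy metadata cleanup. The sketch below is hedged: object and method names are assumptions for illustration, not the appliance's real interfaces.

```python
# Hedged sketch of converting a volume back to pure native mode; object and
# method names are illustrative assumptions.

def convert_to_pure_native(volume, native):
    # Replicate the currently open state (range locks, directory change
    # notifications, oplocks) over the native volume.
    for handle in volume.open_handles():
        native.replicate_open_state(handle)

    volume.suspend_access()                    # brief suspension of the server set
    try:
        for handle in volume.local_ntfs_handles():
            handle.close(ignore_errors=True)   # failures caused by the close are ignored
        volume.set_global_state("pure-native") # new opens/creates go to the native volume only
    finally:
        volume.resume_access()

    # Metadata cleanup can happen lazily, outside the suspension window.
    volume.move_metadata_to("metadata-trash")
    volume.delete_metadata_tree("metadata-trash")
```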
  • Many companies utilize various file Virtualization Appliances to provide better storage utilization and/or load balancing.
  • Those devices usually sit in the data path (in-band) between the clients and the servers and present a unified view of the namespaces provided by the back-end servers. From the client perspective, this device looks like a single storage server; to the back-end servers, the device looks like a super client that hosts a multitude of users. Since the clients cannot see the back-end servers, the virtualization device is free to move, replicate, and even take offline any of the user's data, thus providing the user with a better user experience.
  • In-line file virtualization is the next big thing in storage, but it does come with some drawbacks. It is difficult, if not almost impossible, to insert the Virtualization Appliance in the data path without visibly interrupting user and/or application access to the back-end servers. Removing the Virtualization Appliance without disruption is as difficult as placing it in-line.
  • Thus, the administrator faces the challenge of inserting the virtualization appliance with no, or very limited, interruption of user access to the backend servers.
  • The administrator is able to eliminate user interruption and, in only a very few cases, cause an interim disruption of user access to the back-end servers when a Virtualization Appliance is inserted in the data path between the client machine(s) and the backend servers.
  • the method involves configuring a global namespace of the virtualization appliance to match a global namespace exported by the distributed filesystem server; and updating the distributed filesystem server to redirect client requests associated with the global namespace to the virtualization appliance.
  • the method may further involve, after updating the distributed filesystem server, ensuring that no clients are directly accessing the file servers; and thereafter sending an administrative alert to indicate that insertion of the virtualization appliance is complete.
  • Ensuring that no clients are directly accessing the file servers may involve identifying active client sessions running on the file servers; and ensuring that the active client sessions include only active client sessions associated with the virtualization appliance.
  • The virtualization appliance may be associated with a plurality of IP addresses, and ensuring that the active client sessions include only active client sessions associated with the virtualization appliance may involve ensuring that the active client sessions include only active client sessions associated with any or all of the plurality of IP addresses.
  • the method may further involve automatically reconfiguring a switch to create a VLAN for the virtualization appliance.
  • the distributed filesystem server may be configured to follow the Distributed File System standard.
  • Connecting a virtualization appliance to the storage network may include connecting a first switch to a second switch, wherein the first switch is connected to at least one file server; connecting the virtualization appliance to the first switch; connecting the virtualization appliance to the second switch; and, for each file server connected to the first switch, disconnecting the file server from the first switch and connecting the file server to the second switch.
  • a method for removing a virtualization appliance logically positioned between client devices and file servers in a storage network having a distributed filesystem server involves sending a global namespace from the virtualization appliance to the distributed filesystem server; and configuring the virtualization appliance to not respond to any new client connection requests received by the virtualization appliance.
  • the method may further involve disconnecting the virtualization appliance from the storage network after a predetermined final timeout period.
  • The method may also involve, for any client request associated with an active client session received by the virtualization appliance during a predetermined time window, closing the client session.
  • the predetermined time window may be between the end of a first timeout period and the predetermined final timeout period.
  • the distributed filesystem server may be configured to follow the Distributed File System standard.
  • FIG. C-1 is a schematic block diagram of a three-server DFS system demonstrating file access from multiple clients;
  • FIG. C-2 is a schematic block diagram of a virtualized three server system
  • FIG. C-3 depicts the process sequence of adding the Virtualization Appliance to the network
  • FIG. C-4 depicts the process sequence of removing direct access between the client machines and the back-end servers
  • FIG. C-5 depicts the process sequence of restoring direct access between the client machines and back-end servers
  • FIG. C-6 is a logic flow diagram for logically inserting a virtualization appliance between client devices and file servers in a storage network, in accordance with an exemplary embodiment of the present invention.
  • FIG. C-7 is a logic flow diagram for removing a virtualization appliance from a storage network, in accordance with an exemplary embodiment of the present invention.
  • File virtualization is a technology that separates the full name of a file from its physical storage location.
  • File virtualization is usually implemented as a hardware appliance that is located in the data path (in-band) between clients and the file servers.
  • From the clients' perspective, a file Virtualization Appliance appears as a file server that exports the namespace of a file system. From the file servers' perspective, the file Virtualization Appliance appears as just a beefed up client machine that hosts a multitude of users.
  • Virtualization Appliance A "Virtualization Appliance" is a network device that performs File Virtualization. It can be an in-band or out-of-band device.
  • DFS Distributed File System
  • DFS allows the clients to access the closest server based on a server ranking system.
  • DFS does not provide any data replication, so in this case some other (non-DFS) solution should be used to ensure the consistency of the user data between the different copies of user data.
  • Embodiments of the present invention relate generally to a method for allowing a file server, with limited interruption, to be in the data path of a file virtualization appliance.
  • Embodiments enable file virtualization to allow on-demand addition and removal of file servers under control of the file virtualization. As a result, out-of-band file servers can enjoy the benefit of continuous availability even during namespace reconfiguration.
  • FIG. C-1 demonstrates how the standard DFS-based virtualization works.
  • Client11 to Client14 are regular clients that are on the same network with the DFS server (DFS1) and the back-end servers (Server11 to Server13).
  • The clients and the servers connect through a standard network file system protocol (CIFS and/or NFS) over a TCP/IP switch-based network.
  • The Clients are accessing the global namespace presented by the DFS1 server.
  • When a client wants to access a file, the client sends its file system request to the DFS server (DFS1), which informs the client that the file is being served by another server. Upon this notification, the client forms a special DFS request asking for the file placement of the file in question.
  • the DFS server instructs the client what portion of the file path is served by which server and where on that server this path is placed.
  • The client stores this information in its local cache and resubmits the original request to the specified server. As long as there is an entry in its local cache, the client will never ask the DFS server to resolve another reference for an entity residing within that path.
  • the cache expiration timeout is specified by the DFS administrator and by default is set to 15 minutes. There is no way for the DFS server to revoke a cached reference or purge it from a client's cache.
  • To purge a cached reference, the administrators force a reboot of the client machines, or log in to those machines and install and run a special utility that flushes the whole DFS cache for all of the servers the client is accessing; this in turn forces the client to consult the DFS server the next time it tries to access any file from the global namespace. The cache behavior is sketched below.
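The client-side referral caching just described (referrals reused until a local timeout, with no server-side revocation) can be illustrated with a small sketch; the 15-minute default comes from the description above, while the class and method names are assumptions.

```python
# Illustrative sketch of a DFS client's referral cache (not Microsoft's
# implementation): referrals are reused until they expire locally, and the
# DFS server has no way to revoke them.

import time

CACHE_TTL = 15 * 60      # default referral timeout from the description, in seconds

class ReferralCache:
    def __init__(self):
        self._entries = {}   # path prefix -> (server, share, expiry)

    def resolve(self, path, dfs_server):
        for prefix, (server, share, expiry) in self._entries.items():
            if path.startswith(prefix) and time.time() < expiry:
                return server, share                        # served from the cache
        prefix, server, share = dfs_server.get_referral(path)   # assumed helper
        self._entries[prefix] = (server, share, time.time() + CACHE_TTL)
        return server, share
```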
  • FIG. C-2 illustrates the basic operations of a small virtualized system that consists of four clients (Client21 to Client24), three back-end servers (Server21 to Server23), a Virtualization Appliance, and a couple of IP switches (21 and 22).
  • the Virtualization Appliance 2 resolves the file/directory path to a server, a server share, and a path and dispatches the client request to the appropriate back-end server 21, 22 or 23. Since the client 21 does not have direct access to the back-end servers 21-23, the Virtualization Appliance 2 can store the files and the corresponding directories at any place and in whatever format it wants, as long as it preserves the user data.
  • Some of the major functions include moving user files and directories without interrupting user access, mirroring user files, load balancing, and improving storage utilization, among others.
  • FIG. C-3 demonstrates how the virtualization device is added to the physical network.
  • The process includes manually bringing a virtualization device and an IP switch into close proximity to the rest of the network and manually connecting them to the network.
  • the same operation can be repeated with the rest of the servers.
  • The administrator can do the hardware reconfiguration during a scheduled server shutdown; this way, he does not have to worry about how fast he can perform the hardware reconfiguration.
  • the above operations can be performed programmatically without any physical disconnect by simply reconfiguring the switch 31 to create two separate VLANs, one to represent switch 31 and one for switch 32.
  • FIG. C-4 describes the steps by which the Virtualization Appliance 4 is inserted in the data path with no interruption or minimal interruption to users.
  • The operation begins with the Virtualization Appliance 4 reading the DFS configuration from the DFS server (DFS4, step 1), configuring its global namespace to match the one exported by the DFS server 4 (step 2), and updating the DFS server 4 configuration (step 3) to redirect all of its global namespace to the Virtualization Appliance 4. This guarantees that any opens after the clients' caches expire will go through the Virtualization Appliance 4 (step 4).
  • There are checks the Virtualization Appliance 4 can utilize to make sure that clients do not access the back-end servers directly. This is performed (in step 5) by going to the back-end servers 41-43 and obtaining the list of established user sessions. There should be no sessions other than the sessions originated through one of the IP addresses of the Virtualization Appliance 4. The overall insertion sequence is sketched below.
  • the Virtualization Appliance 4 can send an administrative alert (e-mail, SMS, page) to indicate that the insertion has been completed, so the administrator can physically disconnect the two switches 41 and 42 (step 7).
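Steps 1 through 7 of the insertion sequence can be condensed into a rough control-flow sketch. The DFS-server, back-end-server, switch, and alert objects here are stand-ins introduced only for illustration.

```python
# Hedged sketch of the insertion sequence (steps 1-7); the appliance, DFS
# server, back-end server, switch, and alert objects are illustrative stand-ins.

import time

def insert_appliance(appliance, dfs_server, backend_servers, switches, alert):
    config = dfs_server.read_configuration()                   # step 1
    appliance.configure_global_namespace(config.namespace)     # step 2
    dfs_server.redirect_namespace_to(appliance.ip_addresses)   # step 3
    # step 4: clients pick up the new referrals as their caches expire

    # step 5: wait until every remaining back-end session originates from
    # one of the appliance's IP addresses
    while True:
        sessions = [s for srv in backend_servers for s in srv.list_sessions()]
        foreign = [s for s in sessions if s.client_ip not in appliance.ip_addresses]
        if not foreign:
            break
        appliance.handle_lingering_sessions(foreign)            # e.g. step 6
        time.sleep(60)

    alert.send("Virtualization appliance insertion complete")   # administrative alert
    switches.disconnect_inter_switch_link()                     # step 7 (or split VLANs)
```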
  • The Virtualization Appliance 4 can reconfigure the switch to separate the two VLANs.
  • The Virtualization Appliance can kick the user off of a predetermined server by sending a session close command (step 6) to the server on which the user was logged on. This would force the user's machine to reestablish the session, which triggers a refresh on the affected cache entries.
  • the session can be killed since the client does not have any state other than the session itself, which the client's machine can restore without any visible impact. If the user has been idle for a prolonged interval of time (e.g. 2 hours), this is an indication that the user session can be forcefully closed.
  • The Virtualization Appliance 4 can perform a survey, monitoring the number of open files and the traffic load coming from the offending users, and present the administrator with the option to trigger a session close when the user has the least number of open files and/or the least traffic. This way, the impact on the particular user is minimized.
  • Another alternative is for the Virtualization Appliance 4 to send an e-mail/SMS/page to the offending users, requesting them to reboot if twice the maximum specified timeout has expired. A possible combination of these heuristics is sketched below.
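One plausible way to combine the heuristics mentioned above (prolonged idle time, few or no open files, low traffic) is sketched below; the two-hour figure echoes the example in the description, and the remaining thresholds and attributes are assumptions.

```python
# Illustrative policy for deciding when a lingering direct session may be
# closed; the thresholds and session attributes are assumptions (the two-hour
# idle figure echoes the example above).

IDLE_LIMIT_SECONDS = 2 * 60 * 60

def may_force_close(session):
    if session.idle_seconds >= IDLE_LIMIT_SECONDS:
        return True      # idle long enough that a forced close is safe
    if session.open_files == 0 and session.recent_traffic_bytes == 0:
        return True      # nothing in flight; the reconnect is invisible to the user
    return False         # otherwise defer, survey further, or notify the user
```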
  • Removing the Virtualization Appliance (FIG. C-5) is significantly easier than inserting it into the network.
  • The process begins with the administrator physically reconnecting the two switches (switch 51 and switch 52, step 1). After that, the virtual device restores the initial DFS configuration (step 2) and stops responding to any new connection establishments. In case some changes to the back-end file and directory placements have been made, the Virtualization Appliance has to rebuild the DFS configuration based on the new changes. After a while, all clients will log off from the Virtualization Appliance and connect directly to the back-end servers (steps 3, 4, 5, 6).
  • the Virtualization Appliance can start kicking users off by applying the principles used when the appliance was inserted into the data path.
  • The administrator can then safely power down and disconnect the Virtualization Appliance from both switches (steps 7 and 8).
  • FIG. C-6 is a logic flow diagram for logically inserting a virtualization appliance between client devices and file servers in a storage network, in accordance with an exemplary embodiment of the present invention.
  • a global namespace of the virtualization appliance is configured to match a global namespace exported by the distributed filesystem server.
  • the distributed filesystem server is updated to redirect client requests associated with the global namespace to the virtualization appliance.
  • the virtualization appliance ensures that no clients are directly accessing the file servers and in block 608 thereafter sends an administrative alert to indicate that insertion of the virtualization appliance is complete.
  • FIG. C-7 is a logic flow diagram for removing a virtualization appliance from a storage network, in accordance with an exemplary embodiment of the present invention.
  • a global namespace is sent from the virtualization appliance to the distributed filesystem server.
  • the virtualization appliance is configured to not respond to any new client connection requests received by the virtualization appliance.
  • the virtualization appliance closes the client session.
  • the virtualization appliance is disconnected from the storage network after a predetermined final timeout period.
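The removal flow of FIG. C-7 can likewise be sketched as a timed loop: restore the namespace to the DFS server, refuse new connections, close sessions that become active between the first and final timeouts, and disconnect after the final timeout. All helper calls are illustrative assumptions.

```python
# Hedged sketch of the removal sequence (FIG. C-7); the helper objects and
# timeouts are illustrative assumptions.

import time

def remove_appliance(appliance, dfs_server, first_timeout, final_timeout):
    dfs_server.restore_configuration(appliance.export_namespace())  # hand the namespace back
    appliance.refuse_new_connections()

    start = time.time()
    while time.time() - start < final_timeout:
        if not appliance.active_sessions():
            break                                    # every client has moved off
        if time.time() - start > first_timeout:
            # within the window between the first and final timeouts, close any
            # session that issues a new request so the client reconnects directly
            for session in appliance.sessions_with_new_requests():
                session.close()
        time.sleep(30)

    appliance.power_down_and_disconnect()            # after the final timeout
```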
  • employees tend to keep copies of all of the necessary documents and data that they access often. This is so that they can find the documents and data easily (central locations tend to change at least every so often). Furthermore, employees also tend to forget where certain things were found (in the central location), or never even knew where the document originated (they are sent a copy of the document via email). Finally, multiple employees may each keep a copy of the latest mp3 file, or video file, even if it is against company policy.
  • Deduplication is a technique where files with identical contents are first identified and then only one copy of the identical contents, the single-instance copy, is kept in the physical storage while the storage space for the remaining identical contents is reclaimed and reused.
  • deduplication achieves what is called “Single-Instance Storage” where only the single-instance copy is stored in the physical storage, resulting in more efficient use of the physical storage space. File deduplication thus creates a domino effect of efficiency, reducing capital, administrative, and facility costs and is considered one of the most important and valuable technologies in storage.
  • US patents 6389433 and 6477544 are examples of how a file system provides the single-instance-storage.
  • When an update is made to a deduplicated file, the file system creates a partial or full copy of the single-instance copy, and the update is allowed to proceed only after the (partial) copied data has been created, and only on the copied data.
  • the delay to wait for the creation of a (partial) copy of the single-instance data before an update can proceed introduces significant performance degradation.
  • the process to identify and dedupe replicated files also puts a strain on file system resources. Because of the performance degradation, deduplication or single-instance copy is deemed not acceptable for normal use. In reality, deduplication is of no (obvious) benefit to the end-user.
  • File system level deduplication offers many advantages for the IT administrators. However, it generally offers no direct benefits to the users of the file system other than performance degradation for those files that have been deduped. Therefore, the success of deduplication in the market place depends on reducing performance degradation to an acceptable level.
  • deduplication is usually done on a per file system basis. It is more desirable if deduplication is done together on one or more file systems. For example, the more file systems that are deduped together, the more chances that files with identical contents will be found and more storage space will be reclaimed. For example, if there is only one copy of file A in a file system, file A will not be deduped. On the other hand, if there is a copy of file A in another file system, then together, file A in the two file systems can be deduped. Furthermore, since there is only one single-instance copy for all of the deduplicated files from one or more file systems, the more file systems that are deduped together, the more efficient the deduplication process becomes.
  • A method and an apparatus are provided for deduplicating files in a file storage system having a primary storage tier and a secondary storage tier.
  • file deduplication involves identifying a plurality of files stored in the primary storage tier having identical file contents; copying the plurality of files to the secondary storage tier; storing in the primary storage tier a single copy of the file contents; and storing metadata for each of the plurality of files, the metadata associating each of the file copies in the secondary storage tier with the single copy of the file contents stored in the primary storage tier.
  • identifying the plurality of files stored in the primary storage tier having identical file contents may involve computing, for each of the plurality of files, a hash value based on the contents of the file; and identifying the files having identical file contents based on the hash values.
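Identifying files with identical contents by hash value can be sketched as below. The detailed description later uses a SHA-1 digest, so the sketch does the same; the grouping logic and function names are otherwise assumptions.

```python
# Sketch of grouping files by a content hash to find sets of identical files.
# The detailed description uses a SHA-1 digest; paths and I/O here are
# illustrative assumptions.

import hashlib
from collections import defaultdict

def sha1_of(path, chunk_size=1 << 20):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicate_sets(paths):
    groups = defaultdict(list)
    for p in paths:
        groups[sha1_of(p)].append(p)
    # only groups with two or more members are candidate duplicate sets
    return {digest: files for digest, files in groups.items() if len(files) > 1}
```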
  • Storing the single copy of the file contents in the primary storage tier may involve copying the file contents to a designated mirror server; and deleting the remaining file contents from each of the plurality of files in the primary storage tier.
  • the read access may be directed to the single copy of the file contents maintained in the primary storage tier.
  • the association between the file copy in the secondary storage tier and the single copy of the file contents stored in the primary storage tier may be broken, and the file copy stored in the secondary storage tier may be modified.
  • the modified file copy subsequently may be migrated from the secondary storage tier to the primary storage tier based on a migration policy.
  • deduplicating a selected file in the primary storage tier may involve determining whether the file contents of the selected file match the file contents of a previously deduplicated file having a single copy of file contents stored in the primary storage tier; when the file contents of the selected file match the file contents of a previously deduplicated file, deduplicating the selected file; otherwise determining whether the file contents of the selected file match the file contents of a non-duplicate file in the first storage tier; and when the file contents of the selected file match the file contents of a non-duplicate file, deduplicating both the selected file and the non-duplicate file.
  • Determining whether the file contents of the selected file match the file contents of a previously deduplicated file may involve comparing a hash value associated with the selected file to a distinct hash value associated with each single copy of file contents stored in the primary storage tier.
  • Deduplicating the selected file may involve copying the selected file to the secondary storage tier; deleting the file contents from the selected file; and storing metadata for the selected file, the metadata associating the file copy in the secondary storage tier with the single copy of the file contents for the previously deduplicated file stored in the primary storage tier.
  • Deduplicating both the selected file and the non-duplicate file may involve copying the selected file and the non-duplicate file to the secondary storage tier; storing in the primary storage tier a single copy of the file contents; and storing metadata for each of the first and second selected files, the metadata associating each of the file copies in the secondary storage tier with the single copy of the file contents stored in the primary storage tier.
  • Storing the single copy of the file contents for deduplicating both the selected file and the non-duplicate file may involve copying the file contents to the designated mirror server; and deleting the remaining file contents from the selected file and the non-duplicate file.
  • Deduplication may be implemented in a file switch or other device that manages file storage.
  • FIG. D-1 is a logic flow diagram for file deduplication using storage tiers in accordance with an exemplary embodiment of the present invention;
  • FIG. D-2 is a logic flow diagram for deduplicating a selected file in accordance with an exemplary embodiment of the present invention.
  • This section relates generally to a method for performing deduplication on a global namespace using file virtualization when the global namespace is constructed from one or more storage servers, and to enable deduplication as a storage placement policy in a tiered storage environment.
  • a traditional file system manages the storage space by providing a hierarchical namespace.
  • the hierarchical namespace starts from the root directory, which contains files and subdirectories. Each directory may also contain files and subdirectories identifying other files or subdirectories. Data is stored in files. Every file and directory is identified by a name. The full name of a file or directory is constructed by concatenating the name of the root directory and the names of each subdirectory that finally leads to the subdirectory containing the identified file or directory, together with the name of the file or the directory.
  • the full name of a file thus carries with it two pieces of information: (1) the identification of the file and (2) the physical storage location where the file is stored. If the physical storage location of a file is changed (for example, moved from one partition mounted on a system to another), the identification of the file changes as well.
  • Deduplication is of no obvious benefit to the end users of a file system. Instead of using deduplication as a management policy to reduce storage space and subsequently cause inconvenience to the end users of the deduplicated files, this invention uses deduplication as a storage placement policy to intelligently manage the storage assets of an enterprise, with relatively little inconvenience to the end users.
  • a set of file servers is designated as tier 1 where data stored in these file servers is considered more important to the enterprise.
  • Another (typically non-overlapping) set of file servers is designated as tier 2 storage where data stored in these file servers is considered less important to the business.
  • the system administrators can spend more time and resources to provide faster access and more frequent backup on the data stored on the tier 1 file servers.
  • Deduplication typically is treated as one of the storage placement policies that decides where data should be stored, e.g., on a tier 1 or tier 2 file server.
  • duplicated data is automatically moved from tier 1 to tier 2.
  • the total storage space used by the deduplicated data on tier 1 and tier 2 remains the same (or perhaps even increases slightly). However, there is more storage space available on tier 1 file servers as a result of deduplication, since all the duplicated data is now stored on tier 2.
  • tier 1 and tier 2 file servers There may be performance differences between tier 1 and tier 2 file servers. However, these differences tend to be small since the relatively inexpensive file servers are still very capable.
  • One of the tier 1 file servers is designated as a mirror server where all of the mirror copies are stored.
  • Read access to a deduplicated file is redirected to the deduplicated file's mirror copy.
  • the association from the deduplicated file stored in a tier 2 server to its mirror copy that is stored in a tier 1 server is discarded. Accesses to the "modified" duplicated file will then resume normally from the tier 2 file server.
  • the "modified" deduplicated file is then migrated back to tier 1 storage.
  • Extending file virtualization to support deduplication is relatively straightforward.
  • a set of tier-1 servers is identified as a target for deduplication, and a set of tier-2 servers is identified for receiving deduplicated data.
  • One of the tier 1 file servers is chosen as the mirror server.
  • the mirror server is used to store the mirror copy of each set of deduplicated files with identical contents.
  • a background deduplication process typically is run periodically within the file virtualization appliance to perform the deduplication.
  • Exemplary embodiments use a SHA-1 digest computed from the contents of a file to identify files that have identical contents.
  • A SHA-1 digest value is a 160-bit globally unique value for any given set of data (contents) of a file. Therefore, if two files are identical in contents (but not necessarily identical in name or location), they will have the same SHA-1 digest value.
  • the file is migrated to a tier 2 file server according to the storage placement policy.
  • the migrated file is marked as deduplicated, and a mirror association is created between the migrated file and its mirror copy.
  • a deduplicated file When a deduplicated file is open for read, a check is made to see if there is a mirror copy stored in the mirror server. If there is, subsequent read requests on the deduplicated file will be switched to the mirror server for processing. Otherwise, the read request is switched to the tier 2 file server containing the actual data of the deduplicated file.
  • FIG. D-1 is a logic flow diagram for file deduplication using storage tiers in accordance with an exemplary embodiment of the present invention.
  • A deduplication device (e.g., a file switch) identifies a plurality of files stored in the primary storage tier having identical file contents.
  • the deduplication device copies the plurality of files to the secondary storage tier.
  • The deduplication device (e.g., a file switch) then stores in the primary storage tier a single copy of the file contents and stores metadata for each of the plurality of files, the metadata associating each of the file copies in the secondary storage tier with the single copy of the file contents stored in the primary storage tier.
  • FIG. D-2 is a logic flow diagram for deduplicating a selected file in the primary storage tier in accordance with an exemplary embodiment of the present invention.
  • the deduplication device determines whether the file contents of the selected file match the file contents of a previously deduplicated file having a single copy of file contents stored in the primary storage tier.
  • the deduplication device deduplicates the selected file in block 306, for example, by copying the selected file to the secondary storage tier, deleting the file contents from the selected file, and storing metadata for the selected file associating the file copy in the secondary storage tier with the single copy of the file contents for the previously deduplicated file stored in the primary storage tier.
  • the deduplication device determines whether the file contents of the selected file match the file contents of a non-duplicate file in the first storage tier in block 308.
  • the deduplication device deduplicates both the selected file and the non- duplicate file, for example, by copying the selected file and the non-duplicate file to the secondary storage tier, storing in the primary storage tier a single copy of the file contents, and storing metadata for each of the first and second selected files associating each of the file copies in the secondary storage tier with the single copy of the file contents stored in the primary storage tier.
  • Otherwise, the deduplication device may add the selected file to a list of non-duplicate files; this flow is sketched below.
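A compact sketch of this decision flow (match against previously deduplicated single copies first, then against the list of known non-duplicates, otherwise remember the file) is given below; the store objects and method names are assumptions, not the claimed implementation.

```python
# Hedged sketch of deduplicating a selected file (the FIG. D-2 flow).
# single_copies: hash -> single-instance copy already kept on the primary tier
# non_duplicates: hash -> file seen before without a duplicate
# All objects and method names are illustrative assumptions.

def dedupe_selected(selected, single_copies, non_duplicates, primary, secondary):
    h = selected.content_hash
    if h in single_copies:
        # Matches a previously deduplicated file: dedupe the selected file alone.
        secondary.copy_in(selected)
        primary.delete_contents(selected)
        primary.write_metadata(selected, mirror=single_copies[h])
    elif h in non_duplicates:
        # Matches a known non-duplicate: dedupe both files together.
        other = non_duplicates.pop(h)
        secondary.copy_in(selected)
        secondary.copy_in(other)
        single = primary.store_single_copy(selected)    # e.g. on the designated mirror server
        for f in (selected, other):
            primary.delete_contents(f)
            primary.write_metadata(f, mirror=single)
        single_copies[h] = single
    else:
        non_duplicates[h] = selected                    # remember it for a later match
```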
  • file deduplication as discussed herein may be implemented using file switches of the types described above and in the provisional patent application referred to by Attorney Docket No. 3193/114. It should also be noted that embodiments of the present invention may incorporate, utilize, supplement, or be combined with various features described in one or more of the other referenced patent applications.
  • Section D discloses a method of deduplication where duplicated files in one or more file servers in tier-1 storage are migrated to one or more file servers in tier-2 storage.
  • The storage space occupied by duplicated files in tier-1 storage is reclaimed, while storage space in less expensive tier-2 storage is consumed for storing the duplicated files migrated from tier-1.
  • A mirror copy from each set of duplicated files is left in the tier-1 storage to maintain read performance. The performance degradation that exists for update operations on deduplicated files is eliminated, since COW is not needed.
  • Although the deduplication method specified in the co-pending application does not actually save the total storage space consumed by the duplicate files, it makes it easier for end-users to accept deduplication, since they will experience, at most, a very minor inconvenience. Furthermore, the number of files in tier-1 storage is reduced by deduplication, resulting in faster backup of tier-1 file servers. However, in some cases, the actual removal of all duplicated files is unlikely to cause any inconvenience to end-users. For example, the contents of music or image files are never changed once created, so they are good candidates for deduplication. In another case, files that have not been accessed for a long time are also good candidates, since they are unlikely to be changed again any time soon.
  • It would be desirable to achieve deduplication with acceptable performance. It is even more desirable to be able to dedupe across more file systems to achieve higher deduplication efficiency. Furthermore, to reduce inconvenience experienced by end-users due to the performance overhead of deduplication, deduplication itself should be able to be performed on a selected set of files, instead of on every file in one or more selected file servers. Finally, in the case where end-users are unlikely to experience inconvenience due to deduplication, deduplication should result in less utilization of storage space by eliminating the storage of identical file copies.
  • Deduplicating files involves associating a number of files from the primary storage tier with a copy-on-write storage tier having a designated mirror server and deduplicating the files associated with the copy-on-write storage tier, such deduplicating including storing in the designated mirror server of the copy-on-write storage tier a single copy of the file contents for each duplicate and non-duplicate file associated with the copy-on-write storage tier; deleting the file contents from each deduplicated file in the copy-on-write storage tier to leave a sparse file; and storing metadata for each of the files, the metadata associating each sparse file with the corresponding single copy of the file contents stored in the designated mirror server.
  • Associating a number of files from the primary storage tier with a copy-on-write storage tier alternatively may involve marking the number of files as being associated with the copy-on-write storage tier, wherein the copy-on-write storage tier is a virtual copy-on-write storage tier.
  • Associating a number of files from the primary storage tier with a copy-on-write storage tier may involve maintaining a set of storage policies identifying files to be associated with the copy-on-write storage tier and associating the number of files with the copy-on-write storage tier based on the set of storage policies.
  • Storing a single copy of the file contents for each duplicate and non-duplicate file may involve determining whether the file contents of a selected file in the copy-on-write storage tier match the file contents of a previously deduplicated file having a single copy of file contents stored in the designated mirror server.
  • Determining whether the file contents of a selected file in the copy-on-write storage tier match the file contents of a previously deduplicated file having a single copy of file contents stored in the designated mirror server may involve comparing a hash value associated with the selected file to the hash values associated with the single copies of file contents for the previously deduplicated files stored in the designated mirror server.
  • Deduplicating files may further involve purging unused mirror copies from the designated mirror server.
  • Identifying mirror copies in the designated mirror server that are no longer associated with existing files in the copy-on-write storage tier may involve constructing a list of hash values associated with existing files in the copy-on-write storage tier; and for each mirror copy in the designated mirror server, comparing a hash value associated with the mirror copy to the hash values in the list of hash values, wherein the mirror copy is deemed to be an unused mirror copy when the hash value associated with the mirror copy is not in the list of hash values.
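The purge step described above is essentially a set difference over hash values; a hedged sketch follows, with illustrative data structures.

```python
# Sketch of purging unused mirror copies: a mirror copy whose hash is not
# referenced by any existing file in the COW storage tier is no longer needed.
# The objects are illustrative assumptions.

def purge_unused_mirrors(cow_tier_files, mirror_server):
    in_use = {f.content_hash for f in cow_tier_files}     # hashes still referenced
    for mirror in mirror_server.list_mirror_copies():
        if mirror.content_hash not in in_use:
            mirror_server.delete(mirror)                  # unused: safe to purge
```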
  • the method may further involve processing open requests for files associated with the copy-on-write storage tier, such processing of open requests comprising: receiving from a client an open request for a specified file associated with the copy-on-write storage tier; when the specified file is a non-deduplicated file: creating a copy-on-write file handle for the specified file; marking the copy-on-write file handle as ready; and returning the copy-on-write file handle to the client.
  • the mirror file handle for the mirror copy may be obtained from the designated mirror server based on hash values associated with the specified file and the mirror copy.
  • the contents of the specified file may be filled from the copy of the file contents stored in the designated mirror server using a background task.
  • the method may further involve processing file requests for files associated with the copy-on-write storage tier. Such processing may involve: receiving from the client a file request including the copy-on-write file handle; when the copy-on-write file handle is marked as not ready: suspending the file request until the contents of the specified file have been refilled from the mirror copy; and marking the copy-on-write file handle as ready if the contents of the specified file have been refilled successfully.
  • FIG. E-1 is a logic flow diagram for file deduplication using copy-on-write storage tiers in accordance with an exemplary embodiment of the present invention.
  • Embodiments of the present invention relate generally to using a copy-on-write storage tier to reclaim storage space of all duplicated files and recreate the contents of a duplicated file from its mirror copy when an update is about to occur on the duplicated file.
  • a traditional file system manages the storage space by providing a hierarchical namespace.
  • the hierarchical namespace starts from the root directory, which contains files and subdirectories. Each directory may also contain files and subdirectories identifying other files or subdirectories. Data is stored in files. Every file and directory is identified by a name. The full name of a file or directory is constructed by concatenating the name of the root directory and the names of each subdirectory that finally leads to the subdirectory containing the identified file or directory, together with the name of the file or the directory.
  • the full name of a file thus carries with it two pieces of information: (1) the identification of the file and (2) the physical storage location where the file is stored. If the physical storage location of a file is changed (for example, moved from one partition mounted on a system to another), the identification of the file changes as well.
  • Embodiments of the present invention utilize a Copy-On-Write (COW) storage tier in which every file in any of the file servers in the storage tier is eventually deduplicated, regardless of whether there is any file in the storage tier that has identical contents. This is in contrast with typical deduplication, where only files with identical contents are deduped.
  • Storage policies are typically used to limit the deduplication to only a set of files selected by the storage policies that apply to a synthetic namespace comprising one or more file servers. For example, one storage policy may migrate a specified class of files (e.g., all mp3 audio and jpeg image files) to a COW storage tier. Another example is that all files that have not been referenced for a specified period of time (e.g., over six months) are migrated to a COW storage tier. Once the files are in the COW storage tier, deduplication is done on every file, regardless whether any file with duplicated contents exists.
  • extending file virtualization to support deduplication using the COW storage tier operates generally as follows. First, a synthetic namespace is created via file virtualization, and is comprised of one or more file servers.
  • a set of storage policies is created that selects a set of files from the synthetic namespace to be migrated to the COW storage tier.
  • a set of file servers are selected to be in the COW storage tier.
  • One of the file servers in a COW storage tier will also act as a mirror server.
  • a mirror server is storage that may contain the current, past, or both current and past mirror copies of the authoritative copy of files stored at the COW storage tier.
  • each mirror copy in the mirror server is associated with a hash value, e.g., identified by a 160-bit number, which is the SHA-1 digest computed from the contents of the mirror copy.
  • a SHA-1 digest value is a globally unique value for any given set of data (contents) of a file.
  • the mirror server is a special device. While it can be written, the writing of it is only performed by the file virtualization appliance itself, and each write to a file is only done once. Users and applications only read the contents of files stored on the mirror server. Basically, the mirror server is a sort of write once, read many (WORM) device. Therefore, if the mirror server were replicated, users and applications could read from any of the mirror servers available.
  • Once a file is stored in a COW storage tier, the file will eventually be deduplicated. For example, if there is no update made to any files in a COW storage tier, then after a certain duration, all files in the COW storage tier will be deduped. After a file is deduped, the file becomes a sparse file where essentially all of the file's storage space is reclaimed while all of the file's attributes, including its size, remain.
  • a background deduplication process typically is run periodically within the file aggregation appliance to perform the deduplication.
  • An exemplary deduplication process for a COW storage tier is as follows:
  • If an error occurs while deduplicating a file, the deduplication process logs the full name of the single file together with the error code in a log file. The deduplication process will continue with the next file stored in the COW storage tier.
  • the SHA-1 digest is retrieved from the metadata of the file.
  • the file handle from opening a file in the COW storage tier is called the COW file handle. Notice that once a COW file is deduped, it becomes a sparse file and does not contain any data.
  • the SHA-1 digest is retrieved from the metadata, and the SHA-1 digest for the file is then used to obtain a mirror file handle from the mirror server. If a mirror file handle is returned, the mirror file handle is associated with the COW file handle and the COW file handle is marked as ready. The per-file procedure is sketched below.
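A hedged sketch of deduplicating a single COW-tier file (retrieve or compute the digest, ensure a mirror copy keyed by that digest exists on the mirror server, then truncate the file to a sparse placeholder and record the association) follows; every object name is an assumption for illustration.

```python
# Illustrative sketch of deduplicating one file in the COW storage tier;
# the metadata store, mirror server, and file objects are assumptions.

def dedupe_cow_file(cow_file, mirror_server, metadata_store):
    digest = metadata_store.get_sha1(cow_file) or cow_file.compute_sha1()

    # Ensure a single mirror copy keyed by the digest exists on the mirror server.
    if not mirror_server.has_copy(digest):
        mirror_server.store_copy(digest, cow_file.read_all())

    # Reclaim the space: the COW file becomes a sparse file whose attributes
    # (including size) are preserved, and the metadata records the association.
    cow_file.truncate_to_sparse()
    metadata_store.set_mirror_association(cow_file, digest)
```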
  • When a file request is sent to the MFM, it includes a COW file handle. Exemplary steps for handling a file identified by the COW file handle are as follows:
  • If the COW file handle is not ready, the request will be suspended until the COW file handle is ready (i.e., the file to be opened is made non-sparse, and the data from the mirror copy has been copied into the original file in the COW storage).
  • the mirror file handle is used to retrieve the data. Otherwise, the COW file handle is used to retrieve the data. The result from either the COW file or the mirror server is returned to the user.
  • For writes, the COW file handle is used to write the data to the COW storage; see the sketch below.
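These request-handling rules can be summarized in a small dispatcher; the handle flags and helper calls below are assumptions, not the MFM's actual interfaces.

```python
# Hedged sketch of handling a client request that carries a COW file handle;
# the handle flags and helper methods are illustrative assumptions.

def handle_cow_request(request, cow_handle):
    if not cow_handle.ready:
        # Suspend until the sparse file has been refilled from its mirror copy.
        cow_handle.wait_until_ready()

    if request.is_read:
        if cow_handle.mirror_handle is not None:
            return cow_handle.mirror_handle.read(request.offset, request.length)
        return cow_handle.read(request.offset, request.length)

    if request.is_write:
        # Writes always go to the copy in the COW storage, never to the mirror.
        return cow_handle.write(request.offset, request.data)

    return cow_handle.forward(request)       # other operations pass through unchanged
```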
  • Some enterprises or locations may not have multiple storage tiers available to setup a copy-on-write storage tier, or not have enough available storage in an available tier to store the large amount of mp3 and image files that a storage policy would dictate be stored on the copy-on-write storage tier.
  • a new storage tier is just that, a new storage tier to create and manage.
  • an alternative embodiment removes the restriction that the copy-on-write storage tier is a separate and real physical storage tier.
  • the copy-on-write storage tier may just be some part of another storage tier, such as tier-1 or tier-2 storage, thus becoming a virtual storage tier.
  • files could be marked as a part of the virtual storage tier by virtue of a metadata flag, hereafter referred to as the COW flag. If the COW flag is false, the file is just a part of the storage tier the file resides within. If the COW flag is true, the file is not part of the storage tier the file resides within. Rather, the file is part of the virtual copy-on-write storage tier.
  • a set of storage policies is created that selects a set of files from the synthetic namespace to be migrated to the virtual COW storage tier. If the files already reside on the tier which co-resides with the virtual COW storage tier, then no actual migration is performed. Rather, the COW flag within the metadata indicating that the file has been migrated to the virtual COW storage tier is set. If the file resides on a different storage tier than the virtual COW storage tier, then a physical migration is performed to the COW storage tier. Again, the COW flag within the metadata indicating that the file has been migrated to the virtual COW storage tier is set. Alternatively, there may be a single virtual COW storage tier for all physical storage tiers within the namespace.
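Placing a file in the virtual COW storage tier therefore amounts to a policy check, an optional physical migration, and setting the COW flag in the metadata; a hedged sketch (policy and metadata calls are illustrative) follows.

```python
# Illustrative sketch of placing a file in the virtual COW storage tier; the
# policy objects, migrate callback, and metadata layout are assumptions.

def assign_to_virtual_cow(file_entry, policies, cow_host_tier, migrate):
    if not any(policy.matches(file_entry) for policy in policies):
        return                                  # not selected by any storage policy

    if file_entry.tier != cow_host_tier:
        migrate(file_entry, cow_host_tier)      # physical migration only when needed

    file_entry.metadata["COW"] = True           # now part of the virtual COW tier
```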
  • Once a file is part of a virtual COW storage tier, the file will eventually be deduped. In other words, if there is no update made to any files in a virtual COW storage tier, then after a certain duration, all files in the virtual COW storage tier will be deduped. After a file is deduped, the file becomes a sparse file where all of the file's storage space is reclaimed while all of the file's attributes, including its size, remain. Since the file just resides within a regular storage tier, the storage space that is reclaimed is the valuable tier storage space the file used to occupy.
  • a background deduplication process typically is run periodically within the MFM to perform the deduplication.
  • An exemplary deduplication process for a virtual COW storage tier is as follows:
  • If an error occurs while deduplicating a file, the deduplication process logs the full name of the single file together with the error code in a log file.
  • the deduplication process will continue with the next file stored in the storage tier (or namespace).
  • An exemplary process to dedupe a single file (as called by the deduplication process above) is essentially unchanged from the process described above.
  • An exemplary process to dedupe a single file is as follows:
  • the file handle from opening a file in the virtual COW storage tier is called the COW file handle. Notice that once a COW file is deduped, it becomes a sparse file and does not contain any data. Also notice that this COW file handle is really the normal file handle for opening the file in its normal place.
  • the SHA-1 digest is retrieved from the metadata, and the SHA-1 digest for the file is then used to obtain a mirror file handle from the mirror server. If a mirror file handle is returned, the mirror file handle is associated with the COW file handle and the COW file handle is marked as ready.
  • the request will be suspended until the COW file handle is ready (i.e. the file to be opened is made non-sparse, and the data from the mirror copy was copied into the original file in the COW storage).
  • the COW file handle is used to write the data to the COW storage.
  • The in-use mirror list in an actual embodiment may be implemented as a hash table, a binary tree, or other data structures commonly used by people skilled in the art to achieve acceptable find performance.
  • It is possible that the mirror server completely fills up (even though past mirror copies are purged). Therefore, the mirror server should be as large as possible, to accommodate at least one copy of all files that can exist in the COW storage tier. Otherwise, the mirror server may run out of space, and further deduplication will not be possible.
  • The related application entitled Remote File Virtualization Data Mirroring discusses a mechanism for purging past mirror copies from the mirror server (any mirror copy can be purged at any given time, since an authoritative copy exists elsewhere).
  • Such purging of in-use mirror copies generally cannot be used in embodiments of the present invention. This is because a file that has been deduped in the COW storage tier only exists as a sparse file (no data in the file) and as a mirror copy. Thus, the mirror copy is actually the authoritative copy of the data contents of the deduped file.
  • An in-use mirror copy is not purged because, among other things, it is difficult to locate and restore the contents of all the COW files that have the same identical mirror copy.
  • FIG. E-1 is a logic flow diagram for file deduplication using copy-on-write storage tiers in accordance with an exemplary embodiment of the present invention.
  • the file virtualization appliance associates a number of files from the primary storage tier with a copy-on-write storage tier having a designated mirror server.
  • the file virtualization appliance stores in the designated mirror server a single copy of the file contents for each duplicate and non-duplicate file associated with the copy-on- write storage tier.
  • the file virtualization appliance deletes the file contents from each deduplicated file in the copy-on-write storage tier to leave a sparse file.
  • the file virtualization appliance stores metadata for each of the files, the metadata associating each sparse file with the corresponding single copy of the file contents stored in the designated mirror server.
  • the file virtualization appliance purges unused mirror copies from the designated mirror server from time to time.
  • the file virtualization appliance processes open requests for files associated with the copy-on-write storage tier, including creating COW file handles for such files.
  • the file virtualization appliance processes file requests for files associated with the COW storage tier based on COW file handles.
  • file deduplication as discussed herein may be implemented using file switches of the types described above and in the provisional patent application referred to by Attorney Docket No. 3193/114. It should also be noted that embodiments of the present invention may incorporate, utilize, supplement, or be combined with various features described in one or more of the other referenced patent applications.
  • a device may include, without limitation, a bridge, router, bridge-router (brouter), switch, node, server, computer, appliance, or other type of device.
  • Such devices typically include one or more network interfaces for communicating over a communication network and a processor (e.g., a microprocessor with memory and other peripherals and/or application-specific hardware) configured accordingly to perform device functions.
  • Communication networks generally may include public and/or private networks; may include local-area, wide-area, metropolitan-area, storage, and/or other types of networks; and may employ communication technologies including, but in no way limited to, analog technologies, digital technologies, optical technologies, wireless technologies (e.g., Bluetooth), networking technologies, and internetworking technologies.
  • devices may use communication protocols and messages (e.g., messages created, transmitted, received, stored, and/or processed by the device), and such messages may be conveyed by a communication network or medium.
  • a communication message generally may include, without limitation, a frame, packet, datagram, user datagram, cell, or other type of communication message.
  • logic flows may be described herein to demonstrate various aspects of the invention, and should not be construed to limit the present invention to any particular logic flow or logic implementation.
  • the described logic may be partitioned into different logic blocks (e.g., programs, modules, functions, or subroutines) without changing the overall results or otherwise departing from the true scope of the invention.
  • logic elements may be added, modified, omitted, performed in a different order, or implemented using different logic constructs (e.g., logic gates, looping primitives, conditional logic, and other logic constructs) without changing the overall results or otherwise departing from the true scope of the invention.
  • the present invention may be embodied in many different forms, including, but in no way limited to, computer program logic for use with a processor (e.g., a microprocessor, microcontroller, digital signal processor, or general purpose computer), programmable logic for use with a programmable logic device (e.g., a Field Programmable Gate Array (FPGA) or other PLD), discrete components, integrated circuitry (e.g., an Application Specific Integrated Circuit (ASIC)), or any other means including any combination thereof.
  • predominantly all of the described logic is implemented as a set of computer program instructions that is converted into a computer executable form, stored as such in a computer readable medium, and executed by a microprocessor under the control of an operating system.
  • Source code may include a series of computer program instructions implemented in any of various programming languages (e.g., an object code, an assembly language, or a high-level language such as Fortran, C, C++, JAVA, or HTML) for use with various operating systems or operating environments.
  • the source code may define and use various data structures and communication messages.
  • the source code may be in a computer executable form (e.g., via an interpreter), or the source code may be converted (e.g., via a translator, assembler, or compiler) into a computer executable form.
  • the computer program may be fixed in any form (e.g., source code form, computer executable form, or an intermediate form) either permanently or transitorily in a tangible storage medium, such as a semiconductor memory device (e.g., a RAM, ROM, PROM, EEPROM, or Flash-Programmable RAM), a magnetic memory device (e.g., a diskette or fixed disk), an optical memory device (e.g., a CD-ROM), a PC card (e.g., PCMCIA card), or other memory device.
  • the computer program may be fixed in any form in a signal that is transmittable to a computer using any of various communication technologies, including, but in no way limited to, analog technologies, digital technologies, optical technologies, wireless technologies (e.g., Bluetooth), networking technologies, and internetworking technologies.
  • the computer program may be distributed in any form as a removable storage medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over the communication system (e.g., the Internet or World Wide Web).
  • Hardware logic (including programmable logic for use with a programmable logic device) implementing all or part of the functionality previously described herein may be designed using traditional manual methods, or may be designed, captured, simulated, or documented electronically using various tools, such as Computer Aided Design (CAD), a hardware description language (e.g., VHDL or AHDL), or a PLD programming language (e.g., PALASM, ABEL, or CUPL).
  • Programmable logic may be fixed either permanently or transitorily in a tangible storage medium, such as a semiconductor memory device (e.g., a RAM, ROM, PROM, EEPROM, or Flash-Programmable RAM), a magnetic memory device (e.g., a diskette or fixed disk), an optical memory device (e.g., a CD-ROM), or other memory device.
  • the programmable logic may be fixed in a signal that is transmittable to a computer using any of various communication technologies, including, but in no way limited to, analog technologies, digital technologies, optical technologies, wireless technologies (e.g., Bluetooth), networking technologies, and internetworking technologies.
  • the programmable logic may be distributed as a removable storage medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over the communication system (e.g., the Internet or World Wide Web).

Abstract

According to the invention, file virtualization is used for load sharing, file migration, network configuration, and file deduplication in storage networks.
PCT/US2008/083117 2007-11-12 2008-11-11 Partage de charges, deplacement de fichiers, configuration de reseau et deduplication de fichiers par virtualisation de fichiers WO2009064720A2 (fr)

Applications Claiming Priority (14)

Application Number Priority Date Filing Date Title
US98720607P 2007-11-12 2007-11-12
US98719707P 2007-11-12 2007-11-12
US98719407P 2007-11-12 2007-11-12
US98717407P 2007-11-12 2007-11-12
US98718107P 2007-11-12 2007-11-12
US60/987,194 2007-11-12
US60/987,197 2007-11-12
US60/987,206 2007-11-12
US60/987,181 2007-11-12
US60/987,174 2007-11-12
US98830607P 2007-11-15 2007-11-15
US98826907P 2007-11-15 2007-11-15
US60/988,269 2007-11-15
US60/988,306 2007-11-15

Publications (2)

Publication Number Publication Date
WO2009064720A2 true WO2009064720A2 (fr) 2009-05-22
WO2009064720A3 WO2009064720A3 (fr) 2009-07-30

Family

ID=40328867

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/083117 WO2009064720A2 (fr) 2007-11-12 2008-11-11 Partage de charges, deplacement de fichiers, configuration de reseau et deduplication de fichiers par virtualisation de fichiers

Country Status (1)

Country Link
WO (1) WO2009064720A2 (fr)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5548724A (en) * 1993-03-22 1996-08-20 Hitachi, Ltd. File server system and file access control method of the same
US20030115218A1 (en) * 2001-12-19 2003-06-19 Bobbitt Jared E. Virtual file system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "How DFS Works: Remote File Systems" DISTRIBUTED FILE SYSTEM (DFS) TECHNICAL REFERENCE, [Online] 28 March 2003 (2003-03-28), XP002514939 Retrieved from the Internet: URL:http://technet.microsoft.com/en-us/library/cc782417.aspx> [retrieved on 2009-02-13] *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2446346A4 (fr) * 2009-06-25 2016-09-07 Emc Corp Système et procédé de fourniture d'un stockage à long terme de données
US10108353B2 (en) 2009-06-25 2018-10-23 EMC IP Holding Company LLC System and method for providing long-term storage for data
US9165015B2 (en) 2010-07-29 2015-10-20 International Business Machines Corporation Scalable and user friendly file virtualization for hierarchical storage
US10963432B2 (en) 2010-07-29 2021-03-30 International Business Machines Corporation Scalable and user friendly file virtualization for hierarchical storage
US10833943B1 (en) 2018-03-01 2020-11-10 F5 Networks, Inc. Methods for service chaining and devices thereof
CN112579621A (zh) * 2020-12-25 2021-03-30 平安银行股份有限公司 数据展示方法、装置、电子设备及计算机存储介质
CN112579621B (zh) * 2020-12-25 2024-03-19 平安银行股份有限公司 数据展示方法、装置、电子设备及计算机存储介质
CN112948354A (zh) * 2021-03-01 2021-06-11 北京金山云网络技术有限公司 副本集群的创建方法和装置、电子设备和存储介质

Also Published As

Publication number Publication date
WO2009064720A3 (fr) 2009-07-30

Similar Documents

Publication Publication Date Title
US8117244B2 (en) Non-disruptive file migration
EP2962218B1 (fr) Contenu et métadonnées découplés dans un écosystème de stockage réparti d'objets
US20180011874A1 (en) Peer-to-peer redundant file server system and methods
EP3149606B1 (fr) Réplication favorisée de métadonnées dans des topologies actives
US8473582B2 (en) Disconnected file operations in a scalable multi-node file system cache for a remote cluster file system
US8838624B2 (en) System and method for aggregating query results in a fault-tolerant database management system
US7509322B2 (en) Aggregated lock management for locking aggregated files in a switched file system
US7788335B2 (en) Aggregated opportunistic lock and aggregated implicit lock management for locking aggregated files in a switched file system
US7383288B2 (en) Metadata based file switch and switched file system
CA2512312C (fr) Commutateur de fichier utilisant des metadonnees et systeme fichier commute
US20090204649A1 (en) File Deduplication Using Storage Tiers
US8429360B1 (en) Method and system for efficient migration of a storage object between storage servers based on an ancestry of the storage object in a network storage system
US20090204650A1 (en) File Deduplication using Copy-on-Write Storage Tiers
WO2009064720A2 (fr) Partage de charges, deplacement de fichiers, configuration de reseau et deduplication de fichiers par virtualisation de fichiers
Devi et al. Architecture for Hadoop Distributed File Systems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08849005

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03.09.2010)

122 Ep: pct application non-entry in european phase

Ref document number: 08849005

Country of ref document: EP

Kind code of ref document: A2