WO2012170234A2 - Clustered file service - Google Patents

Clustered file service

Info

Publication number
WO2012170234A2
Authority
WO
WIPO (PCT)
Prior art keywords
file
devices
cluster
namespace
file server
Prior art date
Application number
PCT/US2012/039879
Other languages
French (fr)
Other versions
WO2012170234A3 (en)
Inventor
Vyacheslav Kuznetsov
Andrea D'amato
Alan Warwick
Vladimir Petter
Henry ALOYSIUS
Original Assignee
Microsoft Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corporation
Priority to CN201280027196.2A (patent CN103608798B)
Priority to EP12796591.1A (patent EP2718837B1)
Publication of WO2012170234A2
Publication of WO2012170234A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/176 Support for shared access to files; File sharing support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/178 Techniques for file synchronisation in file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/183 Provision of network file services by network file servers, e.g. by using NFS, CIFS

Definitions

  • File services operate to share files with various client devices.
  • the file services may present files to client devices in the form of shares, which are directory structures or portions of directory structures in which files may be stored. In some cases, the same file may be made available in different shares.
  • Many file services may define different sets of permissions for different users for each share. Some users may have read/write permissions, while other users may have read only permissions and still other users may have no access to the share. Some file systems may apply different permissions to subsets of the share, such as defining different permissions for individual files, directories, or groups of files or directories within a single share.
  • a cluster based file service may operate on a cluster of two or more independent devices that have access to a common data storage.
  • The file service may have a namespace definition shared with each device in the cluster, and that definition may be modified by any device operating the file service.
  • Each instance of the file service may identify and capture a command that changes the namespace structure and cause the change to be propagated to the other members of the cluster. If one of the devices in the cluster does not successfully perform an update to the namespace structure, that device may be brought offline.
  • the cluster based file service may permit adding or removing devices from the cluster while the file service is operating, and may provide a high throughput and high availability file service.
  • FIGURE 1 is a diagram of an embodiment showing a network environment with clustered file service.
  • FIGURE 2 is a functional diagram of an embodiment showing a conceptual topology for a file service cluster.
  • FIGURE 3 is a timeline flowchart of an embodiment showing a method for managing cluster operations.
  • FIGURE 4 is a flowchart of an embodiment showing a method for operating a file service.
  • FIGURE 5 is a flowchart of an embodiment showing a method for updating a slave node.
  • a cluster based file service may provide file services to multiple clients using multiple devices in parallel.
  • Each of the file service providers may have identical copies of the file namespace, and may identify and capture changes to the namespace. Those changes may be propagated to each of the members of the cluster that provide the same file service.
  • the architecture of the cluster may allow several different namespaces to be provided by different groups of devices within the cluster. For example, one namespace may be served by three devices within a cluster, while a second namespace may be served by four devices, two of which may be members of the group providing the first namespace. In such embodiments, some devices in the cluster may serve two or more namespaces, while other devices may serve only one namespace.
  • The cluster may operate a group of devices using a leader and follower model.
  • a leader is defined as a device within the cluster that manages an application.
  • the leader may be the device that starts and stops the file service, adds or removes additional cluster devices to the file service, and performs other administrative tasks.
  • each device may act as a master or slave, depending on the situation.
  • When a device detects a change to the namespace, such as when a user adds or deletes a file, the device may operate as a master to update the namespace and transmit the namespace to the other devices, which act as slaves.
  • Any of the devices may act as masters or slaves during the course of operation of the file system.
  • Other embodiments may have different
  • the namespace may identify any type of shared resource, which typically is a file system.
  • The file system may include directories or folders, files, or other objects.
  • the namespace may be a pointer to a starting point within a directory structure.
  • the namespace may include various permission settings or other information about the namespace.
  • The subject matter may be embodied as devices, systems, methods, and/or computer program products. Accordingly, some or all of the subject matter may be embodied in hardware and/or in software (including firmware, resident software, microcode, state machines, gate arrays, etc.). Furthermore, the subject matter may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system.
  • a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • the computer-usable or computer-readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium.
  • computer readable media may comprise computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by an instruction execution system.
  • The computer-usable or computer-readable medium could be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
  • Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
  • modulated data signal means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • Communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
  • the embodiment may comprise program modules, executed by one or more systems, computers, or other devices.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • functionality of the program modules may be combined or distributed as desired in various embodiments.
  • Figure 1 is a diagram of an embodiment 100, showing a clustered file service.
  • Embodiment 100 is an example architecture that may be used to provide file services in a highly parallel, fault tolerant system with high availability.
  • the diagram of Figure 1 illustrates functional components of a system.
  • the component may be a hardware component, a software component, or a combination of hardware and software.
  • Some of the components may be application level software, while other components may be operating system level components.
  • the connection of one component to another may be a close connection where two or more components are operating on a single hardware platform. In other cases, the connections may be made over network connections spanning long distances.
  • Each embodiment may use different hardware, software, and interconnection architectures to achieve the described functions.
  • Embodiment 100 is an example of a computer cluster where several computers may operate in parallel to provide various services, such as file services.
  • a cluster may have several computers that execute the same application or service and may
  • Clustering may be one mechanism by which multiple computers may be arranged to provide a service for fault tolerance and/or high throughput.
  • two or more devices may process operations in parallel.
  • the devices may be configured so that one of the devices may fail, be pulled offline, or otherwise stop operating yet the service may still be operating on another device.
  • Such a configuration may be a failsafe system where the system may tolerate failure of one or more devices while still providing the service.
  • a cluster may provide very high throughput by processing multiple requests for the service simultaneously.
  • a single cluster may provide many times the bandwidth or throughput of a single device.
  • each node that provides the file service may use the same namespace definition.
  • the namespace definition may define the contents of the share being served.
  • the share may include various objects, such as files, directories, folders, or other objects.
  • Each request to the file service may fall into one of two categories: those requests that cause the share to change and those that do not.
  • Requests that cause the share to change may include requests that add or delete files, change the file directory structure, or perform other operations.
  • Requests that do not change the share may include reads to a file.
  • In some embodiments, write operations performed on a file may be considered a change to the namespace, while other embodiments may treat write operations as not changing the namespace.
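The two request categories described above can be sketched as a small classifier. This is a minimal illustration, not the patent's method: the `Op` enum, the `STRUCTURAL_OPS` set, and the `writes_count_as_changes` flag are hypothetical names introduced here, with the flag modelling the embodiment-dependent treatment of write operations.

```python
from enum import Enum, auto


class Op(Enum):
    """Hypothetical request types a file service might receive."""
    READ = auto()
    LIST = auto()
    WRITE = auto()
    CREATE = auto()
    DELETE = auto()
    RENAME = auto()


# Requests that add or delete files or alter the directory structure.
STRUCTURAL_OPS = {Op.CREATE, Op.DELETE, Op.RENAME}


def changes_namespace(op: Op, writes_count_as_changes: bool = False) -> bool:
    """Return True if the request would change the share.

    Whether a write counts as a change is left to the embodiment, so it
    is exposed here as a flag (an assumption of this sketch).
    """
    if op in STRUCTURAL_OPS:
        return True
    if op is Op.WRITE:
        return writes_count_as_changes
    return False
```

A node could call `changes_namespace` on each incoming request to decide whether the request has to be propagated to the other nodes or can be answered locally.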
  • the change may be propagated to all nodes that serve the share.
  • the other nodes may pause until the change is completed on that node prior to responding to any other requests. If a device detects that a change is not properly implemented, the device may take itself offline until the problem may be resolved.
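A minimal sketch of this propagation step, assuming an in-process model where each node is a Python object: the `Node` class, the `propagate` function, and the `fail_on_update` flag (used only to simulate a failed update) are hypothetical and do not come from the specification.

```python
class Node:
    """Hypothetical in-memory stand-in for a cluster node."""

    def __init__(self, name, fail_on_update=False):
        self.name = name
        self.namespace = set()   # paths currently in the share
        self.online = True
        self.fail_on_update = fail_on_update  # simulates a faulty node

    def apply(self, change):
        """Apply a (kind, path) change to the local namespace copy."""
        kind, path = change
        if self.fail_on_update:
            raise RuntimeError(f"{self.name}: update failed")
        if kind == "add":
            self.namespace.add(path)
        elif kind == "delete":
            self.namespace.discard(path)
        else:
            raise ValueError(f"unknown change kind: {kind}")


def propagate(change, nodes):
    """Apply a change on every online node.

    A node that cannot apply the change is taken offline so that it
    never answers requests from an inconsistent namespace.
    Returns the names of the nodes that remain online.
    """
    for node in nodes:
        if not node.online:
            continue
        try:
            node.apply(change)
        except Exception:
            node.online = False
    return [n.name for n in nodes if n.online]
```

In this sketch the offline decision is made by the caller on the node's behalf; a real node would more likely detect the failure and withdraw itself.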
  • the namespace may be shared amongst the nodes in several different manners.
  • each of the device's operating systems may have a registry in which various configuration settings or other information are stored.
  • the namespace of the share being served may be stored in the registry.
  • the registry may be a database used by the operating system or other applications that may be quickly and readily accessed.
  • a portion of the registry may be shared across several nodes. The shared portion of the registry may operate by detecting a change to the registry on one of the nodes and propagating the change to the other nodes that share the portion of the registry.
  • the namespace may be stored in another database, such as a master namespace stored in a storage system, which may be the cluster storage system.
  • each node may maintain a local copy of the namespace.
  • the local copy may be located in a registry or other database.
  • a node that operates as a master node may cause the change to be propagated to the other nodes.
  • the cluster may be managed by a cluster management application, which may execute on one of the cluster nodes.
  • the cluster management application may perform various administrative operations on the cluster, such as adding, removing, and configuring nodes, as well as launching and managing applications on the cluster.
  • the cluster management application may identify the nodes on which the file service may execute, assign a leader node, and cause the leader node to configure and operate the file service on the assigned nodes.
  • the device 102 represents one node of a cluster.
  • a cluster may have many nodes, from a mere few to many tens, hundreds, or more nodes.
  • the devices within the cluster are typically made up of a hardware platform 104 and various software components 106.
  • the device 102 may be a server computer, but some embodiments may utilize desktop computers, game consoles, and even portable devices such as laptop computers, mobile telephones, or other devices.
  • the hardware platform 104 may include a processor 108, random access memory 110, and nonvolatile storage 112.
  • the processor 108 may be a single microprocessor, multi-core processor, or a group of processors.
  • the random access memory 110 may store executable code as well as data that may be immediately accessible to the processor 108, while the nonvolatile storage 112 may store executable code and data in a persistent state.
  • the hardware platform 104 may include various peripherals that make up a user interface 114.
  • the user interface peripherals may be monitors, keyboards, pointing devices, or other user interface peripherals. Some embodiments may not include such user interface peripherals.
  • the hardware platform 104 may also include a network interface 116.
  • the network interface 116 may include hardwired and wireless interfaces through which the device 102 may communicate with other devices.
  • the software components 106 may include an operating system 118 on which various applications may execute.
  • the operating system 118 may be a specialized operating system for cluster computing. Such operating systems may include various services, databases, or mechanisms that may be used to join devices together into a cluster.
  • the operating system may be a generic operating system on which various cluster applications are executed so that the device may operate as part of a cluster.
  • a cluster management application 123 may execute on the device 102.
  • the cluster management application 123 may operate on just one or several nodes of a cluster.
  • that node may be considered a head node or management node.
  • the cluster management application 123 may perform various management and administrative functions for the cluster. Such functions may include configuring the cluster, adding or removing nodes from the cluster, and starting and stopping applications on the cluster.
  • a cluster client application 120 may also execute on the device 102.
  • the cluster client application 120 may allow the device 102 to join the cluster and respond to management operations from a cluster management application.
  • the cluster management application 123 and the cluster client application 120 may execute on the same device. Other embodiments may not be so configured.
  • Device 102 may include a file service 122 that may respond to file service requests from various client devices 148.
  • the file service 122 may make a share available to the client devices 148, where the share may physically reside on a storage system 138.
  • a set of namespace definitions 125 may reside on the device 102.
  • the namespace definitions 125 may include metadata about the files stored in a share.
  • the metadata may include the directory structure and metadata for each file in the directory structure.
  • the namespace definitions 125 may be sufficient to respond to some file service requests, such as requests for the names of files in a specific directory.
  • the namespace definitions 125 may be used to make calls to a cluster storage system 142 to retrieve file contents, to write information to a file, or perform other operations on the share.
  • the namespace definitions 125 may include a pointer to a share's starting point in an existing directory structure.
  • the namespace definitions 125 may include various metadata, such as permission settings, access controls, or other metadata for the share.
  • the namespace definitions 125 may reside in a database, which may be any type of data storage mechanism such as a relational database, file, table, or other mechanism.
  • the namespace definitions 125 may be stored in a registry 119 that may be a database used by the operating system 118.
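As an illustration, a namespace definition of the kind described above might be modelled as a small record holding a pointer to the share's starting point plus permission metadata. The field names, the `Access` levels, and the default of no access are assumptions of this sketch rather than details from the specification.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    """Permission levels mentioned for shares: none, read only, read/write."""
    NONE = 0
    READ_ONLY = 1
    READ_WRITE = 2


@dataclass
class NamespaceDefinition:
    """Hypothetical record describing one share."""
    share_name: str
    root_path: str                      # starting point in the directory structure
    permissions: dict = field(default_factory=dict)  # user -> Access
    default_access: Access = Access.NONE

    def access_for(self, user: str) -> Access:
        """Look up a user's access level, falling back to the default."""
        return self.permissions.get(user, self.default_access)

    def can_read(self, user: str) -> bool:
        return self.access_for(user) is not Access.NONE

    def can_write(self, user: str) -> bool:
        return self.access_for(user) is Access.READ_WRITE
```

For example, `NamespaceDefinition("projects", "/cluster/vol1/projects", {"alice": Access.READ_WRITE})` would describe a share that only `alice` may modify. Per-file or per-directory permissions, which some file systems apply within a share, are left out of this sketch.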
  • the device 102 may execute various other applications and services 124 in addition to the file service 122.
  • a cluster may execute many applications and services, with each application or service having different sets of resources applied.
  • the cluster may consist of several nodes.
  • Device 102 may be one of the nodes, while cluster nodes 128 may be additional nodes.
  • Cluster nodes 128 may operate on a hardware platform 130 similar to the hardware platform 104 of device 102.
  • Each of the cluster nodes 128 may include a cluster client application 132 that allows the node to operate within the cluster, along with a file service 134 and other services 136.
  • Not shown on the cluster nodes 128 is a set of namespace definitions that may be used by the file service 134 to process file service requests.
  • Each of the cluster nodes may be connected to each other through a cluster network 126.
  • the cluster network may be a separate local area network from a network 146 where the client devices 148 may operate.
  • The cluster network 126 may be a dedicated high speed network over which cluster nodes may communicate with each other.
  • the cluster network 126 may be a wide area network, the Internet, or other network. In such embodiments, the cluster network 126 may or may not be optimized for cluster nodes to communicate with each other.
  • the cluster nodes may communicate with a storage system 138, which may have a hardware platform 140 on which a cluster storage system 142 may operate.
  • the storage system 138 may be a storage area network or other system that provides storage that may be accessed by each of the cluster nodes.
  • the shares may be stored on the storage system 138.
  • Each node operating the file service may communicate with the storage system 138 to retrieve files, directories, or other objects associated with the namespace being served. In such a configuration, each node may access the same file, as opposed to having multiple copies of a file.
  • a cluster may be arranged with a load balancer 144.
  • the load balancer 144 may distribute incoming requests to any of the various nodes that execute a specific file service.
  • the load balancer may operate using any type of load balancing scheme.
  • a load balancer 144 may assign a request to each node in succession.
  • Such a scheme may be known as a round robin scheme.
  • Other schemes may analyze the bandwidth or response times of each node and assign a new request using such data as criteria.
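The round robin scheme can be sketched in a few lines. The `RoundRobinBalancer` class and its `next_node` method are hypothetical names used only for illustration; node identifiers are plain strings here.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Assigns each new request to the next node in a fixed rotation."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        if not self._nodes:
            raise ValueError("at least one node is required")
        self._rotation = cycle(self._nodes)

    def next_node(self):
        """Return the node that should receive the next request."""
        return next(self._rotation)
```

With three nodes, five consecutive calls to `next_node()` visit the first node twice, the second twice, and the third once, in rotation order.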
  • a file service executing on a cluster may make a share available to various client devices 148.
  • the client devices 148 may be any type of computer device that may access a share.
  • The cluster may provide redundancy, where one node may be taken offline due to a failure, maintenance, or other reason, and another node may continue to operate.
  • the cluster may also provide increased throughput, as many nodes may service requests in parallel. Such uses may provide a higher throughput than a single node may be able to perform on its own.
  • Figure 2 is a diagram of an embodiment 200, showing a functional diagram of a clustered file service.
  • Embodiment 200 is an example architecture that may be used to provide multiple file services across a cluster.
  • the diagram of Figure 2 illustrates functional components of a system.
  • the component may be a hardware component, a software component, or a combination of hardware and software. Some of the components may be application level software, while other components may be operating system level components.
  • the connection of one component to another may be a close connection where two or more components are operating on a single hardware platform. In other cases, the connections may be made over network connections spanning long distances.
  • Each embodiment may use different hardware, software, and interconnection architectures to achieve the described functions.
  • Embodiment 200 illustrates merely one example of a cluster on which three different file services may operate.
  • Each of the file services may have a different resource allocation in that they may operate on different numbers of nodes. Further, each node may operate one, two, three, or more different file services.
  • the cluster 202 is illustrated as having five compute nodes 204, 206, 208, 210, and 212. Each of the compute nodes may be the computers that perform much of the processing for the various applications. There may be other nodes in the cluster, such as management nodes, storage nodes, load balancing nodes, proxy nodes, or additional compute nodes.
  • a file service 214 may operate on nodes 204, 206, and 208.
  • File service 216 may operate on nodes 206, 208, 210, and 212, while file service 218 may operate on nodes 210 and 212.
  • Each file service may operate as separate instances of a file service on their respective nodes.
  • node 208 may operate two instances of a file service. In such an embodiment, each instance may serve a different share and each instance may be operating on a different group of nodes.
  • a single node may operate a single instance that may serve two or more shares.
  • Node 208 may, for example, execute a single instance of a file service that may respond to requests for the shares associated with file service 214 and file service 216.
  • nodes may be loaded differently than others.
  • node 204 may only have one file service 214, while node 206 may have two file services 214 and 216.
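The assignment of file services to nodes in embodiment 200 can be written down as a simple mapping and inverted to see how each node is loaded. The reference numerals are taken from the description; the `SERVICE_NODES` name and the `services_per_node` helper are hypothetical.

```python
# File service -> nodes it runs on, using the reference numerals of embodiment 200.
SERVICE_NODES = {
    214: [204, 206, 208],
    216: [206, 208, 210, 212],
    218: [210, 212],
}


def services_per_node(service_nodes):
    """Invert the mapping: node -> list of file services it runs."""
    result = {}
    for service, nodes in service_nodes.items():
        for node in nodes:
            result.setdefault(node, []).append(service)
    return result
```

Inverting the mapping shows the unequal loading directly: node 204 runs only file service 214, while node 206 runs both 214 and 216.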
  • Such situations may occur when a file service or other application is initially configured.
  • the number of nodes to meet an anticipated demand may be determined and the nodes may be selected.
  • the nodes may be selected using various criteria, including selecting the nodes based on lowest usage, random assignment, or other selection criteria.
  • the nodes may not be identical. Some nodes may have more processing power, network bandwidth, or other capabilities than other nodes and therefore may support more instances of a file service.
  • Unequal loading for nodes may occur as a result of adding or removing nodes after a service is executing. Some embodiments may identify an increased loading for a service and may be able to add new nodes to the service to respond to additional requests.
  • some embodiments may identify that the loading for a service has decreased and may be able to remove some nodes from a service. After several different services add or remove nodes, an unbalanced or unequal condition may occur, such as the one illustrated in embodiment 200.
  • Each of the various nodes may connect to cluster storage 220.
  • the cluster storage 220 may contain files, directories, and other items in a share that are accessed by any node providing a file service for the share.
  • the cluster storage 220 may be a storage area network or other storage system that may have the capacity, speed, or other performance parameters to respond to the various nodes providing the file service.
  • some nodes may be directly connected to the cluster storage 220 while other nodes may only be indirectly connected. In such embodiments, the indirectly connected nodes may access the cluster storage 220 by communicating with a directly connected node using the cluster network.
  • Some clusters may have a load balancer 222.
  • the load balancer 222 may assign new file system requests to the various nodes.
  • the load balancer 222 may have various algorithms that spread the processing load amongst the various compute nodes. A simple algorithm may be a round robin algorithm that may assign requests to each node in sequence. A more elaborate algorithm may examine the nodes to determine which node may be the least loaded, and the algorithm may assign a new request to that node.
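A least-loaded policy of the kind mentioned above might look like the following sketch. It assumes that load is measured as the number of requests currently in flight on each node, which is only one possible metric; the `LeastLoadedBalancer` class and its methods are hypothetical.

```python
class LeastLoadedBalancer:
    """Assigns each new request to the node with the fewest active requests."""

    def __init__(self, nodes):
        # Active request count per node; insertion order breaks ties.
        self.active = {node: 0 for node in nodes}
        if not self.active:
            raise ValueError("at least one node is required")

    def assign(self):
        """Pick the least loaded node and record the new request on it."""
        node = min(self.active, key=self.active.get)
        self.active[node] += 1
        return node

    def complete(self, node):
        """Record that one request on the node has finished."""
        if self.active[node] > 0:
            self.active[node] -= 1
```

Other embodiments could feed measured bandwidth or response times into the same selection step instead of a request count.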
  • a load balancer 222 may include a common cluster name 224 that a client device 228 may use to address the cluster 202 over the network 226.
  • the common cluster name 224 may be a single network name that may represent the entire cluster.
  • the client device 228 may transmit the request to the common cluster name 224.
  • the file service may be provided by a single device, even though the file service may actually be provided by any one of a number of devices within the cluster.
  • the cluster 202 may appear on the network 226 as a single device.
  • the cluster 202 may include a cluster management application 230 that may perform various administrative tasks on the cluster.
  • the cluster management application 230 may operate on one or more of the nodes of a cluster.
  • a dedicated management node may execute the cluster management application 230.
  • Each of the compute nodes 204, 206, 208, 210, and 212 may access a cluster database 232, which may contain namespaces for each of the various file services.
  • the cluster database 232 may be implemented in several different manners.
  • the cluster database 232 may contain a master copy of a namespace definition.
  • the master copy may be synchronized or copied to each of the nodes that serve the corresponding file service.
  • the cluster database 232 may again contain the namespace definition and each node that serves the file service may link to the cluster database.
  • a node may have a redirection or other link that causes a local call within the node to be directed to the cluster database.
  • Such embodiments may not maintain a local copy of the cluster database at each node.
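The two cluster database arrangements described above, a synchronized local copy versus a link back to the database, can be contrasted in a short sketch. All class names here are hypothetical, and the cluster database is reduced to an in-memory dictionary for illustration.

```python
class ClusterDatabase:
    """Hypothetical in-memory stand-in for the cluster database."""

    def __init__(self):
        self._namespaces = {}

    def put(self, name, definition):
        self._namespaces[name] = dict(definition)

    def get(self, name):
        # Return a copy so callers cannot mutate the master definition.
        return dict(self._namespaces[name])


class SyncedNode:
    """Keeps a local copy that is refreshed from the master copy."""

    def __init__(self, db, name):
        self._db, self._name = db, name
        self.local = db.get(name)

    def sync(self):
        self.local = self._db.get(self._name)

    def lookup(self):
        return self.local


class LinkedNode:
    """Keeps no local copy; every lookup is redirected to the database."""

    def __init__(self, db, name):
        self._db, self._name = db, name

    def lookup(self):
        return self._db.get(self._name)
```

The trade-off shows up when the master copy changes: a `LinkedNode` sees the change on its next lookup, while a `SyncedNode` serves its older copy until `sync()` runs.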
  • FIG. 3 is a timeline illustration of an embodiment 300 showing a method for managing cluster operations.
  • Embodiment 300 is a simplified example of a method that may be performed by cluster manager 302, a leader node 304, and a follower node 306.
  • the operations of the cluster manager 302 are illustrated in the left hand column, while the operations of the leader node 304 are illustrated in the center column and the operations of a follower node 306 are illustrated in the right hand column.
  • Embodiment 300 illustrates an embodiment that uses a 'leader' and 'follower' model.
  • a leader may be the first node that implements a service and may manage additional nodes, called follower nodes.
  • a cluster manager may communicate with the leader to start, stop, and perform other management activities for a service. The leader may communicate with the various followers to execute those management activities.
  • the leader node may be identified.
  • The leader node may have the same configuration as other nodes in the cluster, but may manage a particular service.
  • a file structure to share from the cluster storage may be identified in block 310 and a corresponding namespace may be defined in block 312.
  • the namespace may be stored in a cluster database in block 314.
  • the number of nodes that may execute the file service may be determined in block 316, and the file service configuration may be transmitted to the leader node in block 318.
  • the leader node 304 may receive the configuration in block 320 and begin the configuration process.
  • the leader node 304 may identify the namespace and may retrieve the namespace in block 324 from the cluster database.
  • the file service may be started using the namespace.
  • the leader node 304 may be the only node providing the file service.
  • each follower node may be processed. In many embodiments, there may be multiple follower nodes, each of which may provide the file service using the namespace.
  • the leader node 304 may transmit configuration commands in block 330 to the follower node 306, and may update a load balancer in block 332 with the follower node information.
  • the follower node 306 may receive the configuration commands in block 334.
  • the follower node 306 may identify the namespace in block 336, retrieve the namespace definition from the cluster database in block 338, and may start the file service with the namespace in block 340.
  • the process of blocks 328 through 340 may be performed each time a new follower node is added to the file service.
  • a node may be identified to disable or remove from the file service.
  • the load balancer may be updated in block 344 so that the load balancer may stop sending new requests to the soon-to-be disabled node.
  • the leader node 304 may transmit a disable notification in block 346 to the follower node 306, which may receive the notification in block 348 and stop the file service in block 350.
  • Embodiment 300 illustrates one method by which a file service may be started, then expanded to other follower nodes or contracted by removing follower nodes.
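The lifecycle of embodiment 300 can be sketched as follows. This is a simplified, single-process model: the class and method names are hypothetical, the cluster database is a plain dictionary, the load balancer is reduced to a list of node names, and registering the leader itself with the load balancer is an assumption of the sketch.

```python
class FileServiceNode:
    """A node that can serve one namespace (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.namespace = None
        self.running = False

    def configure(self, cluster_db, namespace_name):
        # Retrieve the namespace from the cluster database, then start serving.
        self.namespace = cluster_db[namespace_name]
        self.running = True

    def stop(self):
        self.running = False


class LeaderNode(FileServiceNode):
    """Starts the service and manages follower nodes."""

    def __init__(self, name, cluster_db, balancer_pool):
        super().__init__(name)
        self.cluster_db = cluster_db
        self.balancer_pool = balancer_pool  # node names receiving requests
        self.namespace_name = None
        self.followers = {}

    def start_service(self, namespace_name):
        self.namespace_name = namespace_name
        self.configure(self.cluster_db, namespace_name)
        self.balancer_pool.append(self.name)  # assumption: leader serves too

    def add_follower(self, node):
        # Configure the follower, then tell the load balancer about it.
        node.configure(self.cluster_db, self.namespace_name)
        self.followers[node.name] = node
        self.balancer_pool.append(node.name)

    def remove_follower(self, name):
        # Update the load balancer first so no new requests reach the node.
        self.balancer_pool.remove(name)
        self.followers.pop(name).stop()
```

Removing the node from the balancer pool before stopping it mirrors the ordering in the description, where the load balancer is updated before the disable notification is sent.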
  • Figure 4 is a flowchart illustration of an embodiment 400 showing a method for operating a file service.
  • Embodiment 400 is a simplified example of a method that may be performed by any node executing a file service, and is an example of an operation that may be performed when changes are made to a namespace.
  • Embodiment 400 illustrates an example of a method that may be performed when a change may be made to the namespace.
  • Embodiment 400 illustrates a method by which a node detects that a change is made to the namespace, then propagates the changes to other nodes.
  • Other embodiments may implement such updates using a master-slave operation within a file service.
  • any node executing a file service may turn into a master node when that node detects that a request may cause a change to the namespace or some other condition where the data in the cluster database will be modified.
  • a consistent cluster database is used by all the nodes so that each file service request is consistent, regardless of which node services the request.
  • the master-slave embodiments may operate by detecting that a request may change the information in the cluster database, and the node may set itself as master and cause the other nodes to operate as slaves until the change is propagated to each node.
  • any node operating the file service may declare itself to be master at any time.
  • every node may be capable of handling any type of request.
  • Such embodiments may permit only one node to be master at any given time.
  • the file service may begin operation.
  • a file service request may be received. If an update is being processed in block 406, the node may wait until the update has finished in block 408 prior to continuing. The loop of block 408 may ensure that no request is processed using an out of date or inconsistent database. An example of a process performed during the update may be illustrated in embodiment 500 presented later in this specification.
  • the update process may begin in block 416.
  • the change may be made to the namespace in block 418 and the change may be propagated to the other nodes in block 420.
  • the change may be stored locally in a local storage or cache.
  • FIG. 5 is a flowchart illustration of an embodiment 500 showing a method for updating a slave device.
  • Embodiment 500 is a simplified example of a method that may be performed by any node operating as a slave device during an update by a master device.
  • any device may operate as a slave or a master throughout the time the file service is executing.
  • Embodiment 500 is a simplified example of the operations that may be performed by a slave device while a master device is causing a change to the namespace to be propagated.
  • a notification of an update may be received.
  • New file service requests may be stopped from being handled in block 504.
  • An attempt at updating the namespace definition or other information may be made in block 506. If the update is a success in block 508, the slave may resume handling requests in block 514. If the update is not a success in block 508, the node may be taken offline in block 510 and an alert may be transmitted to a cluster manager or leader node in block 512.
  • embodiment 500 illustrates that when a slave node attempts to update and encounters a failure, the node may take itself offline. When the node is offline, corrective action may be taken while leaving the remaining nodes to continue operating.
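By way of illustration only, the slave-side handling of embodiment 500 may be sketched as follows. The class name `SlaveNode`, its fields, and the `apply_update` callback are hypothetical names introduced for this sketch and do not appear in the specification; the block numbers in the comments refer to the blocks described above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SlaveNode:
    """Hypothetical stand-in for a node acting as a slave during an update."""
    name: str
    namespace: Dict[str, str] = field(default_factory=dict)
    accepting_requests: bool = True
    online: bool = True
    alerts: List[str] = field(default_factory=list)

    def on_update_notification(
        self, apply_update: Callable[[Dict[str, str]], None]
    ) -> bool:
        # Block 504: stop handling new file service requests.
        self.accepting_requests = False
        try:
            # Block 506: attempt to update the local namespace definition.
            apply_update(self.namespace)
        except Exception as exc:
            # Blocks 510 and 512: go offline and record an alert for the
            # cluster manager or leader node (the alert transport is assumed).
            self.online = False
            self.alerts.append(f"{self.name}: update failed ({exc})")
            return False
        # Block 514: resume handling requests after a successful update.
        self.accepting_requests = True
        return True


def add_share(namespace: Dict[str, str]) -> None:
    namespace["reports"] = "/cluster/reports"


def broken_update(namespace: Dict[str, str]) -> None:
    raise OSError("cluster database unreachable")


good = SlaveNode("node-a")
good_result = good.on_update_notification(add_share)

bad = SlaveNode("node-b")
bad_result = bad.on_update_notification(broken_update)
```

In this sketch a failed node remains offline and stops accepting requests, while a node that updated successfully resumes service, so the remaining nodes may continue operating.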

Abstract

A cluster based file service may operate on a cluster of two or more independent devices that have access to a common data storage. The file service may have a namespace definition that is shared with each device in the cluster but may be modified by any device operating the file service. Each instance of the file service may identify and capture a command that changes the namespace structure and cause the change to be propagated to the other members of the cluster. If one of the devices in the cluster does not successfully perform an update to the namespace structure, that device may be brought offline. The cluster based file service may permit adding or removing devices from the cluster while the file service is operating, and may provide a high throughput and high availability file service.

Description

CLUSTERED FILE SERVICE
Background
[0001] File services operate to share files with various client devices. The file services may present files to client devices in the form of shares, which are directory structures or portions of directory structures in which files may be stored. In some cases, the same file may be made available in different shares.
[0002] Many file services may define different sets of permissions for different users for each share. Some users may have read/write permissions, while other users may have read only permissions and still other users may have no access to the share. Some file systems may apply different permissions to subsets of the share, such as defining different permissions for individual files, directories, or groups of files or directories within a single share.
Summary
[0003] A cluster based file service may operate on a cluster of two or more independent devices that have access to a common data storage. The file service may have a namespace definition that is shared with each device in the cluster but may be modified by any device operating the file service. Each instance of the file service may identify and capture a command that changes the namespace structure and cause the change to be propagated to the other members of the cluster. If one of the devices in the cluster does not successfully perform an update to the namespace structure, that device may be brought offline. The cluster based file service may permit adding or removing devices from the cluster while the file service is operating, and may provide a high throughput and high availability file service.
[0004] This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Brief Description of the Drawings
[0005] In the drawings,
[0006] FIGURE 1 is a diagram of an embodiment showing a network environment with clustered file service.
[0007] FIGURE 2 is a functional diagram of an embodiment showing a conceptual topology for a file service cluster.
[0008] FIGURE 3 is a timeline flowchart of an embodiment showing a method for managing cluster operations.
[0009] FIGURE 4 is a flowchart of an embodiment showing a method for operating a file service.
[0010] FIGURE 5 is a flowchart of an embodiment showing a method for updating a slave node.
Detailed Description
[0011] A cluster based file service may provide file services to multiple clients using multiple devices in parallel. Each of the file service providers may have identical copies of the file namespace, and may identify and capture changes to the namespace. Those changes may be propagated to each of the members of the cluster that provide the same file service.
[0012] The architecture of the cluster may allow several different namespaces to be provided by different groups of devices within the cluster. For example, one namespace may be served by three devices within a cluster, while a second namespace may be served by four devices, two of which may be members of the group providing the first namespace. In such embodiments, some devices in the cluster may serve two or more namespaces, while other devices may serve only one namespace.
[0013] The cluster may operate a group of devices using a leader and follower arrangement. A leader is defined as a device within the cluster that manages an application. In the case of a file service, the leader may be the device that starts and stops the file service, adds or removes additional cluster devices to the file service, and performs other administrative tasks.
[0014] Within the group of devices providing a file service, some embodiments may have each device act as a master or slave, depending on the situation. When a device detects a change to the namespace, such as when a user adds or deletes a file, the device may operate as a master to update the namespace and transmit the namespace to the other devices, which act as slaves. Any of the devices may act as masters or slaves during the course of operation of the file system. Other embodiments may have different mechanisms for updating the other nodes within a cluster.
[0015] The namespace may identify any type of shared resource, which typically is a file system. The file system may include directories or folders, files, or other objects. In some embodiments, the namespace may be a pointer to a starting point within a directory structure. The namespace may include various permission settings or other information about the namespace.
[0016] Throughout this specification, like reference numbers signify the same elements throughout the description of the figures.
[0017] When elements are referred to as being "connected" or "coupled," the elements can be directly connected or coupled together or one or more intervening elements may also be present. In contrast, when elements are referred to as being "directly connected" or "directly coupled," there are no intervening elements present.
[0018] The subject matter may be embodied as devices, systems, methods, and/or computer program products. Accordingly, some or all of the subject matter may be embodied in hardware and/or in software (including firmware, resident software, microcode, state machines, gate arrays, etc.). Furthermore, the subject matter may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. In the context of this document, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[0019] The computer-usable or computer-readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media.
[0020] Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by an instruction execution system. Note that the computer-usable or computer-readable medium could be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
[0021] Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
[0022] When the subject matter is embodied in the general context of computer-executable instructions, the embodiment may comprise program modules, executed by one or more systems, computers, or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
[0023] Figure 1 is a diagram of an embodiment 100, showing a clustered file service. Embodiment 100 is an example architecture that may be used to provide file services in a highly parallel, fault tolerant system with high availability.
[0024] The diagram of Figure 1 illustrates functional components of a system. In some cases, the component may be a hardware component, a software component, or a combination of hardware and software. Some of the components may be application level software, while other components may be operating system level components. In some cases, the connection of one component to another may be a close connection where two or more components are operating on a single hardware platform. In other cases, the connections may be made over network connections spanning long distances. Each embodiment may use different hardware, software, and interconnection architectures to achieve the described functions.
[0025] Embodiment 100 is an example of a computer cluster where several computers may operate in parallel to provide various services, such as file services. A cluster may have several computers that execute the same application or service and may independently process requests for the application or service. Clustering may be one mechanism by which multiple computers may be arranged to provide a service for fault tolerance and/or high throughput.
[0026] In a cluster, two or more devices may process operations in parallel. In many cluster environments, the devices may be configured so that one of the devices may fail, be pulled offline, or otherwise stop operating yet the service may still be operating on another device. Such a configuration may be a failsafe system where the system may tolerate failure of one or more devices while still providing the service.
[0027] Further, a cluster may provide very high throughput by processing multiple requests for the service simultaneously. In such uses, a single cluster may provide many times the bandwidth or throughput of a single device.
[0028] For a file service, each node that provides the file service may use the same namespace definition. The namespace definition may define the contents of the share being served. The share may include various objects, such as files, directories, folders, or other objects.
[0029] Each request to the file service may fall into one of two categories: those requests that cause the share to change and those that do not. Requests that cause the share to change may include requests that add or delete files, change the file directory structure, or perform other operations. Requests that do not change the share may include reads to a file. In some embodiments, write operations performed on a file may be considered a change to the namespace while other embodiments may treat write operations as not changing the namespace.
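As a minimal sketch of the two request categories described above, a node might classify each incoming request before deciding whether a namespace update is needed. The `RequestKind` names and the `writes_count_as_changes` flag are assumptions made for illustration; the specification does not define a request vocabulary.

```python
from enum import Enum


class RequestKind(Enum):
    """Hypothetical request types; not taken from the specification."""
    READ = "read"
    WRITE = "write"
    CREATE = "create"
    DELETE = "delete"
    RENAME = "rename"
    MKDIR = "mkdir"


# Requests that add or delete files or change the directory structure.
STRUCTURAL_KINDS = frozenset(
    {RequestKind.CREATE, RequestKind.DELETE, RequestKind.RENAME, RequestKind.MKDIR}
)


def changes_namespace(kind: RequestKind, writes_count_as_changes: bool = False) -> bool:
    """Return True when a request would change the share's namespace.

    Whether a write counts as a change differs between embodiments, so it is
    exposed here as a configuration flag.
    """
    if kind in STRUCTURAL_KINDS:
        return True
    if kind is RequestKind.WRITE:
        return writes_count_as_changes
    return False
```

A node using this sketch would begin the update process only when `changes_namespace` returns `True`, and would otherwise serve the request directly.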
[0030] When a request changes the namespace, the change may be propagated to all nodes that serve the share. When a change is being propagated, the other nodes may pause until the change is completed on that node prior to responding to any other requests. If a device detects that a change is not properly implemented, the device may take itself offline until the problem may be resolved.
[0031] The namespace may be shared amongst the nodes in several different manners. In one manner, each of the device's operating systems may have a registry in which various configuration settings or other information are stored. The namespace of the share being served may be stored in the registry. The registry may be a database used by the operating system or other applications that may be quickly and readily accessed. In some embodiments, a portion of the registry may be shared across several nodes. The shared portion of the registry may operate by detecting a change to the registry on one of the nodes and propagating the change to the other nodes that share the portion of the registry.
[0032] In another manner, the namespace may be stored in another database, such as a master namespace stored in a storage system, which may be the cluster storage system. In such a system, each node may maintain a local copy of the namespace. The local copy may be located in a registry or other database. In such an embodiment, a node that operates as a master node may cause the change to be propagated to the other nodes.
[0033] The cluster may be managed by a cluster management application, which may execute on one of the cluster nodes. The cluster management application may perform various administrative operations on the cluster, such as adding, removing, and configuring nodes, as well as launching and managing applications on the cluster. For the file service application, the cluster management application may identify the nodes on which the file service may execute, assign a leader node, and cause the leader node to configure and operate the file service on the assigned nodes.
[0034] The device 102 represents one node of a cluster. In many embodiments, a cluster may have many nodes, from a mere few to many tens, hundreds, or more nodes. The devices within the cluster are typically made up of a hardware platform 104 and various software components 106. The device 102 may be a server computer, but some embodiments may utilize desktop computers, game consoles, and even portable devices such as laptop computers, mobile telephones, or other devices.
[0035] The hardware platform 104 may include a processor 108, random access memory 110, and nonvolatile storage 112. The processor 108 may be a single microprocessor, multi-core processor, or a group of processors. The random access memory 110 may store executable code as well as data that may be immediately accessible to the processor 108, while the nonvolatile storage 112 may store executable code and data in a persistent state.
[0036] The hardware platform 104 may include various peripherals that make up a user interface 114. In some cases, the user interface peripherals may be monitors, keyboards, pointing devices, or other user interface peripherals. Some embodiments may not include such user interface peripherals.
[0037] The hardware platform 104 may also include a network interface 116. The network interface 116 may include hardwired and wireless interfaces through which the device 102 may communicate with other devices.
[0038] The software components 106 may include an operating system 118 on which various applications may execute. In some embodiments, the operating system 118 may be a specialized operating system for cluster computing. Such operating systems may include various services, databases, or mechanisms that may be used to join devices together into a cluster. In other embodiments, the operating system may be a generic operating system on which various cluster applications are executed so that the device may operate as part of a cluster.
[0039] A cluster management application 123 may execute on the device 102. The cluster management application 123 may operate on just one or several nodes of a cluster. When the cluster management application 123 operates on just one node of a cluster, that node may be considered a head node or management node.
[0040] The cluster management application 123 may perform various management and administrative functions for the cluster. Such functions may include configuring the cluster, adding or removing nodes from the cluster, and starting and stopping applications on the cluster.
[0041] A cluster client application 120 may also execute on the device 102. The cluster client application 120 may allow the device 102 to join the cluster and respond to management operations from a cluster management application. In some embodiments, the cluster management application 123 and the cluster client application 120 may execute on the same device. Other embodiments may not be so configured.
[0042] Device 102 may include a file service 122 that may respond to file service requests from various client devices 148. The file service 122 may make a share available to the client devices 148, where the share may physically reside on a storage system 138.
[0043] In some embodiments, a set of namespace definitions 125 may reside on the device 102. The namespace definitions 125 may include metadata about the files stored in a share. In some embodiments, the metadata may include the directory structure and metadata for each file in the directory structure. The namespace definitions 125 may be sufficient to respond to some file service requests, such as requests for the names of files in a specific directory. In some cases, the namespace definitions 125 may be used to make calls to a cluster storage system 142 to retrieve file contents, to write information to a file, or perform other operations on the share.
[0044] In some embodiments, the namespace definitions 125 may include a pointer to a share's starting point in an existing directory structure. In such embodiments, the namespace definitions 125 may include various metadata, such as permission settings, access controls, or other metadata for the share.
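For illustration, a namespace definition of the kind described above may be sketched as a small record holding a pointer to the share's starting point plus permission metadata. The class name, field names, and the `"rw"`/`"r"`/`"none"` permission strings are hypothetical; the specification does not prescribe a format.

```python
import posixpath
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NamespaceDefinition:
    """Hypothetical record for one share's namespace definition."""
    share_name: str
    root_path: str  # pointer to the share's starting point in the directory structure
    permissions: Dict[str, str] = field(default_factory=dict)  # user -> "rw" or "r"

    def access_for(self, user: str) -> str:
        # Users without an entry are treated as having no access to the share.
        return self.permissions.get(user, "none")

    def resolve(self, relative_path: str) -> str:
        # Map a path inside the share to a path on the cluster storage system.
        return posixpath.join(self.root_path, relative_path.lstrip("/"))


engineering = NamespaceDefinition(
    share_name="engineering",
    root_path="/cluster/storage/eng",
    permissions={"alice": "rw", "bob": "r"},
)
```

With such a record, a node could answer a permission query locally and use `resolve` to form the path passed to the cluster storage system.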
[0045] In some embodiments, the namespace definitions 125 may reside in a database, which may be any type of data storage mechanism such as a relational database, file, table, or other mechanism. In some embodiments, the namespace definitions 125 may be stored in a registry 119 that may be a database used by the operating system 118.
[0046] The device 102 may execute various other applications and services 124 in addition to the file service 122. In many embodiments, a cluster may execute many applications and services, with each application or service having different sets of resources applied.
[0047] The cluster may consist of several nodes. Device 102 may be one of the nodes, while cluster nodes 128 may be additional nodes. Cluster nodes 128 may operate on a hardware platform 130 similar to the hardware platform 104 of device 102. Each of the cluster nodes 128 may include a cluster client application 132 that allows the node to operate within the cluster, along with a file service 134 and other services 136. Not shown on the cluster nodes 128 is a set of namespace definitions that may be used by the file service 134 to process file service requests.
[0048] Each of the cluster nodes may be connected to each other through a cluster network 126. In some embodiments, the cluster network may be a separate local area network from a network 146 where the client devices 148 may operate. In such embodiments, the cluster network 126 may be a dedicated high speed network where cluster nodes may communicate with each other. In other embodiments, the cluster network 126 may be a wide area network, the Internet, or other network. In such embodiments, the cluster network 126 may or may not be optimized for cluster nodes to communicate with each other.
[0049] The cluster nodes may communicate with a storage system 138, which may have a hardware platform 140 on which a cluster storage system 142 may operate. The storage system 138 may be a storage area network or other system that provides storage that may be accessed by each of the cluster nodes.
[0050] When the cluster nodes are operating a file service, the shares may be stored on the storage system 138. Each node operating the file service may communicate with the storage system 138 to retrieve files, directories, or other objects associated with the namespace being served. In such a configuration, each node may access the same file, as opposed to having multiple copies of a file.
[0051] A cluster may be arranged with a load balancer 144. The load balancer 144 may distribute incoming requests to any of the various nodes that execute a specific file service. The load balancer may operate using any type of load balancing scheme. In one load balancing scheme, a load balancer 144 may assign a request to each node in succession. Such a scheme may be known as a round robin scheme. Other schemes may analyze the bandwidth or response times of each node and assign a new request using such data as criteria.
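The round robin scheme mentioned above may be sketched as follows. The class name `RoundRobinBalancer` and the node labels are illustrative assumptions, and the sketch ignores node health and membership changes.

```python
from itertools import cycle
from typing import Iterable


class RoundRobinBalancer:
    """Hypothetical balancer that assigns requests to each node in succession."""

    def __init__(self, nodes: Iterable[str]) -> None:
        node_list = list(nodes)
        if not node_list:
            raise ValueError("at least one node is required")
        self._next_node = cycle(node_list)

    def assign(self, request: object) -> str:
        # The request content is not inspected; nodes are simply taken in turn.
        return next(self._next_node)


balancer = RoundRobinBalancer(["node-1", "node-2", "node-3"])
assignments = [balancer.assign(f"request-{i}") for i in range(5)]
```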
[0052] In a normal operation, a file service executing on a cluster may make a share available to various client devices 148. The client devices 148 may be any type of computer device that may access a share. The cluster may provide redundancy, where one node may be taken offline due to a failure, maintenance, or other reason, and another node may continue to operate. The cluster may also provide increased throughput, as many nodes may service requests in parallel. Such uses may provide a higher throughput than a single node may be able to perform on its own.
[0053] Figure 2 is a diagram of an embodiment 200, showing a functional diagram of a clustered file service. Embodiment 200 is an example architecture that may be used to provide multiple file services across a cluster.
[0054] The diagram of Figure 2 illustrates functional components of a system. In some cases, the component may be a hardware component, a software component, or a combination of hardware and software. Some of the components may be application level software, while other components may be operating system level components. In some cases, the connection of one component to another may be a close connection where two or more components are operating on a single hardware platform. In other cases, the connections may be made over network connections spanning long distances. Each embodiment may use different hardware, software, and interconnection architectures to achieve the described functions.
[0055] Embodiment 200 illustrates merely one example of a cluster on which three different file services may operate. Each of the file services may have a different resource allocation in that they may operate on different numbers of nodes. Further, each node may operate one, two, three, or more different file services.
[0056] The cluster 202 is illustrated as having five compute nodes 204, 206, 208, 210, and 212. Each of the compute nodes may be the computers that perform much of the processing for the various applications. There may be other nodes in the cluster, such as management nodes, storage nodes, load balancing nodes, proxy nodes, or additional compute nodes.
[0057] Three different file services are illustrated. A file service 214 may operate on nodes 204, 206, and 208. File service 216 may operate on nodes 206, 208, 210, and 212, while file service 218 may operate on nodes 210 and 212.
[0058] Each file service may operate as separate instances of a file service on their respective nodes. For example, node 208 may operate two instances of a file service. In such an embodiment, each instance may serve a different share and each instance may be operating on a different group of nodes.
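The assignment of file services to nodes in embodiment 200 may be represented as a simple mapping and inverted to show which service instances each node operates. The reference numerals are those of Figure 2; the variable and function names are assumptions made for this sketch.

```python
from typing import Dict, List

# File service reference numeral -> node reference numerals, per embodiment 200.
SERVICE_NODES: Dict[int, List[int]] = {
    214: [204, 206, 208],
    216: [206, 208, 210, 212],
    218: [210, 212],
}


def services_by_node(service_nodes: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Invert the mapping to list the file services operated by each node."""
    result: Dict[int, List[int]] = {}
    for service, nodes in service_nodes.items():
        for node in nodes:
            result.setdefault(node, []).append(service)
    return result


NODE_SERVICES = services_by_node(SERVICE_NODES)
```

Under this mapping, node 204 operates only file service 214, while node 208 operates instances of file services 214 and 216.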
[0059] In some embodiments, a single node may operate a single instance that may serve two or more shares. In such an embodiment, node 208 may, for example, execute a single instance of a file service that may respond to requests for the shares associated with file service 216 and file service 218.
[0060] In the example of embodiment 200, some nodes may be loaded differently than others. For example, node 204 may only have one file service 214, while node 206 may have two file services 214 and 216. Such situations may occur when a file service or other application is initially configured. During the configuration, the number of nodes to meet an anticipated demand may be determined and the nodes may be selected. The nodes may be selected using various criteria, including selecting the nodes based on lowest usage, random assignment, or other selection criteria.
[0061] In some embodiments, the nodes may not be identical. Some nodes may have more processing power, network bandwidth, or other capabilities than other nodes and therefore may support more instances of a file service.
[0062] Unequal loading for nodes may occur as a result of adding or removing nodes after a service is executing. Some embodiments may identify an increased loading for a service and may be able to add new nodes to the service to respond to additional requests. Similarly, some embodiments may identify that the loading for a service has decreased and may be able to remove some nodes from a service. After several different services add or remove nodes, an unbalanced or unequal condition may occur, such as the one illustrated in embodiment 200.
[0063] Each of the various nodes may connect to cluster storage 220. The cluster storage 220 may contain files, directories, and other items in a share that are accessed by any node providing a file service for the share. In many embodiments, the cluster storage 220 may be a storage area network or other storage system that may have the capacity, speed, or other performance parameters to respond to the various nodes providing the file service.
[0064] In some embodiments, some nodes may be directly connected to the cluster storage 220 while other nodes may only be indirectly connected. In such embodiments, the indirectly connected nodes may access the cluster storage 220 by communicating with a directly connected node using the cluster network.
[0065] Some clusters may have a load balancer 222. The load balancer 222 may assign new file system requests to the various nodes. The load balancer 222 may have various algorithms that spread the processing load amongst the various compute nodes. A simple algorithm may be a round robin algorithm that may assign requests to each node in sequence. A more elaborate algorithm may examine the nodes to determine which node may be the least loaded, and the algorithm may assign a new request to that node.
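A least-loaded selection of the kind described above may be sketched as follows, assuming that load is measured as the number of requests currently in flight on each node. That metric, the function names, and the tie-breaking by node name are assumptions; the specification leaves the load measure open.

```python
from typing import Dict


def pick_least_loaded(active_requests: Dict[str, int]) -> str:
    """Return the node with the fewest in-flight requests.

    Ties are broken by node name so that the choice is deterministic.
    """
    if not active_requests:
        raise ValueError("no nodes available")
    return min(active_requests, key=lambda node: (active_requests[node], node))


def assign_request(active_requests: Dict[str, int]) -> str:
    """Pick a node for a new request and record the additional load."""
    node = pick_least_loaded(active_requests)
    active_requests[node] += 1
    return node


load = {"node-1": 4, "node-2": 1, "node-3": 1}
first = assign_request(load)
second = assign_request(load)
```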
[0066] A load balancer 222 may include a common cluster name 224 that a client device 228 may use to address the cluster 202 over the network 226. The common cluster name 224 may be a single network name that may represent the entire cluster. When a client device 228 generates a file service request, the client device 228 may transmit the request to the common cluster name 224. From the client device's perspective, the file service may be provided by a single device, even though the file service may actually be provided by any one of a number of devices within the cluster. In such embodiments, the cluster 202 may appear on the network 226 as a single device.
[0067] The cluster 202 may include a cluster management application 230 that may perform various administrative tasks on the cluster. The cluster management application 230 may operate on one or more of the nodes of a cluster. In some embodiments, a dedicated management node may execute the cluster management application 230.
[0068] Each of the compute nodes 204, 206, 208, 210, and 212 may access a cluster database 232, which may contain namespaces for each of the various file services. The cluster database 232 may be implemented in several different manners.
[0069] In one manner, the cluster database 232 may contain a master copy of a namespace definition. The master copy may be synchronized or copied to each of the nodes that serve the corresponding file service.
[0070] In another manner, the cluster database 232 may again contain the namespace definition and each node that serves the file service may link to the cluster database. In such embodiments, a node may have a redirection or other link that causes a local call within the node to be directed to the cluster database. Such embodiments may not maintain a local copy of the cluster database at each node.
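The two manners of using the cluster database described above may be contrasted in a short sketch. All class and method names here are hypothetical, and an in-memory dictionary stands in for the cluster database.

```python
import copy
from typing import Dict


class ClusterDatabase:
    """Hypothetical in-memory stand-in for the cluster database."""

    def __init__(self) -> None:
        self._namespaces: Dict[str, dict] = {}

    def put(self, name: str, definition: dict) -> None:
        self._namespaces[name] = copy.deepcopy(definition)

    def get(self, name: str) -> dict:
        return copy.deepcopy(self._namespaces[name])


class SyncedCopyNode:
    """First manner: the node keeps a local copy of the master definition."""

    def __init__(self, database: ClusterDatabase, name: str) -> None:
        self._database = database
        self._name = name
        self.local_copy = database.get(name)

    def sync(self) -> None:
        # Refresh the local copy from the master copy.
        self.local_copy = self._database.get(self._name)

    def lookup(self) -> dict:
        return self.local_copy


class LinkedNode:
    """Second manner: local calls are redirected to the cluster database."""

    def __init__(self, database: ClusterDatabase, name: str) -> None:
        self._database = database
        self._name = name

    def lookup(self) -> dict:
        # No local copy is maintained; every lookup goes to the database.
        return self._database.get(self._name)


database = ClusterDatabase()
database.put("eng", {"root": "/shares/eng"})
synced_node = SyncedCopyNode(database, "eng")
linked_node = LinkedNode(database, "eng")

database.put("eng", {"root": "/shares/eng-v2"})
stale_root = synced_node.lookup()["root"]
linked_root = linked_node.lookup()["root"]
synced_node.sync()
fresh_root = synced_node.lookup()["root"]
```

The sketch shows the trade-off between the two manners: the synchronized copy may be out of date until the change is propagated, while the linked node always reads the current definition at the cost of a call to the cluster database.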
[0071] Figure 3 is a timeline illustration of an embodiment 300 showing a method for managing cluster operations. Embodiment 300 is a simplified example of a method that may be performed by cluster manager 302, a leader node 304, and a follower node 306. The operations of the cluster manager 302 are illustrated in the left hand column, while the operations of the leader node 304 are illustrated in the center column and the operations of a follower node 306 are illustrated in the right hand column.
[0072] Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.
[0073] Embodiment 300 illustrates an embodiment that uses a 'leader' and 'follower' model. A leader may be the first node that implements a service and may manage additional nodes, called follower nodes. A cluster manager may communicate with the leader to start, stop, and perform other management activities for a service. The leader may communicate with the various followers to execute those management activities.
[0074] In block 308, the leader node may be identified. The leader node may be the same configuration as other nodes in the cluster, but may manage a particular service.
[0075] A file structure to share from the cluster storage may be identified in block 310 and a corresponding namespace may be defined in block 312. The namespace may be stored in a cluster database in block 314.
[0076] The number of nodes that may execute the file service may be determined in block 316, and the file service configuration may be transmitted to the leader node in block 318.
[0077] The leader node 304 may receive the configuration in block 320 and begin the configuration process.
[0078] In block 322, the leader node 304 may identify the namespace and may retrieve the namespace in block 324 from the cluster database. In block 326, the file service may be started using the namespace. At this point, the leader node 304 may be the only node providing the file service.
[0079] In block 328, each follower node may be processed. In many embodiments, there may be multiple follower nodes, each of which may provide the file service using the namespace. The leader node 304 may transmit configuration commands in block 330 to the follower node 306, and may update a load balancer in block 332 with the follower node information.
[0080] The follower node 306 may receive the configuration commands in block 334. The follower node 306 may identify the namespace in block 336, retrieve the namespace definition from the cluster database in block 338, and may start the file service with the namespace in block 340.
[0081] The process of blocks 328 through 340 may be performed each time a new follower node is added to the file service.
[0082] In block 342, a node may be identified to disable or remove from the file service. The load balancer may be updated in block 344 so that the load balancer may stop sending new requests to the soon-to-be disabled node. The leader node 304 may transmit a disable notification in block 346 to the follower node 306, which may receive the notification in block 348 and stop the file service in block 350.
[0083] Embodiment 300 illustrates one method by which a file service may be started, then expanded to other follower nodes or contracted by removing follower nodes.
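The sequence of embodiment 300 may be sketched, under simplifying assumptions, as follows. The class and method names are hypothetical, the cluster database is modeled as a dictionary, and the load balancer is modeled as a list of node names that receive new requests. Registering the leader itself with the load balancer is an assumption of the sketch rather than a step recited above.

```python
class Node:
    """A compute node that can run the file service for one share."""

    def __init__(self, name):
        self.name = name
        self.namespace = None
        self.serving = False

    def start_service(self, cluster_db, share):
        # Blocks 322-326 (leader) or 336-340 (follower): identify the
        # namespace, retrieve it from the cluster database, start serving.
        self.namespace = dict(cluster_db[share])
        self.serving = True

    def stop_service(self):
        # Block 350: stop the file service.
        self.serving = False


class Leader(Node):
    """The first node to run the service; it manages follower nodes."""

    def __init__(self, name, cluster_db, share, balancer_targets):
        super().__init__(name)
        self.cluster_db = cluster_db
        self.share = share
        self.targets = balancer_targets
        self.start_service(cluster_db, share)
        self.targets.append(name)  # assumption: leader also receives requests

    def add_follower(self, follower):
        # Block 330: send configuration; blocks 334-340 run on the follower.
        follower.start_service(self.cluster_db, self.share)
        # Block 332: update the load balancer with the follower information.
        self.targets.append(follower.name)

    def remove_follower(self, follower):
        # Block 344: update the load balancer first so that no new
        # requests reach the node being disabled.
        self.targets.remove(follower.name)
        # Blocks 346-350: notify the follower, which stops its service.
        follower.stop_service()
```

Updating the load balancer before stopping the follower mirrors the ordering of blocks 344 through 350, in which new requests are diverted before the service is disabled.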
[0084] Figure 4 is a flowchart illustration of an embodiment 400 showing a method for operating a file service. Embodiment 400 is a simplified example of a method that may be performed by any node executing a file service, and is an example of an operation that may be performed when changes are made to a namespace.
[0085] Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.
[0086] Embodiment 400 illustrates an example of a method that may be performed when a change may be made to the namespace. Embodiment 400 illustrates a method by which a node detects that a change is made to the namespace, then propagates the changes to other nodes.
[0087] Other embodiments may implement such updates using a master-slave operation within a file service. In such an embodiment, any node executing a file service may turn into a master node when that node detects that a request may cause a change to the namespace or some other condition where the data in the cluster database will be modified. A consistent cluster database is used by all the nodes so that each file service request is consistent, regardless of which node services the request.
[0088] The master-slave embodiments may operate by detecting that a request may change the information in the cluster database, and the node may set itself as master and cause the other nodes to operate as slaves until the change is propagated to each node.
[0089] In such embodiments, any node operating the file service may declare itself to be master at any time. In such an embodiment, every node may be capable of handling any type of request. Such embodiments may permit only one node to be master at any given time.
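One way to picture the constraint that only one node may be master at any given time is a single token that a node must hold while it changes the cluster database. The sketch below is an assumption for illustration only: it uses an in-process lock as a stand-in for whatever cluster-wide coordination an actual embodiment might use, and the class name `MasterToken` is hypothetical.

```python
import threading


class MasterToken:
    """Permits at most one node to act as master at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.master = None

    def try_become_master(self, node):
        # Any node may attempt to declare itself master at any time;
        # the attempt fails if another node already holds the role.
        if self._lock.acquire(blocking=False):
            self.master = node
            return True
        return False

    def release(self, node):
        # Only the current master may give up the role.
        if self.master != node:
            raise RuntimeError("only the current master may release the role")
        self.master = None
        self._lock.release()
```

A node that fails to obtain the token would operate as a slave until the current master has propagated its change and released the role.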
[0090] In block 402, the file service may begin operation.
[0091] In block 404, a file service request may be received. If an update is being processed in block 406, the node may wait until the update has finished in block 408 prior to continuing. The loop of block 408 may ensure that no request is processed using an out of date or inconsistent database. An example of a process performed during the update may be illustrated in embodiment 500 presented later in this specification.
[0092] If no updates are being processed in block 406, and the request does not cause a change to the namespace in block 410, the file service request may be processed in block 412.
[0093] If the request does cause a change in block 410, the update process may begin in block 416. The change may be made to the namespace in block 418 and the change may be propagated to the other nodes in block 420. The change may be stored locally in a local storage or cache.
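The request handling of embodiment 400 may be sketched as follows. The request format (a dictionary with hypothetical `op`, `name`, and `target` keys), the class name, and the direct method call used to propagate a change to peers are all assumptions of the example. The event object only models the wait of block 408; a production service would need additional locking to make the check and the update atomic.

```python
import threading


class FileServiceNode:
    """A node that serves requests and propagates namespace changes."""

    def __init__(self, namespace, peers=None):
        self.namespace = dict(namespace)  # local copy or cache
        self.peers = peers if peers is not None else []
        self._idle = threading.Event()    # set while no update is in progress
        self._idle.set()

    def handle(self, request):
        # Blocks 406-408: wait until any update in progress has finished.
        self._idle.wait()
        if request["op"] == "read":
            # Blocks 410-412: no namespace change, process the request.
            return self.namespace.get(request["name"])
        # Block 416: begin the update process.
        self._idle.clear()
        try:
            # Block 418: make the change to the namespace.
            self.namespace[request["name"]] = request["target"]
            # Block 420: propagate the change to the other nodes.
            for peer in self.peers:
                peer.apply_update(request["name"], request["target"])
        finally:
            self._idle.set()
        return request["target"]

    def apply_update(self, name, target):
        # Receiving side of block 420: hold requests while updating.
        self._idle.clear()
        try:
            self.namespace[name] = target
        finally:
            self._idle.set()
```

After a change request completes, every peer holds the same namespace as the node that handled the request, so a later read may be served consistently by any node.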
[0094] Figure 5 is a flowchart illustration of an embodiment 500 showing a method for updating a slave device. Embodiment 500 is a simplified example of a method that may be performed by any node operating as a slave device during an update by a master device. In many embodiments, any device may operate as a slave or a master throughout the time the file service is executing.
[0095] Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.
[0096] Embodiment 500 is a simplified example of the operations that may be performed by a slave device while a master device is causing a change to the namespace to be propagated.
[0097] In block 502, a notification of an update may be received. New file service requests may be stopped from being handled in block 504.
[0098] An attempt at updating the namespace definition or other information may be made in block 506. If the update is a success in block 508, the slave may resume handling requests in block 514.
[0099] If the update is not a success in block 508, the node may be taken offline in block 510 and an alert may be transmitted to a cluster manager or leader node in block 512.
[00100] The operations of embodiment 500 illustrate that when a slave node attempts to update and encounters a failure, the node may take itself offline. When the node is offline, corrective action may be taken while leaving the remaining nodes to continue operating.
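The slave-side behavior of embodiment 500 may be sketched as follows. The class name, the `fetch_namespace` callable that stands in for retrieving the updated definition, and the `alert` callback that stands in for notifying a cluster manager or leader node are hypothetical names chosen for the example.

```python
class SlaveNode:
    """A node that applies an update announced by a master node."""

    def __init__(self, name, namespace):
        self.name = name
        self.namespace = dict(namespace)
        self.accepting = True  # whether new requests are being handled
        self.online = True

    def on_update(self, fetch_namespace, alert):
        # Block 504: stop handling new file service requests.
        self.accepting = False
        try:
            # Block 506: attempt to update the namespace definition.
            self.namespace = dict(fetch_namespace())
        except Exception as error:
            # Block 508 (failure): block 510 takes the node offline and
            # block 512 sends an alert to the cluster manager or leader.
            self.online = False
            alert(self.name, error)
            return False
        # Block 508 (success): block 514 resumes handling requests.
        self.accepting = True
        return True
```

A node that fails the update stays offline and stops accepting requests, which leaves the remaining nodes free to continue serving while corrective action is taken.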
[00101] The foregoing description of the subject matter has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the subject matter to the precise form disclosed, and other modifications and variations may be possible in light of the above teachings. The embodiment was chosen and described in order to best explain the principles of the invention and its practical application to thereby enable others skilled in the art to best utilize the invention in various embodiments and various modifications as are suited to the particular use contemplated. It is intended that the appended claims be construed to include other alternative embodiments except insofar as limited by the prior art.

Claims

1. A system comprising:
a plurality of devices, each of said devices having a file service operable on each of said plurality of devices;
a data store comprising files, said data store being accessible to each of said plurality of devices;
a namespace definition defining an organization for said files into a share, said namespace definition being stored in a cluster database accessible to each of said plurality of devices, said share being made available to client devices;
said file service that identifies changes to said namespace definition and updates said namespace definition on said cluster database to an updated namespace definition, said file service that further updates a locally cached version of said namespace definition.
2. The system of claim 1 further comprising:
a load balancer that receives requests from said client devices and, for each of said requests, determines one of said plurality of devices to process a request.
3. The system of claim 1 further comprising:
a leader application executing on a first device of said plurality of devices, said leader application that identifies a new device and adds said new device to said plurality of devices.
4. The system of claim 3, said leader application that further identifies a first device and removes said first device from said plurality of devices.
5. The system of claim 1 further comprising:
a second plurality of devices being a subset of said plurality of devices; and
a second namespace definition defining a second organization for at least a subset of said files into a second share, said second namespace definition being stored in said cluster database accessible to each of said second plurality of devices, said second share being made available to client devices.
6. The system of claim 5, said second plurality of devices being the same as said plurality of devices.
7. The system of claim 1, said file service that further:
detects a failure when attempting to update said locally cached version of said namespace definition for a first device and disables said file service for said first device.
8. The system of claim 7, said file service that further:
retries updating said locally cached version of said namespace definition and adds said first device to said plurality of devices when said updating is successful.
9. A method comprising:
for each of a plurality of file server devices, installing and executing a file service and connecting said file service to a file store comprising files;
defining a namespace definition defining a share for at least some of said files;
storing said namespace definition in a cluster database, said cluster database being accessible for each of said plurality of file server devices;
starting said file service on a first file server device being one of said plurality of file server devices, said file service using said namespace definition;
said file service identifying changes to said namespace definition and updating said namespace definition in said cluster database;
starting a second file server device using said namespace definition; and
servicing file requests by both said first file server and said second file server in parallel.
10. The method of claim 9, further comprising:
copying said namespace definition to a first local cache on said first file server device and to a second local cache on said second file server device.
11. The method of claim 10 further comprising:
said first file server device detecting a first change to said namespace, updating said first local cache with said first change, and updating said cluster database with said first change;
said second file server device updating said second local cache with said first change.
12. The method of claim 11 further comprising:
said second file server device detecting a second change to said namespace, updating said second local cache with said second change, and updating said cluster database with said second change;
said first file server device updating said first local cache with said second change.
13. The method of claim 12 further comprising:
detecting a problem when said first file server device updates said first local cache with said second change and disabling said first file server from servicing said file requests.
14. The method of claim 13 further comprising:
starting a third file server device using said namespace definition; and
servicing file requests by said first file server, said second file server, and said third file server in parallel.
15. The method of claim 14 further comprising:
disabling said second file server from servicing said file requests and operating said first file server and said third file server in parallel to service said file requests.
PCT/US2012/039879 2011-06-04 2012-05-29 Clustered file service WO2012170234A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201280027196.2A CN103608798B (en) 2011-06-04 2012-05-29 Group document services
EP12796591.1A EP2718837B1 (en) 2011-06-04 2012-05-29 Clustered file service

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/153,416 US9652469B2 (en) 2011-06-04 2011-06-04 Clustered file service
US13/153,416 2011-06-04

Publications (2)

Publication Number Publication Date
WO2012170234A2 true WO2012170234A2 (en) 2012-12-13
WO2012170234A3 WO2012170234A3 (en) 2013-02-07

Family

ID=47262503

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/039879 WO2012170234A2 (en) 2011-06-04 2012-05-29 Clustered file service

Country Status (4)

Country Link
US (1) US9652469B2 (en)
EP (1) EP2718837B1 (en)
CN (1) CN103608798B (en)
WO (1) WO2012170234A2 (en)

Also Published As

Publication number Publication date
EP2718837A4 (en) 2015-08-12
EP2718837B1 (en) 2017-11-22
EP2718837A2 (en) 2014-04-16
WO2012170234A3 (en) 2013-02-07
CN103608798A (en) 2014-02-26
US20120311003A1 (en) 2012-12-06
US9652469B2 (en) 2017-05-16
CN103608798B (en) 2016-11-16

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12796591

Country of ref document: EP

Kind code of ref document: A2

REEP Request for entry into the european phase

Ref document number: 2012796591

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE