WO2016046951A1

WO2016046951A1 - Computer system and file management method therefor

Info

Publication number: WO2016046951A1
Application number: PCT/JP2014/075593
Authority: WO
Inventors: 鵜飼　敏之; 正道岡嶌; 雄介白神; 雅史柏木
Original assignee: 株式会社日立製作所
Priority date: 2014-09-26
Filing date: 2014-09-26
Publication date: 2016-03-31

Abstract

A computer system includes a plurality of clusters, each having a plurality of servers connected to a network and a shared storage device connected to the servers and shared by the respective servers, wherein a server belonging to at least one of the plurality of clusters is configured as a stand-by server. In the event of a fault in a server belonging to any of the clusters, the stand-by server executes a fail-over process on condition that a stand-by server that is standing by for a process exists in the cluster to which the faulty server belongs. If no stand-by server standing by for a process exists in the cluster to which the faulty server belongs, the stand-by server executes the fail-over process in coordination with a take-over server, which is any of the servers in the cluster to which the faulty server belongs.

Description

Computer system and file management method thereof

The present invention relates to a computer system having a plurality of servers and a plurality of storage apparatuses, and a file management method thereof.

In-memory data processing programs are known that construct an in-memory database by expanding all data stored in a storage device onto the memory (main storage device) of a server in a computer system, and that execute data input/output processing against that memory. By accessing the memory in which the in-memory database is constructed, the server managing the files can execute data input/output processing faster than by accessing the storage device. However, since the main storage device is generally volatile, the data used in the in-memory database must be made persistent in the storage device. Accordingly, the server that operates the in-memory database is connected to a storage device for holding the data, as in conventional systems.

When the amount of data handled by the in-memory database increases, one option is to increase the main storage capacity of one (or relatively few) servers. Another option is to distribute the data over multiple servers so that their main storage covers the required capacity in total, without increasing the main storage capacity of any single server. The former is a so-called scale-up configuration, and the latter is a so-called scale-out configuration. Compared with the scale-out configuration, the scale-up configuration generally requires little attention to data distribution and is easy to use, but it is inferior in implementation cost and in scalability when the amount of data increases further. For this reason, a scale-out configuration is often used when reasonably large data is targeted, for example in the cloud. Accordingly, when configuring a system that includes a plurality of storage apparatuses and a plurality of servers each having a memory in which an in-memory processing program for managing an in-memory database is installed, a scale-out type is adopted to support the cloud.

Scale-out systems are of either a storage non-shared type or a storage shared type. In the former, each server is individually connected to its own storage device, so a single storage device can be used for each server and the storage can be configured at low cost. However, when a server failure occurs, another server must take over the processing of the failed server. Therefore, a copy (replica) of the files (data) to be processed by each server must be created in a storage device under the control of another server. For example, Patent Document 1 describes creating a copy in another server (node) to increase availability.

In the latter, on the other hand, each server is connected to a storage device shared by the servers, so server failures can be handled without creating a replica in the storage device. For example, Patent Document 2 describes a method in which, when a failure occurs in an active server and its tasks are to be taken over by a server that is not in operation, the failure of the active server is detected, the computer system is searched for a server that has the same hardware configuration as the active server and is not running any tasks, access to the external disk device is enabled from the server found by the search, and that server takes over the tasks by booting from the external disk device.

The storage device shared by the servers is made fault tolerant by providing data redundancy inside the storage or by duplicating internal paths and interfaces. For example, Patent Document 3 discloses a RAID (Redundant Array of Inexpensive Disks), a fault-tolerant disk array in which redundant data is stored on at least one of the plurality of disks constituting a storage device. Each server sharing the storage uses, as its disk device, a logical disk unit (hereinafter, logical unit) carved out of the RAID as a logical storage device. When a failure occurs in a server, the logical unit on which that server was performing input/output is taken over by making it accessible from another server that shares the storage device.

International Publication No. WO2012-121316
JP 2006-163963 A
JP 2004-295457 A

When a storage shared system is adopted as the scale-out system, a standby server must be prepared among the plurality of servers to take over the processing of a failed server. If the system as a whole must be configured with more servers than can be connected to one shared storage device, at least as many standby servers as there are shared storage devices are required, which increases the installation cost.

When a storage non-shared system is adopted as the scale-out system, a single storage device can be used for each server, so the storage can be configured at low cost. However, when a server failure occurs, another server must take over the processing of the failed server, so each server is forced to create a copy (replica) of the files (data) it processes on another server. In this case, the server that takes over the processing of the failed server reconstructs the data of the failed server from the replica stored on a server other than the failed server, so that the services that accessed the storage device managed by the failed server can be resumed.

However, if each server creates a replica of the files it processes on another server in anticipation of a failure, write processing to the storage device increases on each server, and traffic on the network connecting the servers and the storage devices also increases. In addition, when the data of the failed server is reconstructed from the replica stored on a server other than the failed server, the replica must first be restored to the storage device of the takeover server and a data load must then be executed, which lengthens the time until the service is resumed.

An object of the present invention is to take over the processing of a failed server using a minimum number of standby servers when a failure occurs in a server. Another object of the present invention is to speed up recovery from a failure without creating a replica on each server.

In order to solve the above problems, the present invention provides a computer system including a plurality of clusters, each including a plurality of servers connected to a network and a shared storage device connected to each of the servers and shared by the servers, wherein a server belonging to at least one of the plurality of clusters is configured as a standby server, and, when a failure occurs in a server belonging to any one of the clusters, at least the standby server executes a failover process for taking over the processing of the failed server.

According to the present invention, it is possible to take over the processing of a failed server by using a minimum number of standby servers when a server failure occurs without creating a replica on each server.

FIG. 1 is a block diagram of a computer system showing a first embodiment of the present invention.
FIG. 2 is a configuration diagram of a load timing setting information table.
FIG. 3 is a configuration diagram of a copy policy information table.
FIG. 4 is a configuration diagram of a file correspondence information table.
FIG. 5 is a flowchart for explaining the initialization process of an active server.
FIG. 6 is a flowchart for explaining the initialization process of a standby server.
FIG. 7 is a flowchart for explaining the failure takeover process.
FIG. 8 is a flowchart for explaining the copy process at the time of loading.
FIG. 9 is a flowchart for explaining the copy incomplete file copy process.
FIG. 10 is a schematic configuration diagram for explaining the operation of the servers at the time of failure occurrence and after completion of takeover.
FIG. 11 is a block diagram of a computer system showing a second embodiment of the present invention.
FIG. 12 is a flowchart for explaining the failure takeover process.
FIG. 13 is a schematic configuration diagram for explaining the operation of the servers at the time of failure occurrence and after completion of takeover.

(Example 1)
In this example, when a failure occurs, a takeover server publishes information about the logical unit managed by the failed server on the network, and the takeover server and the standby server cooperate to execute failover processing.

FIG. 1 is a block diagram of a computer system showing a first embodiment of the present invention. In FIG. 1, the computer system includes a plurality (#1 to #n) of clusters 100. Each cluster 100 has a plurality of servers 101, and the servers 101 belonging to each cluster 100 are connected to a storage apparatus (shared storage device) 102 that is provided for each cluster 100 and shared by those servers 101. Each server 101 is also connected to the network 103. The computer system is thus a scale-out system of the storage shared type. For example, a LAN (Local Area Network) is used as the network 103. At least one of the servers 101 belonging to the clusters 100 is configured as a standby server, and the other servers are configured as active servers. For example, among the plurality of servers 101 (server M1 to server MN) belonging to the cluster 100 #n, the server 101 #MN is configured as the standby server MN, and the other servers 101 (servers 11 to 1N belonging to the cluster 100 #1, ..., and the remaining servers belonging to the cluster 100 #n) are configured as active servers.

Each server 101 includes a memory 111, a processor 112, and input/output interfaces (I/F) 113 and 114, and these units are connected to each other via an internal bus 115. Each memory 111 stores a program 121, data 122 (122-11, ..., 122-1N, ..., 122-M1, ...), a file system 123, and a virtual drive program 124 (only some of which are shown). The program 121 is, for example, a user program or an application program, and is configured as an in-memory processing program for managing an in-memory database (DB). Each data 122 includes data transferred from the storage apparatus 102, file bodies, logs, and the like.

Each file system 123 is a software resource that executes various processes in response to processing requests from the program 121, and is composed of, for example, a shared file system client, a shared file server (neither shown), and files transferred from the storage apparatus 102 (not shown). The file system 123 of each server 101 includes a table storage unit 130 (only a part of which is shown) for storing various tables. The file system 123 of the standby server MN further includes software resources for executing the failure takeover process 134, the load copy process 135, and the copy incomplete file copy process 136, such as a shared file system client or a shared file server (neither shown).

Each processor 112 is configured as a control unit that performs overall control of the server 101. It exchanges information and data with the other servers 101 or with a terminal (not shown) connected to the network 103 via the input/output interface 113 and the network 103, and determines whether a failure has occurred in another server 101 by exchanging heartbeat signals. Each processor 112 also exchanges information and data with the storage apparatus 102 via the input/output interface 114.

Each storage apparatus 102 includes a plurality of input/output interfaces 141, a controller 142, and a plurality of storage devices 143, and these units are connected to each other via internal buses 144 and 145. Each input/output interface 141 is arranged in correspondence with a server 101 and is connected to the input/output interface 114 of that server 101. Each storage device 143 is likewise arranged in correspondence with a server 101. Each storage device 143 stores, as file data, the data 151 (151-11, ..., 151-1N, ..., 151-M1, ...) to be managed (accessed) by the corresponding server 101.

Each controller 142 performs overall control of the storage apparatus 102 and, in response to an access request (read request or write request) from a server 101, controls data (file) input/output to and from the storage area of the designated storage device 143. A logical unit (LU) to be managed (accessed) by each server 101 is set in the storage area of each storage device 143. For example, LU11 to LU1N are set as logical units LU in the storage areas of the storage devices 143 corresponding to the servers 101 (server 11 to server 1N) belonging to the cluster 100 #1, and LUM1 to LUMN are set as logical units LU in the storage areas of the storage devices 143 corresponding to the servers 101 (server M1 to server MN) belonging to the cluster 100 #n.

FIG. 2 is a configuration diagram of the load timing setting information table. In FIG. 2, a load timing setting information table 131 is a table stored in the table storage unit 130 of the file system 123 in each server 101, and manages the timing for loading a plurality of files into the file system 123. The table includes a file name field 131A and a file load designation field 131B.

The file name is a name for specifying each file stored in the file system 123. For example, “file 1” is stored in an entry of the file name field 131A. A directory name representing all files under a specific directory, or a wild card indicating a specific file name pattern (a symbol that can be used in place of characters, for example “*” or “?”), may also be specified as the file name. The file load designation is information that designates the timing for loading each file into the file system 123. In an entry of the file load designation field 131B, “full load at startup” is stored when, for example, the whole of file 1 or file 3 is loaded into the file system 123 at startup of the program 121, and “partial load at startup” is stored when part of file 2 is loaded into the file system 123 at startup of the program 121. Similarly, “full load at first access” is stored when the whole of file 4 is loaded into the file system 123 the first time the program 121 accesses the table in the file system 123, and “partial load at first access” is stored when part of file 5 is loaded at that time.
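As a minimal sketch of how such a table could be consulted, the following Python fragment models the load timing setting information table 131 as a list of (pattern, designation) rows and resolves wild cards with the standard `fnmatch` module. The row contents, the `logs/*` pattern, the first-match rule, and the function names are assumptions made for illustration and are not taken from the specification.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory form of the load timing setting information table 131.
# Each row pairs a file name (or wild card pattern) with a file load designation.
LOAD_TIMING_TABLE = [
    ("file1", "full load at startup"),
    ("file2", "partial load at startup"),
    ("file3", "full load at startup"),
    ("file4", "full load at first access"),
    ("file5", "partial load at first access"),
    ("logs/*", "other"),  # assumed wild card row, not from the source
]


def load_designation(file_name, table=LOAD_TIMING_TABLE):
    """Return the designation of the first row whose pattern matches, else None."""
    for pattern, designation in table:
        if fnmatchcase(file_name, pattern):
            return designation
    return None


def files_to_load_at_startup(file_names, table=LOAD_TIMING_TABLE):
    """Select the files that the table marks for loading at program startup."""
    return [
        name for name in file_names
        if (load_designation(name, table) or "").endswith("at startup")
    ]
```

Choosing the first matching row keeps the lookup deterministic when an explicit file name and a wild card both match; the specification does not state a precedence rule, so this is only one possible choice.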

Note that although the load timing setting information table 131 here associates files with load designation information, it may instead associate DB tables handled by the in-memory database with their load designation information. In this case, a data structure that associates each DB table with a file is provided separately (not shown). A plurality of DB tables may constitute one file, or one DB table may consist of a plurality of files.

FIG. 3 is a configuration diagram of the copy policy information table. In FIG. 3, a copy policy information table 132 is a table stored in the table storage unit 130 of the file system 123 in each server 101 for managing the copy policy information used when copying each file, and includes a policy ID field 132A, a file load designation field 132B, and a prescribed operation field 132C.

The policy ID is an identifier for identifying the copy policy applied to each file. For example, “1” to “5” are stored in the entries of the policy ID field 132A in order to identify five types of policies. The file load designation field 132B stores the same information as the file load designation field 131B of the load timing setting information table 131. In the file load designation field 132B of the policy ID “5”, “other than the above” is stored as the load designation for files that fall under none of the policy IDs “1” to “4”.

The prescribed operation is an operation set in correspondence with the policy ID, and is information that specifies the operation for copying a file. In the entries of the prescribed operation field 132C, for example, “automatic (automatic copy at load)” is stored as the prescribed operation for automatically copying a file with the policy ID “1”, and “whole prefetch (automatic copy at load)” is stored as the prescribed operation for copying a file with the policy ID “2”, which is loaded by “partial load at startup”. For a file with the policy ID “3”, loaded by “full load at first access”, and for a file with the policy ID “4”, loaded by “partial load at first access”, “record read complete/incomplete; if incomplete, copy when the LU is reused” is stored. In this case, “read complete” is recorded once reading of the file has completed; otherwise “read incomplete” is recorded and the file is copied when the LU is reused.

In addition, “copy before loading” is stored as the prescribed operation for copying a file with the policy ID “5”, whose load designation is “other than the above”. In this case, the program 121 reads the file, and the read file is copied before it is loaded into the file system 123.
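The copy policy information table 132 described above can be pictured as five fixed rows. The sketch below is illustrative only: the class and function names are hypothetical, the operation strings are paraphrases, and pairing the policy ID “1” with “full load at startup” is an assumption inferred from the order of the load designations rather than something the text states outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyPolicy:
    policy_id: int             # policy ID field 132A
    load_designation: str      # file load designation field 132B
    prescribed_operation: str  # prescribed operation field 132C


# Rows paraphrased from the description of FIG. 3 (see assumptions above).
COPY_POLICY_TABLE = (
    CopyPolicy(1, "full load at startup", "automatic copy at load"),
    CopyPolicy(2, "partial load at startup", "whole prefetch, automatic copy at load"),
    CopyPolicy(3, "full load at first access",
               "record read state, copy on LU reuse if incomplete"),
    CopyPolicy(4, "partial load at first access",
               "record read state, copy on LU reuse if incomplete"),
    CopyPolicy(5, "other than the above", "copy before loading"),
)


def policy_for_designation(load_designation):
    """Map a load designation to its policy; anything unlisted falls to ID 5."""
    for policy in COPY_POLICY_TABLE:
        if policy.load_designation == load_designation:
            return policy
    return COPY_POLICY_TABLE[-1]


def copies_automatically_at_load(policy_id):
    """True for the policies whose prescribed operation is automatic copy at load."""
    return policy_id in (1, 2)
```

Keeping the table immutable (a tuple of frozen dataclasses) reflects that the policies are configuration shared by every server rather than state that changes during failover.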

FIG. 4 is a configuration diagram of the file correspondence information table. In FIG. 4, a file correspondence information table 133 is a table stored in the table storage unit 130 of the file system 123 in the standby server MN in order to manage the file correspondence information generated by the active servers 101. The table includes a file name field 133A, a policy ID field 133B, a migration source location information field 133C, a migration (complete/incomplete) field 133D, and a migration destination location information field 133E. The file name field 133A stores the same information as the file name field 131A of the load timing setting information table 131. However, when a directory name representing all files under a specific directory or a wild card indicating a specific file name pattern is used, the expanded file names may be stored. The policy ID field 133B stores the same information as the policy ID field 132A of the copy policy information table 132.

The migration source location information identifies the migration source location of a file that is subject to failover processing when a failure occurs in any of the servers 101. In an entry of the migration source location information field 133C, for example, “server 11: LU11” is stored when “file 1” is a file managed by the server 11 among the servers 101 belonging to the cluster 100 #1 and the logical unit set in the storage area of the storage device 143 to be managed (accessed) by the server 11 is LU11.

“Migration” is information indicating whether or not migration of a file to be subjected to failover processing is completed. In the entry of the migration field 133D, for example, when migration of a file to be subjected to failover processing is incomplete, information of “incomplete” is stored, and migration of the file to be subject to failover processing is completed. In this case, “complete” information is stored.

The migration destination location information identifies the migration destination location of a file that is subject to failover processing. In an entry of the migration destination location information field 133E, for example, “server MN: LUMN” is stored when the migration destination of “file 1” is the standby server MN and the logical unit set in the storage area of the storage device 143 to be managed (accessed) by the standby server MN is LUMN. Note that the migration source location information and the migration destination location information need only be information that uniquely identifies a logical unit in the system. For example, other information such as a UUID (Universally Unique Identifier) may be used.
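One row of the file correspondence information table 133 could be represented as follows. This is a sketch under stated assumptions: the class name, the boolean encoding of the migration field, the string format of the location fields, and the `mark_migrated` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileCorrespondence:
    file_name: str                              # field 133A
    policy_id: int                              # field 133B
    source_location: str                        # field 133C, e.g. "server11:LU11"
    migrated: bool = False                      # field 133D: complete / incomplete
    destination_location: Optional[str] = None  # field 133E

    def mark_migrated(self, destination):
        """Record that the file now resides at the given destination location."""
        self.migrated = True
        self.destination_location = destination
```

A row starts as incomplete with no destination, and only the two migration fields change over its lifetime, for example `row.mark_migrated("serverMN:LUMN")` once a copy to the standby server's logical unit has finished.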

FIG. 5 is a flowchart for explaining the initialization process of the active server. This process is performed by each active server 101. Each server 101 in each cluster 100 reads the load timing setting information registered in the load timing setting information table 131 stored in the table storage unit 130 of the file system 123 (S501). The server 101 then refers to the copy policy information table 132, generates, for each file, file correspondence information including the file name, the policy ID, and the migration source location information, registers the generated file correspondence information in the file system 123 (S502), and ends the processing in this routine.

FIG. 6 is a flowchart for explaining the standby server initialization process. This process is performed by the standby server MN. The standby server MN acquires from each server 101 the file correspondence information generated by that server (the information generated in step S502 of FIG. 5), aggregates the acquired file correspondence information, registers the aggregated file correspondence information (file name, policy ID, migration source location information) in the file correspondence information table 133 (S601), and ends the processing in this routine.
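The two initialization routines can be sketched together as below. The function names, the dictionary layout, the keying of the aggregated table by (source, file name), and the rule that files without a matching load timing row receive the “other” policy are assumptions for illustration; how the rows travel from the active servers to the standby server is not modeled.

```python
from fnmatch import fnmatchcase


def build_file_correspondence(server_id, lu_id, file_names, load_timing,
                              policy_by_designation):
    """S501-S502 (sketch): an active server derives one row per file it manages."""
    rows = []
    for name in file_names:
        # Look up the load designation; unmatched files are assumed to be "other".
        designation = next(
            (d for pattern, d in load_timing if fnmatchcase(name, pattern)),
            "other",
        )
        rows.append({
            "file_name": name,
            "policy_id": policy_by_designation[designation],
            "source": f"{server_id}:{lu_id}",
        })
    return rows


def collect_on_standby(rows_per_server):
    """S601 (sketch): the standby server merges the rows of every active server."""
    table = {}
    for rows in rows_per_server:
        for row in rows:
            table[(row["source"], row["file_name"])] = {
                **row, "migrated": False, "destination": None,
            }
    return table
```

Keying the aggregated table by source location and file name allows two servers to manage files with the same name without their rows colliding.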

FIG. 7 is a flowchart for explaining the failure takeover process. When a failure occurs in any of the active servers 101, this failure takeover process 134 is executed by an active server 101 of the cluster 100 to which the failed server belongs or by the standby server MN.

First, when a failure occurs in any active server 101 among the plurality of servers 101 belonging to the clusters 100, one of the active servers 101 (a server other than the failed server) in the cluster 100 to which the failed server belongs determines whether a standby server waiting for processing, for example the standby server MN, exists in that cluster 100 (S701). Even if a standby server exists in the cluster, it is not judged to be available if, for example, it is still taking over from another server or is still being set up as a standby server.

If a negative determination result is obtained in step S701, that is, if no available standby server (for example, the standby server MN) exists in the cluster 100 to which the failed server belongs, one of the active servers 101 in that cluster 100 makes the logical unit LU managed by the failed server available, with the virtual drive program 124 acting as a target (S702). In other words, this active server 101 functions as a takeover server (device server) in order to perform, in cooperation with the standby server MN waiting for processing, the failover processing for taking over the processing of the failed server, and executes a process for publishing the logical unit LU managed by the failed server on the network 103 with the virtual drive program 124 as the target.

For example, when the server 1N among the active servers 101 belonging to the cluster 100 #1 becomes the failed server, the server 11 among the active servers 101 belonging to the cluster 100 #1, for example, becomes the takeover server (device server), and the virtual drive program 124 of the server 11 executes a process for publishing the logical unit LU1N managed by the failed server 1N on the network 103. That is, the server 11 publishes on the network 103 the access information (information including the server ID and LUID) for the storage area of the storage apparatus 102 (the storage area of the storage device 143) managed by the failed server 1N.

Next, the file system 123 of the standby server MN, having detected that the server 1N has failed, executes a process for failing over from the failed server 1N to the standby server MN (S703). That is, the file system 123 of the standby server MN executes a process for migrating the files managed by the failed server 1N to the storage apparatus 102 of the standby server MN, targeting the logical unit LU1N published on the network 103.

Next, the file system 123 of the standby server MN acquires the server ID and LUID of the failed server 1N, as migration source location information (access information), from the virtual drive program 124 under the management of the server 11 (S704).

Next, the standby server MN refers to the file correspondence information table 133 based on the acquired migration source location information and identifies the file name and policy ID of each file specified by that information. It then refers to the copy policy information table 132 based on the identified policy ID and, on condition that the policy ID is “5”, executes “copy before loading” as the prescribed operation for the file (S705).

That is, when the standby server MN reads a file in the logical unit LU managed by the failed server 1N, it sends a read request including the migration source location information (access information) to the server 11. On receiving the read request, the server 11 acquires the file managed by the failed server 1N, based on the read request, from the controller 142 to which both the failed server 1N and the server 11 are connected, and transfers the acquired file to the standby server MN via the network 103. The standby server MN loads the file acquired via the network 103 into the memory 111 and copies it to the storage area of the storage device 143 managed by the standby server MN. At this time, the standby server MN registers “complete” in the entry of the migration field 133D of the file correspondence information table 133 as information indicating that migration of the file is complete, registers “server MN: LUMN” in the entry of the migration destination location information field 133E as the migration destination location information of the file, and proceeds to step S706.

On the other hand, if a positive determination result is obtained in step S701, that is, if an available standby server (a standby server waiting for processing) exists in the cluster 100 to which the failed server belongs, for example when the failed server and the standby server MN belong to the same cluster, one of the active servers 101 in that cluster 100 requests the standby server MN waiting for processing to perform failover processing. The standby server MN then executes failover processing for transferring the files stored in the storage area of the storage device 143 managed by the failed server to the storage device 143 managed by the standby server MN (S707). For example, when the server 1N is the failed server and a standby server waiting for processing exists in the cluster 100 to which the failed server 1N belongs, this standby server executes the failover processing.

Thereafter, the file system 123 of the standby server MN acquires the server ID and LUID of the failed server, as migration source location information (access information), from the controller 142 of the storage apparatus 102 under the management of the failed server (S708).

Next, the standby server MN refers to the file correspondence information table 133 based on the acquired migration source location information, registers “server MN: LUMN” in the migration source location information field 133C as the migration source information of the file, and rewrites the access destination of the files managed by the failed server to the access destination of the standby server MN (S709). Thereafter, when an active server 101 in the cluster 100 to which the standby server MN belongs receives information indicating completion of the file migration from the standby server MN, it registers the completion of the failover in, for example, a monitor (not shown) that monitors the program 121 (S706), and ends the processing in this routine.
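The branch taken at step S701 can be summarized in a short sketch. The `Server` class, the mode labels, and the "unrecoverable" fallback for the case in which no idle standby server or surviving active server can be found are assumptions added for illustration; the specification describes only the two branches of S701.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    cluster: int
    role: str             # "active" or "standby"
    busy: bool = False    # True while taking over or still being set up
    failed: bool = False


def find_available_standby(servers, cluster):
    """S701 (sketch): a standby counts only if it is in the cluster and idle."""
    for s in servers:
        if s.cluster == cluster and s.role == "standby" and not (s.busy or s.failed):
            return s
    return None


def plan_failover(servers, failed):
    """Return (mode, standby, takeover_server) for the given failed server."""
    local = find_available_standby(servers, failed.cluster)
    if local is not None:
        # S707-S709: the standby shares the storage apparatus with the failed server.
        return ("direct", local, None)
    # S702-S705: a surviving active server in the same cluster publishes the LU,
    # and an idle standby in another cluster copies the files over the network.
    takeover = next(
        (s for s in servers
         if s.cluster == failed.cluster and s.role == "active"
         and not s.failed and s is not failed),
        None,
    )
    remote = next(
        (s for s in servers if s.role == "standby" and not (s.busy or s.failed)),
        None,
    )
    if takeover is None or remote is None:
        return ("unrecoverable", None, None)  # assumed fallback, not in the source
    return ("cooperative", remote, takeover)
```

With server 1N failed in cluster #1 and the only standby server MN in cluster #n, this sketch selects the cooperative path with server 11 as the takeover server, which corresponds to the example given for steps S702 to S705.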

FIG. 8 is a flowchart for explaining the copy process at the time of loading. This load copy process 135 is executed by the file system 123 of the standby server MN. First, when the file system 123 of the standby server MN receives a read request including a file name from the program (application program) 121 of the standby server MN (S801), it refers to the file correspondence information table 133 and determines whether the request is a read request for a file registered in the file correspondence information table 133 (S802).

If a negative determination result is obtained in step S802, the file system 123 proceeds to step S805. If a positive determination result is obtained in step S802, the file system 123 refers to the copy policy information table 132 based on the policy ID of the file specified in the read request and determines whether the prescribed operation for that file is “automatic copy at load” (S803). Specifically, the file system 123 determines whether the policy ID of the file specified in the read request is “1” or “2”.

If a negative determination result is obtained in step S803, the file system 123 proceeds to step S805. If a positive determination result is obtained in step S803, the file system 123 copies the file specified in the read request to the storage area of the storage device 143 to be accessed by the standby server MN. It then refers to the file correspondence information table 133, stores “complete” in the migration field 133D as information indicating that copying of the file specified in the read request is complete, and registers “server MN: LUMN” in the migration destination location information field 133E as the migration destination location information (S804).

Thereafter, in response to the read request from the program 121, the file system 123 returns the result of the read process to the program 121 (S805), and ends the process in this routine.
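The flow of steps S801 to S805 can be illustrated with a minimal Python sketch. It is not the implementation of the file system 123: the names `FileEntry`, `read_with_load_copy`, `source_area` and `standby_area` are hypothetical, storage areas are modeled as plain dictionaries, and the rule that a file already present in the standby area is read from there is an added assumption.

```python
from dataclasses import dataclass

# Policy IDs whose prescribed operation is "automatic copy at load" (S803).
COPY_AT_LOAD_POLICY_IDS = {1, 2}


@dataclass
class FileEntry:
    """One row of a simplified file correspondence information table 133."""
    policy_id: int
    migration: str = ""     # stands in for migration field 133D
    destination: str = ""   # stands in for destination location field 133E


def read_with_load_copy(name, table, source_area, standby_area,
                        standby_location="server MN: LUMN"):
    """Serve a read request (S801) and copy the file at load time if required."""
    # Assumption: once a file has been copied, it is read from the standby area.
    data = standby_area[name] if name in standby_area else source_area[name]
    entry = table.get(name)                          # S802: registered file?
    if entry is not None and entry.policy_id in COPY_AT_LOAD_POLICY_IDS:  # S803
        standby_area[name] = data                    # S804: copy to standby area
        entry.migration = "complete"
        entry.destination = standby_location
    return data                                      # S805: return read result
```

In this sketch a file with policy ID "3", or a file that is not registered in the table, is returned to the caller without being copied, which corresponds to the negative branches of S802 and S803.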

FIG. 9 is a flowchart for explaining the copy process for copy-incomplete files. This copy incomplete file copy process 136 is executed by the file system 123 of the standby server MN. This process is executed before the storage area of the copy source storage device 143 is released, for example when the failed server is to be reused as a standby system. It takes into account the case where the policy ID of a file subject to failover processing is any one of "1", "3", and "4" and the copy has not been executed by the prescribed operation, and the case where a copy has been missed for some other reason.

First, the file system 123 of the standby server MN acquires the file list of the copy target storage (the storage device 143 that was the management target (access target) of the failed server) from one of the active servers in the cluster 100 to which the failed server belongs, for example the server (device server) 11 (S901). It then determines whether the acquired file list contains a file whose copy to the storage area of the storage device 143 that is the management target (access target) of the standby server MN has not been completed (S902).

Next, if the file system 123 obtains a negative determination result in step S902, it proceeds to the process of step S904. If the file system 123 obtains a positive determination result in step S902, it locks the copy-incomplete file, acquires the file from the storage area of the copy source storage device 143 in which it is stored, and copies the acquired file to the storage area of the storage device 143 that is the management target (access target) of the standby server MN (S903).

Thereafter, if necessary, the original standby server MN is switched to the active server, and the failure server is switched to the standby server (S904), and the processing in this routine is terminated. Thereby, the storage area of the copy source storage device can be released.

In the copy incomplete file copy process, when the policy ID of the file subject to failover processing is "5" and the prescribed operation is "copy before load", it is assumed that the file system 123 of the standby server MN finishes copying the other files before operation is resumed. The copy incomplete file copy process also takes into account the case where operation of the program 121 is to be resumed as soon as possible and the case where a copy has been missed.
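Steps S901 to S903 amount to comparing a file list against the standby server's storage area and copying what is missing under a lock. The following Python sketch shows that idea under stated assumptions: `copy_incomplete_files`, `source_area`, `standby_area` and `locks` are hypothetical names, storage areas are dictionaries, and the per-file lock is modeled with `threading.Lock`. The role switch of S904 is outside the sketch.

```python
import threading


def copy_incomplete_files(file_list, source_area, standby_area, locks):
    """Copy every file in file_list that is not yet in standby_area.

    file_list stands in for the list obtained from an active server (S901).
    Returns the names of the files that were copied.
    """
    # S902: find files whose copy to the standby area has not completed.
    pending = [name for name in file_list if name not in standby_area]
    for name in pending:
        # S903: lock the copy-incomplete file, then copy it.
        with locks.setdefault(name, threading.Lock()):
            standby_area[name] = source_area[name]
    return pending
```

When the returned list is empty, nothing remained to be copied, which corresponds to the negative branch of S902.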

FIG. 10 is a schematic configuration diagram for explaining the operation of the servers when a failure occurs and after the takeover is completed. In FIG. 10, when, among the plurality of servers 101 belonging to each cluster 100, the active server 1N belonging to the #1 cluster 100 becomes a failed server, for example, the server 11 belonging to the #1 cluster 100 acts as the device server, and the virtual drive program 124 of the server 11, as a target, executes a process for publishing the logical unit LU1N managed by the failed server 1N on the network 103. The virtual drive program 124 of the server MN can then access the published logical unit LU1N as an initiator, so that the shared file system server 162 and the local file system 163 of the server MN can mount the logical unit LU1N and use it as a shared file system.

At this time, the shared file system client 161 and the shared file system server 162 of the file system 123 detect that the active server 1N has become a failed server (by heartbeat monitoring between the shared file systems or by a server alive monitoring program, not shown). Among the file systems 123 operating on the plurality of servers, the file system 123 operating on one server selected based on a predetermined rule (by majority decision or by a predetermined priority order) then generates a processing request for taking over the processing of the failed server 1N, for example a processing request for publishing the logical unit LU1N managed by the failed server 1N on the network 103 (a virtual drive publication setting request). The file system 123 transfers this processing request to the shared file system server 162 on the server 11, which has been selected to take over and expose the logical unit LU1N. The shared file system server 162 that has received the processing request forwards it to the virtual drive program 124, which has been started in advance, so that the virtual drive program 124 executes the processing for publishing the logical unit LU1N managed by the failed server 1N on the network 103. The virtual drive program 124 publishes the logical unit LU1N (access information) managed by the server 1N on the network 103 via the network communication driver 165.

Further, the file system 123 operating on the server selected based on the predetermined rule (by majority decision or by a predetermined priority order) also generates, as a processing request, a file server takeover request for the file system 123 that uses the logical unit LU1N published on the network 103. The file system 123 transfers this processing request to the shared file system server 162 on the server MN, which has been selected to take over the processing for the file system 123 that uses the logical unit LU1N.

The shared file system server 162 requests the virtual drive program 124 to make the logical unit LU1N published on the network 103 available as an initiator. The virtual drive program 124 of the server MN transfers an instruction for making the logical unit LU1N available as a virtual drive to the target virtual drive program 124 via the network communication driver 165 of the standby server MN.

At this time, the shared file system server 162 on the server MN makes the logical unit LU1N available by mounting, through the local file system 163, the virtual drive provided by the virtual drive program 124 acting as an initiator. As a result, the logical unit LU1N that was managed by the shared file system server 162 on the failed server 1N can be accessed through the shared file system server 162 on the server MN. A read request for data on the logical unit LU1N issued by the user program 121 passes through the shared file system client 161, the shared file system server 162, the local file system 163, the virtual drive program 124, and the network communication driver 165, and is transferred to the server 11.

The server 11 that has received the read request instructs the storage driver 164 to access the local storage through the network communication driver 165 and the virtual drive program 124. The storage driver 164 reads the specified file from the storage area of the storage device 143, with the storage device 143 managed by the failed server 1N as the access target, and transfers the read file to the target virtual drive program 124.

The target virtual drive program 124 transfers the file to the virtual drive program 124 of the server MN via the network communication driver 165, the network 103, and the network communication driver 165 of the server MN. The virtual drive program 124 acting as an initiator transfers the file to the shared file system server 162 through the local file system 163. The shared file system server 162 transfers the received file to the local file system 163 as a file that was under the management of the failed server 1N and is subject to failover processing.

At this time, the shared file system server 162 treats the file transferred via the network 103 from the virtual drive program 124 serving as the initiator as a file obtained by accessing remote storage, and writes this file to the storage area of the storage device 143 that is the management target (access target) of the standby server MN, which serves as local storage. Thereafter, the shared file system server 162 of the standby server MN rewrites the access destination for the files that were under the management of the failed server 1N to the storage area of the storage device 143 under the management of the standby server MN, and resumes service for the files that were under the management of the failed server 1N.
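The interaction described for FIG. 10 (a target publishes the logical unit, an initiator mounts it, files are read remotely, written locally, and the access destination is rewritten) can be sketched in Python as follows. The sketch is a simplification under stated assumptions: `Network`, `Target`, `Initiator` and `fail_over` are hypothetical names, the network 103 is reduced to an in-memory registry, and a logical unit is modeled as a dictionary of files rather than as a block device.

```python
class Network:
    """In-memory stand-in for network 103: published LU name -> target."""
    def __init__(self):
        self.published = {}


class Target:
    """Stands in for the virtual drive program 124 on the takeover server 11."""
    def __init__(self, storage):
        self.storage = storage  # LU name -> {file name: bytes}

    def publish(self, network, lu):
        network.published[lu] = self

    def read(self, lu, name):
        return self.storage[lu][name]


class Initiator:
    """Stands in for the virtual drive program 124 on the standby server MN."""
    def __init__(self, network):
        self.network = network
        self.mounted = {}

    def mount(self, lu):
        # Raises KeyError if the LU has not been published on the network.
        self.mounted[lu] = self.network.published[lu]

    def read(self, lu, name):
        return self.mounted[lu].read(lu, name)


def fail_over(initiator, lu, names, local_area, access_destination, location):
    """Mount the published LU, copy the files locally, rewrite destinations."""
    initiator.mount(lu)
    for name in names:
        local_area[name] = initiator.read(lu, name)  # remote read, local write
        access_destination[name] = location          # rewrite access destination
```

Mounting a logical unit that has not been published fails with `KeyError` in this sketch, which mirrors the ordering in the text: publication by the target precedes use by the initiator.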

According to the present embodiment, each cluster 100 is configured as a shared storage type and the standby server MN is arranged in at least one cluster 100. Therefore, when a server failure occurs, the processing of the failed server 1N can be taken over with a minimum number of standby servers without each active server 101 creating a replica on another server 101, which contributes to reducing the overhead associated with creating replicas. Further, according to the present embodiment, when a file subject to failover processing is loaded into the memory 111 of the standby server MN, the file is simultaneously copied to the storage area of the storage device 143 managed by the standby server MN, so recovery from a failure can be accelerated. That is, it is possible both to shorten the time until service is resumed for the failed server 1N and to speed up the file copy to the storage area of the storage device 143 managed by the standby server MN. Furthermore, according to the present embodiment, when a failure occurs the takeover server 11 only needs to publish information on the logical unit managed by the failed server 1N on the network 103, so the load can be distributed better than when the takeover server 11 handles all input/output processing for the storage area of the storage device 143 managed by the failed server 1N.

(Example 2)
In this embodiment, instead of publishing information on the logical unit managed by the failed server on the network when a failure occurs, the takeover server performs the input/output processing for the storage area managed by the failed server, and the takeover server and the standby server cooperate with each other to execute failover processing.

FIG. 11 is a block diagram of a computer system showing a second embodiment of the present invention. In this embodiment, one of the active servers 101 in the cluster 100 to which the failed server belongs, for example the server 11, executes the takeover process 137 at the time of failure. The rest of the configuration is the same as that of the first embodiment.

FIG. 12 is a flowchart for explaining the failure takeover process. This failure takeover process 137 is executed by one of the active servers 101 or the standby server MN of the cluster 100 to which the failure server belongs when a failure occurs in any one of the active servers 101.

First, when a failure occurs in any active server 101 among the plurality of servers 101 belonging to each cluster 100, one of the active servers 101 (a server other than the failed server) in the cluster 100 to which the failed server belongs determines whether a standby server that functions as a file server and can be used (a standby server waiting for processing), for example the standby server MN, exists in the cluster 100 to which the failed server belongs (S1101). At this time, even if a standby server exists in the cluster, it is not judged to be usable if, for example, it is still taking over from another server or is still being set up as a standby server.

When a negative determination result is obtained in step S1101, that is, for example, when no standby server MN exists as a standby server waiting for processing in the cluster 100 to which the failed server belongs, one of the active servers 101 in the cluster 100 to which the failed server belongs takes over the logical unit LU managed by the failed server, and failover to that active server (takeover server) 101 is executed so that the logical unit LU can be used as a file system (S1102).

For example, when the server 1N among the active servers 101 belonging to the #1 cluster 100 becomes a failed server, the server 11 among the active servers 101 belonging to the #1 cluster 100, for example, acts as the takeover server or file server and makes the logical unit LU1N managed by the failed server 1N available as the file system 123. That is, the server 11 manages the storage area of the storage device 143 that was managed by the failed server 1N (the storage area specified by the logical unit LU1N) as a takeover storage area that it newly manages, generates access information (information including the server ID and LUID of the server 11) indicating that the server 11 is the management source for the failed server 1N, and transfers the generated access information to the standby server MN via the network 103.

Next, the file system 123 of the standby server MN acquires the server ID and LUID of the server 11 from the server 11 as migration source location information (access information) (S1103).

Next, the standby server MN refers to the file correspondence information table 133 based on the acquired migration source location information and identifies the file name and policy ID of the file specified by the migration source location information. It then refers to the copy policy information table 132 based on the policy ID of the identified file and, on condition that the policy ID is "5", executes the "copy before load" process as the prescribed operation for this file (S1104). At this time, when the standby server MN reads a file specified by the logical unit LU managed by the failed server 1N, it generates a read request based on the access information including the server ID and LUID of the server 11 and transfers the generated read request to the server 11. The server 11 that has received the read request reads the file designated by the read request from the takeover storage area (the storage area of the storage device 143 that was managed by the failed server 1N and is specified by the logical unit LU1N), and transfers the read file to the standby server MN via the network 103.

The standby server MN loads the file acquired from the server 11 via the network 103 into the memory 111 and copies it to the storage area of the storage device 143 under the management of the standby server MN. At this time, the standby server MN registers "complete" in the entry of the migration field 133D of the file correspondence information table 133 as information indicating completion of the migration of the file, registers "server MN: LUMN" in the entry of the migration destination location information field 133E of the file correspondence information table 133 as the migration destination location information of the file, and proceeds to the process of step S1105.

On the other hand, when a positive determination result is obtained in step S1101, that is, when an available standby server (a standby server waiting for processing) exists in the cluster 100 to which the failed server belongs, one of the active servers 101 in that cluster 100 requests the standby server waiting for processing, for example the standby server MN when the failed server and the standby server MN belong to the same cluster, to perform failover processing. As a result, the standby server MN executes failover processing for migrating the files stored in the storage area of the storage device 143 under the management of the failed server to the storage area of the storage device 143 under the management of the standby server MN (S1106). In other words, when the server 1N is the failed server and a standby server waiting for processing exists in the cluster 100 to which the failed server 1N belongs, this standby server executes the failover processing.

Thereafter, the file system 123 of the standby server MN acquires the server ID and the LU ID of the failed server 1N from the controller 142 of the storage apparatus 102 under the management of the failed server 1N as migration source location information (S1107).

Next, the standby server MN refers to the file correspondence information table 133 based on the acquired migration source location information, registers "server MN: LUMN" as the migration source information of the file in the migration source location information field 133C of the file correspondence information table 133, and rewrites the access destination of the files managed by the failed server 1N to the standby server (S1108).

Thereafter, when the server 11, for example, receives information indicating the completion of file migration from the standby server MN, the server 11 registers the completion of failover in a monitor (not shown) that monitors the program 121 (S1105), and the processing in this routine is terminated.
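The branch taken at step S1101 can be summarized in a short Python sketch. It only models the decision and the resulting step sequence described for FIG. 12; the names `Server`, `StandbyState`, `usable_standby` and `plan_takeover` are hypothetical, and the standby states are an assumed encoding of "waiting for processing", "still taking over from another server" and "being set up".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StandbyState(Enum):
    WAITING = "waiting for processing"
    TAKING_OVER = "taking over from another server"
    SETTING_UP = "being set up as a standby server"


@dataclass
class Server:
    name: str
    cluster: int
    standby_state: Optional[StandbyState] = None  # None means an active server


def usable_standby(servers, cluster):
    """S1101: a standby server counts only if it is waiting for processing."""
    for server in servers:
        if server.cluster == cluster and server.standby_state is StandbyState.WAITING:
            return server
    return None


def plan_takeover(servers, failed):
    """Return (step sequence, name of the server that performs the takeover)."""
    standby = usable_standby(servers, failed.cluster)
    if standby is not None:
        # Positive result: the standby server in the same cluster fails over.
        return ["S1106", "S1107", "S1108", "S1105"], standby.name
    # Negative result: an active server in the same cluster takes over the LU.
    takeover = next(s for s in servers
                    if s.cluster == failed.cluster
                    and s is not failed and s.standby_state is None)
    return ["S1102", "S1103", "S1104", "S1105"], takeover.name
```

A standby server that is still being set up is skipped by `usable_standby`, so the sketch falls through to the takeover-server branch in that case, as the text describes.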

FIG. 13 is a schematic configuration diagram for explaining the operation of the servers when a failure occurs and after the takeover is completed. In FIG. 13, when, among the plurality of servers 101 belonging to each cluster 100, the active server 1N belonging to the #1 cluster 100 becomes a failed server, for example, the server 11 belonging to the #1 cluster 100 acts as a file server and executes a process for taking over the processing of the failed server 1N. At this time, the shared file system client 161 and the shared file system server 162 of the file system 123 detect that the active server 1N has become a failed server (by heartbeat monitoring between the shared file systems or by a server alive monitoring program, not shown). Among the file systems 123 operating on the plurality of servers, the file system 123 operating on a server selected based on a predetermined rule (majority decision or a predetermined priority order) then generates a processing request (file server takeover request) for taking over the processing of the failed server 1N. The file system 123 transfers this processing request to the shared file system server 162 on the server 11 selected to take over and expose the logical unit LU1N.

Upon receiving the processing request, the shared file system server 162 generates, as information for executing failover processing in cooperation with the standby server MN, access information (information including the server ID and LUID of the server 11) indicating that the server 11 is the management source for the failed server 1N, and forwards the generated access information to the shared file system clients 161 in the system, including that of the standby server MN, via the network communication driver 165, the network 103, and the network communication driver 165. However, as long as file system servers can be associated with servers, a representative server may instead hold a correspondence table, and the contents of that correspondence table may be updated. A read request for data on the logical unit LU1N issued by the user program 121 is transferred to the server 11 via the network 103 when the shared file system client 161 of the standby server MN determines that the server 11 is the management source for the failed server 1N.

When the server 11 receives a read request from the standby server MN via the network communication driver 165, the server 11 outputs a processing request for accessing the local storage to the shared file system server 162 of the file system 123. The shared file system server 162 that has received the processing request transfers the processing request to the local file system 163. The local file system 163 instructs the storage driver 164 to access the local storage in order to execute access to the local storage.

The storage driver 164 executes a read process on the storage area of the storage device 143 (the takeover storage area specified by the logical unit LU1N), with the storage device 143 that was managed by the failed server 1N as the access target. It reads the file subject to failover processing (the file specified by the read request) from the storage area specified by the logical unit LU1N and transfers the read file to the shared file system server 162 via the local file system 163. The shared file system server 162 of the server 11 treats the file obtained by accessing the local storage as the file specified by the read request from the server MN, and transfers this file via the network 103 to the shared file system client 161 of the server MN.

The shared file system client 161 of the server MN treats the file transferred from the server 11 as a file obtained by accessing remote storage (a file subject to failover processing), and executes processing for writing this file to the storage area of the storage device 143 that is the management target (access target) of the standby server MN. At this time, in order to take over the processing of the failed server 1N, the shared file system client 161 of the file system 123 instructs the storage driver 164, via the shared file system server 162 on the server MN and the local file system 163, to access the local storage.

The storage driver 164 stores the file transferred from the server 11 in the storage area managed by the server MN among the storage areas of the storage device 143. Thereafter, in the standby server MN, the shared file system client 161 rewrites the access destination for the files that were under the management of the failed server 1N to the storage area of the storage device 143 under the management of the standby server MN, and resumes service for the files that were under the management of the failed server 1N.
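The cooperation between the takeover server and the standby server in this embodiment can be sketched in Python as follows. The sketch makes several assumptions: `TakeoverServer`, `StandbyServer`, `handle_read` and `migrate` are hypothetical names, the access information is reduced to a (server ID, LUID) tuple, and the takeover storage area and the local storage area are plain dictionaries.

```python
class TakeoverServer:
    """Stands in for server 11, which serves reads on the takeover storage area."""
    def __init__(self, server_id, lu_id, takeover_area):
        # Access information indicating that this server is the management
        # source for the failed server (server ID and LUID, per the text).
        self.access_info = (server_id, lu_id)
        self._takeover_area = takeover_area

    def handle_read(self, access_info, name):
        if access_info != self.access_info:
            raise LookupError("unknown access information")
        return self._takeover_area[name]


class StandbyServer:
    """Stands in for the standby server MN."""
    def __init__(self, location):
        self.location = location
        self.local_area = {}          # storage area managed by this server
        self.access_destination = {}  # file name -> current access destination

    def migrate(self, takeover, access_info, name):
        # Send a read request carrying the access information, then store the
        # returned file locally and rewrite its access destination.
        data = takeover.handle_read(access_info, name)
        self.local_area[name] = data
        self.access_destination[name] = self.location
        return data
```

Unlike the first embodiment, the standby server in this sketch never touches the failed server's logical unit directly: every read goes through `handle_read` on the takeover server, which matches the statement that the takeover server performs the input/output processing.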

According to the present embodiment, as in the first embodiment, the processing of the failed server 1N can be taken over with a minimum number of standby servers MN, which contributes to reducing the overhead associated with creating replicas, and recovery from a failure can be speeded up. Further, according to the present embodiment, the takeover server 11 executes the file input/output processing for the storage area of the storage device 143 managed by the failed server 1N and can execute the failover processing in cooperation with the standby server MN.

The present invention is not limited to the above-described embodiments and includes various modifications. For example, the above-described embodiments have been described in detail for easy understanding of the present invention, and the invention is not necessarily limited to those having all the configurations described. Further, a part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. Further, other configurations can be added to, deleted from, or substituted for a part of the configuration of each embodiment.

Also, each of the above-described configurations, functions, and the like may be realized in hardware, for example by designing part or all of them as an integrated circuit. Each of the above-described configurations, functions, and the like may also be realized in software by having a processor interpret and execute a program that realizes each function. Information such as programs, tables, and files that realize each function can be stored in a memory, in a recording device such as a hard disk or an SSD (Solid State Drive), or on a recording medium such as an IC (Integrated Circuit) card, an SD (Secure Digital) memory card, or a DVD (Digital Versatile Disc).

100 cluster, 101 server, 102 storage device, 103 network, 111 memory, 112 processor, 113, 114 I / O interface, 121 program, 122 data, 123 file system, 124 virtual drive program, 130 table storage, 131 load timing setting Information table, 132 copy policy information table, 133 file correspondence information table, 134 takeover process at failure, 135 copy process at load, 136 copy process for incomplete files, 137 takeover process at failure.

Claims

A computer system comprising a plurality of clusters, each including a plurality of servers connected to a network and a shared storage device connected to each of the servers and shared by the servers, wherein a server belonging to at least one of the plurality of clusters is configured as a standby server, and
at least the standby server, when a failure occurs in a server belonging to any one of the clusters, executes a failover process for taking over the processing of the failed server.
The computer system according to claim 1,
wherein the standby server,
when a failure occurs in a server belonging to any one of the clusters, executes the failover process on condition that the standby server exists, as a standby server waiting for processing, in the cluster to which the failed server belongs, and,
when a failure occurs in a server belonging to any one of the clusters and no standby server waiting for processing exists in the cluster to which the failed server belongs, executes the failover process in cooperation with a takeover server, which is any server in the cluster to which the failed server belongs and serves as the cooperation destination of the standby server.
The computer system according to claim 2,
wherein the takeover server,
when no standby server waiting for processing exists in the cluster to which the failed server belongs, publishes on the network access information for the storage area that was managed by the failed server among the storage areas of the shared storage device and, on condition that a read request including the access information published on the network is received from the standby server via the network, reads the file specified in the read request from the storage area that was managed by the failed server and transfers the read file to the standby server via the network, and
the standby server,
when the access information published on the network is received, generates a read request including the received access information, transfers the generated read request to the takeover server via the network, and, on condition that the file transferred from the takeover server in response to the read request is received, manages the received file as a file subject to the failover process.
The computer system according to claim 2,
wherein the takeover server,
when no standby server waiting for processing exists in the cluster to which the failed server belongs, manages the storage area that was the management target of the failed server among the storage areas of the shared storage device as a takeover storage area that is newly a management target, generates access information indicating that the takeover server is the management source for the failed server, transfers the generated access information to the standby server via the network, and, on condition that a read request including the access information is received from the standby server via the network, reads the file specified by the read request from the takeover storage area and transfers the read file to the standby server via the network, and
the standby server,
when the access information transferred from the takeover server is received, generates a read request including the received access information, transfers the generated read request to the takeover server via the network, and, on condition that the file transferred from the takeover server in response to the read request is received, manages the received file as a file subject to the failover process.
The computer system according to claim 3 or 4,
wherein the standby server,
when a file transferred from the takeover server is received, writes the received file into a memory including a file system, copies the received file to the storage area that is the management target of the standby server among the storage areas of the shared storage device connected to the standby server, and manages the storage area to which the received file has been copied as the file access destination for the failed server.
The computer system according to claim 5,
wherein the standby server,
when the file system of the memory contains a file that is specified by the read request and is not a copy target, copies the file that is not a copy target from the memory to the storage area to be managed by the standby server.
A file management method in a computer system comprising a plurality of clusters, each including a plurality of servers connected to a network and a shared storage device connected to each of the servers and shared by the servers, wherein a server belonging to at least one of the plurality of clusters is configured as a standby server,
the method comprising a step in which at least the standby server, when a failure occurs in a server belonging to any one of the clusters, executes a failover process for taking over the processing of the failed server.
The file management method according to claim 7, comprising:
a step in which the standby server, when a failure occurs in a server belonging to any one of the clusters, executes the failover process on condition that the standby server exists as a standby server waiting for processing in the cluster to which the failed server belongs; and
a step in which the standby server, when a failure occurs in a server belonging to any one of the clusters and no standby server waiting for processing exists in the cluster to which the failed server belongs, executes the failover process in cooperation with a takeover server, which is any server in the cluster to which the failed server belongs and serves as the cooperation destination of the standby server.
The file management method according to claim 8, comprising:
a step in which the takeover server, when no standby server waiting for processing exists in the cluster to which the failed server belongs, publishes on the network access information for the storage area that was managed by the failed server among the storage areas of the shared storage device;
a step in which the takeover server, on condition that a read request including the access information published on the network is received from the standby server via the network, reads the file specified in the read request from the storage area that was the management target of the failed server and transfers the read file to the standby server via the network;
a step in which the standby server, when it receives the access information published on the network, generates a read request including the received access information and transfers the generated read request to the takeover server via the network; and
a step in which the standby server, on condition that the file transferred from the takeover server in response to the read request is received, manages the received file as a file subject to the failover process.
The file management method according to claim 8, comprising:
a step in which the takeover server, when no standby server waiting for processing exists in the cluster to which the failed server belongs, manages the storage area that was the management target of the failed server among the storage areas of the shared storage device as a takeover storage area that is newly a management target, generates access information indicating that the takeover server is the management source for the failed server, and transfers the generated access information to the standby server via the network;
a step in which the takeover server, on condition that a read request including the access information is received from the standby server via the network, reads the file specified in the read request from the takeover storage area and transfers the read file to the standby server via the network;
a step in which the standby server, when it receives the access information transferred from the takeover server, generates a read request including the received access information and transfers the generated read request to the takeover server via the network; and
a step in which the standby server, on condition that the file transferred from the takeover server in response to the read request is received, manages the received file as a file subject to the failover process.
The file management method according to claim 9 or 10, comprising:
a step in which the standby server, when it receives the file transferred from the takeover server, writes the received file into a memory including a file system, copies the received file to the storage area to be managed by the standby server among the storage areas of the shared storage device connected to the standby server, and manages the storage area to which the received file has been copied as the file access destination for the failed server.
The file management method according to claim 11, comprising:
a step in which the standby server, when the file system of the memory contains a file that is specified in the read request and is not a copy target, copies the file that is not a copy target from the memory to the storage area to be managed by the standby server.