WO2013094007A1 - Load balancing system - Google Patents
Load balancing system
- Publication number
- WO2013094007A1 (application PCT/JP2011/079425, JP2011079425W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- processing
- server
- threshold
- request
- access
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
- H04L67/1008—Server selection for load balancing based on parameters of servers, e.g. available memory or workload
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0611—Improving I/O performance in relation to response time
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0629—Configuration or reconfiguration of storage systems
- G06F3/0635—Configuration or reconfiguration of storage systems by changing the path, e.g. traffic rerouting, path reconfiguration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2206/00—Indexing scheme related to dedicated interfaces for computers
- G06F2206/10—Indexing scheme related to storage interfaces for computers, indexing schema related to group G06F3/06
- G06F2206/1012—Load balancing
Definitions
- The following embodiment relates to a load balancing system.
- In a cloud service, a large number of servers are provided, and processing requests from users are allocated among them. Load distribution is therefore needed so that processing does not concentrate on a single server.
- The load on a server is strongly influenced by the processing speed of the storage device, that is, the time from when the server issues an I/O processing request to the storage device until the request is processed. If the storage device completes processing quickly, the server that sent the I/O processing request is freed quickly and its load is low. The load on the server is therefore largely determined by the response time of the storage device.
- FIG. 1 is a diagram for explaining a conventional technique for performing load distribution.
- The system shown in FIG. 1 includes access servers 10-1 and 10-2 that accept processing requests from users, a load balancer 12, I/O processing servers 14-1 to 14-3 that process access requests from the access servers 10-1 and 10-2, and storage devices 16-1 to 16-4 that write and read data. These are connected by the networks 11, 13, and 15. Regarding the connections between the I/O processing servers 14-1 to 14-3 and the storage devices 16-1 to 16-4, a plurality of storage devices may be connected to one I/O processing server via the network 15, or a plurality of I/O processing servers may be connected to one storage device.
- The access server determines through which I/O processing server an I/O processing request is transferred to the desired storage device. Any I/O processing server is set up to be able to transfer an I/O processing request to any storage device.
- The load of the I/O processing servers 14-1 to 14-3, which perform the actual processing, is leveled using the load balancer 12.
- FIG. 1 shows an example in which a plurality of access servers 10-1 and 10-2 issue I/O processing requests (write and read requests to the storage devices 16-1 to 16-4) and the loads of the I/O processing servers 14-1 to 14-3 are distributed.
- In this configuration, the load on the I/O processing servers is large.
- The load here means the load balance of the system as a whole, composed of the I/O processing servers and the storage apparatuses, as seen from the access server. If an access server needs certain data, nothing is gained unless it can access the storage device that holds that data. Therefore, when accesses concentrate on a certain storage device, that load cannot be distributed.
- However, the load on the I/O processing servers can still be distributed, in the sense of distributing the load on the system composed of the I/O processing servers and the storage devices as viewed from the access server.
- The plurality of I/O processing servers and the plurality of storage apparatuses are interconnected, so the same storage apparatus can be accessed from different I/O processing servers.
- The load on an I/O processing server can be taken to mean the response time from when an I/O processing request is transmitted to the storage apparatus until the processing completes. Therefore, when the response time of a certain I/O processing server is long, I/O processing requests are sent to an I/O processing server with a short response time. As a result, even when the same storage apparatus is accessed, the load on the system including the I/O processing servers and the storage apparatus can be distributed.
- Conventionally, the multipath method is used to take over the processing of I/O processing requests when an abnormality occurs.
- FIG. 2 is a configuration diagram when the conventional multipath method is used.
- In the access server 10, a plurality of paths (access paths (1) and (2)) to the I/O processing server 14-1 and the I/O processing server 14-2 are defined in advance. Assume that an abnormality occurs in the I/O processing server 14-1 while the I/O processing servers 14-1 and 14-2 are processing I/O processing requests. The access server 10 then continues processing by switching from access path (1) to access path (2).
- This method has the following problem. Consider a situation in which an abnormality has occurred in one of the I/O processing servers.
- After path switching, the load on a specific I/O processing server may increase.
- One conventional technique uses servers and a storage management server that manages information such as the applications running on the servers, the storage devices, and the access paths, and performs load distribution based on that information.
- In another conventional technique, a controller in the storage subsystem monitors the load status of each connection port and performs load distribution based on the monitored results.
- However, in the configuration of FIG. 1, which can efficiently distribute the load of the I/O processing servers, the service stops when an abnormality occurs in the load balancer.
- Alternatively, to equalize the load on the large number of I/O processing servers, adjustment information would have to be exchanged among the large number of access servers to equalize the load between them. In that case, the system configuration becomes complicated and security is weakened.
- An object of the present invention is to achieve load distribution in a system having a plurality of access servers and a plurality of I/O processing servers.
- A load distribution system includes: a plurality of storage devices that receive an input processing request or an output processing request and return a processing result; a plurality of I/O processing servers that transmit the input processing request or the output processing request to one of the plurality of storage devices, perform the processing for the request, and send an overload response indicating an overload state; and a plurality of access servers that receive an input processing request or an output processing request from a user and transmit it to an I/O processing server that is not in an overload state.
- FIG. 1 is an overall system configuration diagram including a load distribution system of an embodiment. A block diagram of an access server and an I/O processing server. A sequence diagram showing the basic flow of the communication performed between an access server and an I/O processing server. A sequence diagram showing the overall flow when an abnormality occurs in an I/O processing server.
- A diagram (part 3) explaining the processing of an access server.
- A diagram (part 4) explaining the processing of an access server.
- A diagram explaining the transition states of the management table list.
- A diagram (part 1) illustrating the processing of an I/O processing server.
- A diagram (part 2) explaining the processing of an I/O processing server.
- A diagram (part 3) explaining the processing of an I/O processing server.
- A diagram showing an example of the tables managed by an access server, and their transitions.
- The following embodiment is applied to a system, such as a cloud system, having a large number of servers and storage units. It provides a system with reliability, enabling service to continue even if a failure occurs in any of the devices, and levels the load of the entire system.
- In the embodiment, the information of the I/O processing request sent from the access server to the I/O processing server, and the response information with which the I/O processing server returns a response to the access server, are extended.
- When the I/O processing server is connected as an iSCSI target, a request from the access server is transmitted as a command that is an extension of a SCSI command.
- A response from the I/O processing server to the access server is likewise realized by extending the response information of the SCSI command.
- When the I/O processing server is connected as a Fibre Channel target, the request from the access server and the response from the I/O processing server to the access server are realized as extensions of Fibre Channel commands. The iSCSI case is described below as an example.
- The target of load distribution is the system as a whole, including the I/O processing servers and the storage devices, as viewed from the access server.
- When the access server wants to access certain data, it must access the storage device in which that data is stored. Even if that storage device has a high load and is difficult to access, accessing a storage device that does not hold the data is not an option.
- Since the storage apparatuses and the I/O processing servers are interconnected in multiple pairs, the I/O processing server through which a desired storage apparatus is accessed can be switched to another one. Thus, although the load on the storage apparatus itself cannot be distributed, the load on the I/O processing servers that transmit I/O processing requests to the storage apparatus can be.
- By distributing the load on the I/O processing servers, the load on the system including the I/O processing servers and the storage devices, as seen from the access server, can be made uniform.
- Here, the load on an I/O processing server is the response time from when an I/O processing request is issued to the storage apparatus until the request completes.
- The load can be distributed by switching to another I/O processing server.
- The load distribution method is as follows.
- When overloaded, the I/O processing server responds to the I/O processing request (request) from the access server with an indication that it is overloaded. Receiving this, the access server distributes I/O processing requests to other I/O processing servers. Which I/O processing server is connected to which storage device is determined when the system is started up, to prevent access from concentrating on some I/O processing servers at the start of operation.
- Because each access server adapts by itself to the load status of the I/O processing servers, communication and adjustment between access servers are not required even when there are a plurality of access servers.
- The I/O processing server measures the time from when it receives an I/O processing request from the access server until the response from the storage apparatus is returned to the access server (the processing time in the local server), and determines from it whether the local server is overloaded. When it is overloaded, the I/O processing server notifies the access server of the overload. When the access server receives an overload notification from an I/O processing server, it transmits subsequent I/O processing requests to another I/O processing server.
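As a rough sketch (not the patent's implementation), the local overload check described above amounts to timing the storage round trip and comparing it with the server's current threshold; the threshold value and the `send_to_storage` callback here are assumptions introduced for illustration.

```python
import time

def handle_io_request(send_to_storage, threshold_sec):
    """Forward a request to storage, time the round trip, and flag an
    overload when the time exceeds the server's current threshold."""
    start = time.monotonic()
    result = send_to_storage()          # issue the I/O to the storage device
    elapsed = time.monotonic() - start  # processing time in the local server
    overloaded = elapsed > threshold_sec
    return result, overloaded           # the flag travels back in the response

# A fast storage call stays well under a generous 5-second threshold.
result, overloaded = handle_io_request(lambda: "data", threshold_sec=5.0)
```

On the access-server side, a returned `overloaded` flag would steer subsequent requests to a different I/O processing server.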
- In addition, the access server may transmit a load-information response command to all the I/O processing servers.
- The I/O processing server with the lowest load is then asked to process an I/O processing request that received no response or resulted in an error.
- For load distribution, when the access server receives an overload response from an I/O processing server, it transmits a load-information response command to all the I/O processing servers and, based on the results, requests the I/O processing from an I/O processing server with a lower load. Load distribution can thereby be performed more efficiently.
- The performance of the entire system can be improved by efficiently distributing the load of I/O processing requests in the cloud system.
- Although using a large number of servers increases the possibility that one of the I/O processing servers will fail, even in that case the service provided by the cloud system can be continued.
- When an abnormality occurs, the access server detects it as a missing response to the I/O processing request or as an error. The access server then issues the I/O processing request to another I/O processing server.
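The failover behavior described here (detect a missing response or an error, then reissue the request to another I/O processing server) might be sketched as follows; the server names and the `send` transport are hypothetical stand-ins, not the patent's interfaces.

```python
def issue_with_failover(servers, send):
    """Try each I/O processing server in turn until one succeeds."""
    last_error = None
    for server in servers:
        try:
            return send(server)        # normal response: done
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc           # abnormality detected: try the next one
    raise RuntimeError("no I/O processing server available") from last_error

def send(server):                      # illustrative stand-in transport
    if server == "io1":
        raise ConnectionError("io1 is down")
    return f"ok from {server}"

# io1 fails, so the request is transparently re-executed on io2.
result = issue_with_failover(["io1", "io2"], send)
```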
- FIG. 3 is an overall system configuration diagram including the load distribution system of the present embodiment.
- The entire system includes a plurality of access servers 20-1 to 20-n (physical servers for providing a cloud service; a large number of virtual machines operate on each access server), a plurality of I/O processing servers 21-1 to 21-m (servers that process the I/O processing requests issued by the access servers), and a plurality of storage devices 22-1 to 22-N. These devices are connected by networks 23 and 24.
- From the viewpoint of ensuring security, the access servers 20-1 to 20-n have no management communication path between them and do not communicate to adjust the load of the I/O processing servers.
- Likewise, the I/O processing servers 21-1 to 21-m have no management communication path between them and do not communicate for mutual load adjustment.
- FIG. 4 is a block diagram of the access server and the I / O processing server.
- The access server and the I/O processing server can be represented by the same block diagram.
- The I/O accepting unit 30 accepts a transmitted I/O processing request (request).
- In the access server, the I/O accepting unit 30 accepts an I/O processing request issued from a user application.
- In the I/O processing server, the I/O accepting unit 30 accepts an I/O processing request issued from the access server.
- The I/O time monitoring unit 32 monitors the time from when an I/O processing request is received until the response to it is returned, and updates the management table of the management table storage unit 33 if necessary.
- The I/O time monitoring unit 32 includes counters 35-1 to 35-L, corresponding to the number L of destination devices. When a response is not an overload response, the corresponding counter counts up and the value in the counter value holding unit of the management table is updated.
- This L is the number m of I/O processing servers when the apparatus of FIG. 4 is an access server, and the number of storage apparatuses when the apparatus of FIG. 4 is an I/O processing server.
- The management table will be described later.
- The management table monitoring unit 34 monitors the threshold levels registered in the management table stored in the management table storage unit 33.
- The I/O issuing unit 31 registers the reception time and then transfers the I/O processing request received by the I/O accepting unit 30. If the apparatus in FIG. 4 is an access server, the I/O issuing unit 31 transmits the I/O processing request to an I/O processing server; if it is an I/O processing server, the I/O issuing unit 31 transmits the I/O processing request to a storage apparatus.
- FIG. 5 is a sequence diagram showing a basic flow of communication performed between the access server and the I / O processing server.
- An I / O processing request is issued from the user application, and the I / O processing request is notified to the storage apparatus via the access server and the I / O processing server (1).
- The I/O processing server (1) receives the response to the I/O processing request, and measures the response time from when it received the I/O processing request from the access server until the response from the storage apparatus is returned to the access server. If this response time does not exceed the threshold currently set for the server itself, a response indicating normal processing is returned from the I/O processing server to the access server and the user application.
- If this response time exceeds the threshold currently set for the server itself, an overload is determined to have occurred. Since the response time spans from when the I/O processing request is issued until the processing completes, it includes both the response time of the storage device and the processing time of the I/O processing server. Therefore, if the response time exceeds the threshold, an overload is considered to have occurred in the storage device, the I/O processing server, or both.
- When the I/O processing server (1) detects an overload for an I/O processing request to the accessed storage device, it returns an overload response to the access server that made the I/O processing request, notifying it of the overload condition.
- When the access server receives the overload information from the I/O processing server, it distributes subsequent I/O processing requests to another I/O processing server.
- The distribution of I/O processing requests to this other I/O processing server is called reselection processing in the figure. The reselection process is described later.
- In performing this distribution, the access server refers to the management tables of FIG. 10 and FIG. 11, described later.
- When the access server determines that further distribution is necessary, it issues the I/O processing request to yet another I/O processing server.
- The size of an I/O processing request is the amount of data read from the storage device, or written to the storage device, as specified by the request.
- FIG. 6 is a sequence diagram illustrating an overall flow when an abnormality occurs in the I / O processing server and the I / O processing server is switched.
- When an abnormality occurs in an I/O processing server, an I/O processing request issued by the access server results in an error.
- From this, the access server can detect that the I/O processing server has become abnormal.
- The access server then executes the reselection processing and re-executes the I/O processing request on another I/O processing server. As a result, even when an abnormality occurs in an I/O processing server, the service can be continued.
- FIG. 7 is a flowchart of the processing of the access server when an abnormality occurs in the I / O processing server.
- The management table of the I/O processing servers is referred to, and among the servers whose I/O size and I/O frequency are within the thresholds, the one having the smallest threshold level (described later) is selected as the reissue destination for the I/O processing request. This makes it possible to issue the I/O processing request to an I/O processing server that can be judged to have a lower load.
- When the I/O processing server in which the abnormality occurred is restored, it is set again as an I/O issue target for the access server. As a result, the restored I/O processing server again becomes a target of load distribution.
- The access server manages, for each I/O processing server, a threshold on the size of I/O processing requests that can be issued and a threshold on the I/O frequency.
- When an I/O processing request is issued, a request whose size exceeds the size threshold (or whose frequency exceeds the frequency threshold) is directed to an I/O processing server different from the one that has handled requests so far.
- If there is an overload response from an I/O processing server, its threshold values are gradually reduced (if there is no overload response, the thresholds are gradually increased). This reduces the processing load on the I/O processing server.
- FIG. 8 shows the management table.
- The management table is a collection of tables in which a plurality of threshold definition tables are arranged as a list.
- A threshold definition table is provided for each I/O processing server.
- Each threshold definition table holds, in association with each other, the threshold level, the I/O processing request size threshold, the I/O processing request frequency threshold, and the threshold counter definition value.
- Each threshold definition table also has a counter value holding unit and a level value setting unit.
- The level value setting unit specifies at which of the threshold levels in the threshold definition table the I/O processing server currently is.
- When the counter's value holding unit reaches the threshold counter definition value, the threshold level set in the level value setting unit is changed.
- The counter 35 shown in FIG. 4 increments its count by 1 when a response is not an overload response.
- The counted value is held in the counter value holding unit, which is updated each time the counter counts up.
- Thus, the counter value holding unit holds the value that the counter counts up to while no overload condition exists.
- When the threshold level in the level value setting unit is changed, the counter 35 is reset and the counter value holding unit is initialized to zero.
- FIG. 9 shows in detail the threshold definition table for one I/O processing server among the m tables of FIG. 8.
- The I/O size is a value that becomes smaller as the threshold level increases from threshold level 1, at which it is unlimited.
- This I/O size is the threshold on the size of an I/O processing request.
- The I/O frequency is a value that decreases from 100 times/sec at threshold level 1 as the threshold level increases.
- This I/O frequency is the threshold on the frequency of I/O processing requests.
- The threshold counter definition value defines the counter value at which the threshold level is changed.
- For each threshold-setting label, called a threshold level, the system administrator sets the threshold values by estimating the load applied to the I/O processing server relative to the machine power of the I/O processing server. When there is an overload response, the threshold values are changed by raising the threshold level. When a response to an I/O processing request is not an overload response, the counter counts up; when the counter exceeds the threshold counter definition value, the threshold level is lowered.
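One way to picture a threshold definition table, with its level value setting unit, counter value holding unit, and the raise/lower rule described above, is the sketch below; the three levels and their size/frequency values are invented for illustration and are not the values of FIG. 9.

```python
class ThresholdTable:
    # level -> (I/O size threshold in bytes, I/O frequency threshold per
    # second, threshold counter definition value); level 1 has no size limit.
    LEVELS = {
        1: (None,      100, 10),   # None = unlimited I/O size
        2: (1_048_576,  50, 10),
        3: (65_536,     20, 10),
    }

    def __init__(self):
        self.level = 1      # level value setting unit
        self.counter = 0    # counter value holding unit

    def on_overload_response(self):
        """Raise the threshold level (tighter limits) and reset the counter."""
        if self.level < max(self.LEVELS):
            self.level += 1
        self.counter = 0

    def on_normal_response(self):
        """Count up; past the threshold counter definition value, lower
        the level (looser limits) and reset the counter."""
        if self.level == 1:
            return                      # level 1 cannot be lowered further
        self.counter += 1
        if self.counter > self.LEVELS[self.level][2]:
            self.level -= 1
            self.counter = 0

t = ThresholdTable()
t.on_overload_response()        # one overload raises level 1 -> 2
for _ in range(11):             # 11 normal responses exceed the counter value
    t.on_normal_response()      # ... so the level drops back to 1
```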
- FIG. 10 is a flowchart showing the processing of the access server.
- The processing in FIG. 10 is executed every time a new I/O processing request is generated.
- In step S10, for example at the start of operation, the access server refers to the management table (described later) of the I/O processing server connected to the storage apparatus to be accessed.
- Each threshold definition table included in the management table has a level value setting unit that holds the current threshold level of the I/O processing server, and this value is checked.
- In step S11, it is determined whether the size of the I/O processing request is less than the threshold corresponding to the threshold level.
- If so, step S12 determines whether the frequency of I/O processing requests is less than the threshold corresponding to the threshold level.
- The frequency is, for example, the number of I/O processing requests issued per second.
- The access server obtains the frequency by counting the number of I/O processing requests issued to the I/O processing server per second. If the determination in step S12 is No, the process proceeds to step S13; if Yes, to step S14.
- In step S13, the I/O processing server to which the I/O processing request is issued is reselected according to FIG. 16, described later, and the process proceeds to step S14.
- In step S14, the I/O processing request is issued.
- In step S9, the response to the issued I/O processing request is received, a response is returned to the user application, and the I/O processing is completed.
- In step S15, it is determined whether the response to the issued I/O processing request indicates an overload (whether an overload response was received from the I/O processing server). If Yes, the process proceeds to step S16; if No, to step S22.
- In step S16, the threshold level is increased by 1.
- In step S24, the counter 35 and the counter value holding unit of the management table are initialized.
- In step S17, the I/O processing servers are sorted (described later), and the process ends.
- In step S22, it is determined whether the threshold level set in the level value setting unit of the threshold definition table of the I/O processing server currently being processed is 1. This determination is needed because step S20, later, lowers the threshold level by 1; when the threshold level is already 1, it cannot be lowered further, so the counter is not counted up. If the determination in step S22 is No, the process proceeds to step S18; if Yes, to step S19.
- In step S18, the counter 35 counts up, increasing the value of the counter value holding unit of the threshold definition table by 1.
- In step S19, it is determined whether the counter exceeds the defined value. If No, the process ends; if Yes, the process proceeds to step S20.
- In step S20, the threshold level is decreased by 1.
- In step S23, the counter 35 is reset and the counter value holding unit of the management table is initialized; in step S21, the I/O processing servers are sorted (described later), and the process ends.
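The steps above (S10 to S24) can be condensed into a sketch like the following; the limit values, server names, and dictionary layout are illustrative assumptions rather than the patent's data structures.

```python
LIMITS = {1: (None, 100), 2: (1_000_000, 50), 3: (64_000, 20)}  # size, freq
COUNTER_DEF = 10   # assumed threshold counter definition value

def issue_request(tables, server, size, freq, send):
    """tables maps server -> {'level': int, 'counter': int}."""
    tbl = tables[server]
    size_lim, freq_lim = LIMITS[tbl['level']]                  # S10: level check
    if (size_lim is not None and size >= size_lim) or freq >= freq_lim:
        # S13: reselect the server with the smallest threshold level
        server = min(tables, key=lambda s: tables[s]['level'])
        tbl = tables[server]
    overloaded = send(server)                                  # S14, S15
    if overloaded:
        tbl['level'] = min(tbl['level'] + 1, max(LIMITS))      # S16
        tbl['counter'] = 0                                     # S24
    elif tbl['level'] > 1:                                     # S22
        tbl['counter'] += 1                                    # S18
        if tbl['counter'] > COUNTER_DEF:                       # S19
            tbl['level'] -= 1                                  # S20
            tbl['counter'] = 0                                 # S23
    return server

tables = {'io1': {'level': 3, 'counter': 0}, 'io2': {'level': 1, 'counter': 0}}
# A request too large for io1's level-3 size limit is rerouted to io2.
chosen = issue_request(tables, 'io1', size=500_000, freq=10,
                       send=lambda s: False)
```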
- The I/O processing server to which an I/O processing request is transmitted after system operation starts corresponds to the threshold definition table at the top of the management table list.
- When an overload response is returned from that server, an I/O processing request is transmitted to another I/O processing server.
- Sorting the I/O processing servers means changing the order of the array of threshold definition tables in the management table, which holds a plurality of threshold definition tables.
- FIG. 11 is a flowchart of the sorting process of the I / O processing server.
- The threshold definition tables are arranged in order of the threshold level held in the level value setting unit of each table, generating a list of the I/O processing servers.
- The set of tables in which the threshold definition tables are listed is defined as the management table.
- step S25 the threshold level resulting from the change in step S16 or S20 of FIG. 10 is set to a variable, for example, L.
- step S26 the threshold definition table of the I / O processing server whose threshold level has been changed is removed from the list of management tables.
- Removal from the list forms a single operation together with the insertion into the list in step S28 described later. That is, the data of the threshold definition table whose threshold level has been changed is read from the list, that table's data is deleted from the list, and the data is inserted at the position in the list array that matches the order of the threshold levels.
- The loop of step S27 iterates once per I/O processing server.
- In step S28, it is determined whether the threshold level is greater than L. If the determination in step S28 is No, the loop continues; if Yes, the threshold definition table is inserted at that position in the management table list and the process ends.
- the top of the management table list is the I / O processing server with the lowest load.
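The insertion sort of FIG. 11 (steps S25 through S28) can be sketched as below. This is a hypothetical illustration; the dict fields are invented names, but the values mirror the FIG. 12 example in which server (4) changes to level 3 and must move below server (2).

```python
def resort(tables, changed_server):
    """Re-place one threshold definition table so the list stays
    sorted by ascending threshold level (lowest load at the head)."""
    changed = next(t for t in tables if t["server"] == changed_server)
    level = changed["level"]              # S25: L = the changed threshold level
    tables.remove(changed)                # S26: remove the table from the list
    for i, t in enumerate(tables):        # S27: loop over the I/O processing servers
        if t["level"] > level:            # S28: first level greater than L
            tables.insert(i, changed)
            return tables
    tables.append(changed)                # no larger level found: append at the end
    return tables

# FIG. 12 (1): server (4) was changed to level 3 and is out of place
lst = [{"server": 1, "level": 1}, {"server": 4, "level": 3},
       {"server": 2, "level": 2}, {"server": 3, "level": 4}]
resort(lst, 4)
print([t["server"] for t in lst])   # [1, 2, 4, 3], matching FIG. 12 (3)
```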
- FIG. 12 is a diagram for explaining the transition state of the list of the management table in the sort process.
- FIG. 12 (1) shows the state after an overload response has occurred, in the processing corresponding to step S17 of FIG. 10.
- The threshold definition tables of a plurality of I/O processing servers are arranged.
- In the threshold definition table of I/O processing server (1), the value of the level value setting unit is 1.
- Next comes the threshold definition table of I/O processing server (4), whose level value setting unit value (after the change) is 3.
- Then comes the threshold definition table of I/O processing server (2), whose level value setting unit value is 2.
- Last comes the threshold definition table of I/O processing server (3), whose level value setting unit value is 4.
- The threshold definition table of I/O processing server (4) now sits above that of I/O processing server (2), so their positions must be swapped.
- FIG. 12 (2) shows the state where the out-of-place threshold definition table of I/O processing server (4) has been removed from the list. Then, in the sorted state of FIG. 12 (3), the threshold definition table of I/O processing server (4) has been inserted after that of I/O processing server (2).
- the I / O processing server manages a processing time from reception of an I / O processing request to completion of the I / O processing.
- When the processing time exceeds the threshold held in its management table, the I/O processing server returns an overload response to the access server. That is, an overload response is returned to the access server when the response time of the storage device (including the processing time in the local server) exceeds the threshold.
- The initial value of the processing-time threshold is set to a response time matched to the response time of the storage device. If the processing time exceeds the threshold, the response-time threshold is gradually lengthened (if the processing time does not exceed the threshold, it is gradually shortened). This avoids returning overload responses too frequently.
- For example, when I/O processing server (1) is overloaded and the other I/O processing server (I/O processing server (2)) is also overloaded, an I/O processing request directed to I/O processing server (2) may be redirected back to I/O processing server (1). If I/O processing server (1) is still overloaded at that point, then with only two servers, every I/O processing server ends up responding with an overload. If such a situation continues, all I/O processing requests receive overload responses and the exchanges between the access server and the I/O processing servers increase. To avoid this, the threshold is changed gradually.
- the I / O processing server holds a threshold definition table for each storage device to be accessed.
- the management table is a collection of tables in which a plurality of threshold definition tables are arranged as a list. Each threshold definition table holds a threshold level, a response time threshold, and a threshold counter definition value in association with each other.
- the threshold definition table for each storage device has a counter value holding unit and a level value setting unit.
- the level value setting unit specifies which threshold level the storage apparatus is currently at among the threshold levels in the threshold definition table.
- the counter value holding unit holds a value counted by the counter when no overload condition exists. When the value of the counter value holding unit becomes the threshold counter definition value, the threshold level of the level value setting unit is changed, the counter 35 is reset, and the counter value of the counter value holding unit is initialized to 0.
- FIG. 14 is a diagram showing an example of the contents of the management table for each storage device in FIG.
- a value of 1 to j is set as the threshold level.
- the threshold value of the response time is 10 msec / KB at the threshold level 1, and increases as the threshold level increases.
- The threshold counter definition value specifies the counter value at which the threshold level is changed.
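A table in the spirit of FIG. 14, together with the step S35 style overload check, might look like the following. Only the level-1 threshold of 10 msec/KB comes from the text; the other levels and the counter definition values are assumptions for illustration.

```python
threshold_table = {
    # threshold level: (response-time threshold in msec/KB,
    #                   threshold counter definition value)
    1: (10, 3),
    2: (20, 3),
    3: (40, 3),
}

def is_overload(level, response_time_msec_per_kb):
    """Does the measured response time exceed the threshold for this level?"""
    return response_time_msec_per_kb > threshold_table[level][0]

print(is_overload(1, 12))  # True: 12 msec/KB exceeds the level-1 threshold of 10
print(is_overload(2, 12))  # False: the level-2 threshold is larger
```

Because the threshold grows with the level, raising the level after an overload makes the server slower to declare overload again, which is the damping behavior described above.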
- FIG. 15 is a processing flow of the I / O processing server.
- In step S30, the I/O processing server accepts an I/O processing request from the access server.
- In step S31, the I/O processing server issues an I/O processing request to the storage apparatus.
- In step S32, a response indicating that the I/O processing request is completed is received from the storage apparatus, and a response is returned to the access server, completing the basic I/O processing.
- In step S33, the response time is calculated from the time the I/O processing request was accepted and the time it was completed.
- In step S34, the management table (FIGS. 13 and 14) is referenced, the threshold level set in the level value setting unit of the threshold definition table is acquired, and the response-time threshold corresponding to that threshold level is obtained.
- step S35 it is determined whether or not the response time exceeds the acquired response time threshold. If the determination in step S35 is Yes, the process proceeds to step S40. If the determination in step S35 is No, the process proceeds to step S41.
- step S40 an overload response is sent to the access server.
- step S36 the threshold level is increased by 1.
- In step S43, the counter 35 is reset, the value of the counter value holding unit of the management table is initialized, and the process ends.
- In step S41, it is determined whether the threshold level of the level value setting unit of the threshold definition table is 1. This determination exists because the subsequent step S39 lowers the threshold level by 1; when the threshold level is already 1, it cannot be lowered further, so the counter is not counted up. If the determination in step S41 is No, the process proceeds to step S37; if Yes, to step S38.
- step S37 the counter 35 is counted up, and the value of the counter value holding part of the management table is incremented by 1.
- step S38 it is determined whether or not the counter exceeds the defined value. If the determination in step S38 is No, the process ends. If yes, the process proceeds to step S39. In step S39, the threshold level is decreased by 1. In step S42, the counter 35 is reset, the value of the counter value holding unit of the management table is initialized, and the process is terminated.
- FIG. 16 is a flowchart of processing when the access server reselects the I / O processing server (step S40 in FIG. 7 and step S13 in FIG. 10).
- The access server refers to the threshold values in the management table and reselects the I/O processing server with the largest threshold (considered to have the lowest load) as the destination of the I/O processing request.
- In step S45, the distribution destination is set to the I/O processing server at the top of the list obtained by the sorting process.
- In the loop of step S46, the management table of the access server is scanned; that is, the process is repeated until the threshold definition tables of all the I/O processing servers stored in the management table list have been examined.
- step S47 it is determined whether or not the threshold value is larger than the distribution destination value. If the determination in step S47 is No, the loop of step S46 is continued, and if the determination is Yes, the process proceeds to step S48.
- step S48 the I / O processing server selected in step S47 is assigned to the distribution destination, and the process ends.
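The reselection flow of FIG. 16 (steps S45 through S48) can be sketched as below. This is a hedged illustration: the field names are invented, and "threshold" stands for whichever threshold the access server compares (a larger threshold meaning a lower presumed load, as stated above).

```python
def reselect(tables):
    """Pick the distribution destination for a reissued I/O processing request."""
    dest = tables[0]                                 # S45: head of the sorted list
    for t in tables[1:]:                             # S46: scan the management table
        if t["threshold"] > dest["threshold"]:       # S47: larger threshold found?
            dest = t                                 # S48: new distribution destination
    return dest["server"]

servers = [{"server": "A", "threshold": 512},
           {"server": "B", "threshold": 1024},
           {"server": "C", "threshold": 256}]
print(reselect(servers))   # 'B': the server with the largest threshold
```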
- the access server sends a load information confirmation command to the I / O processing server, and the I / O processing server processes it on its own server.
- FIG. 17 is a diagram illustrating an example of a management table managed by the access server and its transition.
- FIG. 17 is extracted from the management tables of FIGS. 8 and 9 in order to make it easy to see only the threshold level value set for each I / O processing server.
- Since the threshold level is changed in step S16 of FIG. 10, the threshold level in the threshold definition table of I/O processing server (1) is incremented by 1, and the table transitions from state 1 to state 2.
- If a response to an I/O processing request does not indicate an overload, the number of such responses is counted. When the count value exceeds the threshold counter definition value, the threshold level is decreased by 1, and the threshold definition table transitions from state 2 to state 1. That is, the threshold level is decreased only when the response times of a certain number of I/O processing requests do not indicate an overload state. This prevents the threshold from being changed back while the load state is not yet stable.
- FIG. 18 is a diagram illustrating an example of a table managed by the I / O processing server and its transition.
- FIG. 18 is extracted from the management tables of FIGS. 13 and 14 to make it easy to see only the threshold level value set for each storage device.
- Since the threshold level is increased in step S36 of FIG. 15, the threshold management table of storage apparatus (2) transitions from state 1 to state 2. If a response to an I/O processing request does not exceed the threshold, the number of such responses is counted. When the count value exceeds the threshold counter definition value, the threshold level is decreased by 1, and the threshold definition table transitions from state 2 to state 1. That is, the threshold level is decreased only when the response times of a certain number of I/O processing requests do not indicate an overload. This prevents the threshold from being changed back while the load state is not yet stable.
- When the I/O processing server is reselected by having the access server check load information on the I/O processing server, the I/O processing server prepares, in addition to the table shown in FIG. 18, a counter that manages the number of I/O processing requests being processed by its own server. It can thereby respond to the access server's load information inquiry with the number of I/O processing requests.
- FIG. 19 is a diagram showing a mechanism for controlling the management table of the access server and the I / O processing server.
- the request source of the I / O processing request in FIG. 19 is an application, and the issue destination of the I / O processing request is the I / O processing server.
- the request source of the I / O processing request in FIG. 19 is the access server, and the issue destination of the I / O processing request is the storage device.
- the management table stored in the management table storage unit 33 in FIG. 19 is the same as that in FIGS. 8 and 9 for the access server, and the same as that in FIGS. 13 and 14 for the I / O processing server.
- This is a table that defines, for the access server, to what values the thresholds of the I/O size and I/O frequency transition when the state changes, for example from state 1 to state 2 in FIG. 17 (in FIG. 18 for the I/O processing server). This table is defined in advance by the system administrator.
- the management table monitoring unit 34 in FIG. 19 is a function for keeping the management table of the access server and the I / O processing server as up-to-date as possible. For example, when an overload state is notified in a response to an I / O processing request issued by the access server to the I / O processing server, the access server changes the threshold value of the management table. At this time, if the threshold value is lowered to the lower limit (when the threshold level is maximum), the access server will not issue an I / O processing request to the I / O processing server thereafter. It is assumed that the processing amount of the I / O processing server decreases after a while from this state.
- the management table monitoring unit 34 periodically issues a test I / O processing request (dummy I / O processing request) and checks the load state of the I / O issue destination.
- FIG. 20 is a processing flow of the management table monitoring unit.
- The management table monitoring unit issues a test I/O processing request (dummy I/O processing request) to any I/O issue destination (the I/O processing server in the case of an access server, the storage device in the case of an I/O processing server) to which no I/O processing request has been issued for a certain period of time.
- If this I/O processing request is processed normally (for an access server, when there is no overload response to the I/O processing request; for an I/O processing server, when a response to the I/O processing request is received within the response time), the threshold of the management table is changed. The threshold is changed in accordance with FIG. 10 for the access server and FIG. 15 for the I/O processing server.
- step S50 the management table monitoring unit determines whether an I / O processing request has been issued to the overloaded I / O processing server within a predetermined time. If the determination in step S50 is yes, the process ends. If the determination in step S50 is No, a test I / O processing request is issued in step S51. In step S52, it is determined whether or not the response to the test I / O processing request is normal. If the determination in step S52 is No, the process ends. If the determination in step S52 is yes, the threshold is changed and the process ends.
- FIG. 21 is a diagram illustrating an example of a response data format when an I / O processing request is realized by a SCSI command.
- Code numbers in the range that can be freely defined are used as the response indicating an overload state. For example, "9" is set in the Sense Key and "0x80" in the Additional Sense Code. Since these code numbers can be freely assigned and used by the vendor, they are defined here as indicating the overload-state response of the present embodiment.
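A minimal sketch of encoding that overload response as fixed-format SCSI sense data follows. The Sense Key 9 and Additional Sense Code 0x80 values come from the text; the byte offsets follow the conventional fixed-format sense layout, and the 18-byte buffer size is an assumption.

```python
OVERLOAD_SENSE_KEY = 0x9   # vendor-specific sense key per the embodiment
OVERLOAD_ASC = 0x80        # vendor-specific additional sense code

def make_overload_sense() -> bytes:
    sense = bytearray(18)
    sense[0] = 0x70                # response code: current error, fixed format
    sense[2] = OVERLOAD_SENSE_KEY  # sense key lives in the low nibble of byte 2
    sense[12] = OVERLOAD_ASC       # additional sense code is byte 12
    return bytes(sense)

def is_overload_response(sense: bytes) -> bool:
    """Would the access server interpret this sense data as an overload?"""
    return (sense[2] & 0x0F) == OVERLOAD_SENSE_KEY and sense[12] == OVERLOAD_ASC

print(is_overload_response(make_overload_sense()))  # True
```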
- In the case of Fibre Channel, a fiber link is used as the path for transmitting the SCSI command, but otherwise, as with SCSI, a code number defined as vendor specific is used as the response indicating an overload state.
- FIG. 22 is a sequence diagram of load inquiry processing to the I / O processing server by the access server.
- The access server (A) issues an Inquiry command to the I/O processing servers (1) and (2).
- The I/O processing servers (1) and (2) return an Inquiry response in which load information is placed in the vendor-specific area of the command.
- FIG. 23 is a diagram showing a format of a response to Inquiry, which is a load information inquiry command.
- FIG. 23 shows the general response format for the Inquiry command. In this format, there is a field designated as "vendor specific".
- The I/O processing server sets, in the 36th to 55th byte area designated as vendor specific, the number of I/O processing requests that had already been accepted but not yet completed when the Inquiry was received, and returns the response to the access server.
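Carrying the in-flight request count in the vendor-specific bytes could look like the sketch below. The 36-55 byte region comes from the text; the 4-byte big-endian encoding inside that region and the 96-byte buffer are assumptions for illustration.

```python
import struct

def set_load_info(inquiry: bytearray, in_flight: int) -> None:
    """Write the count of accepted-but-uncompleted I/O processing requests
    into the first four bytes of the vendor-specific area (bytes 36-55)."""
    inquiry[36:40] = struct.pack(">I", in_flight)

def get_load_info(inquiry: bytes) -> int:
    """Access-server side: read the count back out of the Inquiry response."""
    return struct.unpack(">I", inquiry[36:40])[0]

resp = bytearray(96)        # stand-in for standard Inquiry response data
set_load_info(resp, 17)     # 17 requests accepted, not yet completed
print(get_load_info(resp))  # 17
```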
- FIG. 24 is a flowchart showing the processing of the access server when inquiring about the load state.
- the process is repeated for the number of I / O processing servers.
- step S56 an inquiry is issued.
- step S57 an inquiry response is accepted.
- In step S58, the number of I/O processing requests per unit time accepted by the I/O processing server, received in the response, is set in a variable, for example, n.
- step S59 the threshold definition table of the I / O processing server currently being processed in the management table held by the access server is referenced.
- The threshold definition table is the same as in FIGS. 8 and 9; the I/O frequency value of FIG. 9 indicated by the threshold level held in the level value setting unit is used as the threshold.
- In step S60, it is determined whether n exceeds the I/O frequency threshold. If the determination in step S60 is No, the loop of step S55 is repeated. If the determination in step S60 is Yes, in step S61 the threshold level in the level value setting unit of the threshold definition table is lowered so that the value of n is less than or equal to the I/O frequency threshold. In step S62, the I/O processing servers are sorted, and the loop of step S55 is repeated. When the loop of step S55 has processed all the I/O processing servers, the processing ends.
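Step S61 can be sketched as below. Note that on the access-server side a higher threshold level corresponds to a smaller threshold, so lowering the level raises the permitted I/O frequency; the per-level frequency values here are assumptions.

```python
freq_thresholds = {1: 1000, 2: 500, 3: 250}   # assumed requests/sec per level

def fit_level(level, n):
    """S61: lower the threshold level until the reported rate n fits
    under the I/O frequency threshold (or level 1 is reached)."""
    while level > 1 and n > freq_thresholds[level]:
        level -= 1
    return level

print(fit_level(3, 600))   # 600 req/s exceeds 250 and 500, fits under 1000 -> 1
print(fit_level(3, 200))   # already under the level-3 threshold -> stays 3
```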
- FIG. 25 is a diagram illustrating the hardware configuration of the access server and the I / O processing server when the processing of this embodiment is realized by a program.
- the access server and the I / O processing server are realized as a computer 39 including a CPU 40.
- the CPU 40 is connected to the ROM 41, the RAM 42, the communication interface 43, the storage device 46, the medium reading device 47, and the input / output device 49 via the bus 50.
- the CPU 40 reads and executes a basic program such as BIOS stored in the ROM 41 to realize the basic operation of the computer 39.
- the CPU 40 develops and executes the program for performing the processing of the present embodiment stored in the storage device 46 such as a hard disk on the RAM 42, thereby realizing the processing of the present embodiment.
- The program for performing the processing of this embodiment need not necessarily be stored in the storage device 46; it may instead be stored on a portable recording medium 48 such as a CD-ROM, DVD, Blu-ray disc, IC memory, or flexible disk.
- the medium reading device 47 is used to read a program stored in the portable recording medium 48, and the program is loaded into the RAM 42 and executed by the CPU 40.
- The input/output device 49, such as a keyboard, tablet, mouse, display, or printer, is used by a user operating the computer 39 to input data and output processing results.
- the communication interface 43 accesses a database or the like of the information provider 45 via the network 44 and downloads a program or the like to the computer 39.
- the downloaded program is stored in the storage device 46 or the portable recording medium 48, or directly expanded in the RAM 42 and executed by the CPU 40.
- the program may be executed by a computer of the information provider 45, and the computer 39 may perform only input / output operations.
Abstract
Description
The system of FIG. 1 includes access servers 10-1 and 10-2, which accept processing requests from users, and a load distribution device 12. It also includes I/O processing servers 14-1 to 14-3, which process access requests from the access servers 10-1 and 10-2, and storage devices 16-1 to 16-4, which write and read data. These are connected by networks 11, 13, and 15. As for the connections between the I/O processing servers 14-1 to 14-3 and the storage devices 16-1 to 16-4, through the network 15 one I/O processing server may be connected to a plurality of storage devices, and one storage device may be connected to a plurality of I/O processing servers. When transferring an I/O processing request to a storage device, the access server decides through which I/O processing server the request is transferred to the desired storage device. Every I/O processing server is configured so that it can transfer an I/O processing request to any storage device.
- High load on the load distribution device 12
As the number of access servers 10-1, 10-2 increases, the load on the load distribution device 12 rises, and the processing of the load distribution device 12 may become a bottleneck.
- Abnormality of the load distribution device 12
If an abnormality occurs in the load distribution device 12 and it goes down, processing of all I/O processing requests stops.
In the access server 10, a plurality of paths (access paths (1) and (2)) to the I/O processing servers 14-1 and 14-2 are defined in advance. Suppose an abnormality occurs in the I/O processing server 14-1 while the I/O processing servers 14-1 and 14-2 are processing I/O processing requests. In that case, the access server 10 continues processing by switching the access path from access path (1) to (2).
Consider a situation in which there are a plurality of access servers using the same I/O processing servers and an abnormality occurs in one of the I/O processing servers. Because each access server performs path switching individually, the load on a particular I/O processing server may become high after the switch.
In outline, when an I/O processing server becomes overloaded, it returns a response indicating the overload to the I/O processing request from the access server. On receiving this, the access server redistributes the I/O processing request to another I/O processing server. Which I/O processing server is connected to which storage device is decided when the system is started up. This is to prevent accesses from concentrating on some of the I/O processing servers at the start of operation.
The overall system includes a plurality of access servers 20-1 to 20-n (physical servers for providing a cloud service; many virtual machines run on each access server). It further comprises a plurality of I/O processing servers 21-1 to 21-m (servers that process I/O processing requests issued by the access servers) and a plurality of storage devices 22-1 to 22-N. These devices are connected by networks 23 and 24.
The access server and the I/O processing server can be represented by the same block diagram. The I/O acceptance unit 30 accepts a transmitted I/O processing request. When the device of FIG. 4 is an access server, the I/O acceptance unit 30 accepts an I/O processing request issued by a user application; when the device of FIG. 4 is an I/O processing server, the I/O acceptance unit 30 accepts an I/O processing request issued by an access server.
A user application issues an I/O processing request, which is passed to the storage device through the access server and the I/O processing server (1). The I/O processing server (1) receives the response to this request and measures the response time from when it received the I/O processing request from the access server until it returns the storage device's reply to the access server. If this response time does not exceed the threshold currently set for the server, a normal response is returned from the I/O processing server to the access server and the user application. If the response time exceeds the threshold currently set for the server, it is judged that an overload has occurred. Because this response time covers the interval from issuing the I/O processing request to completion of the processing, it includes both the response time of the storage device and the processing time of the I/O processing server. Therefore, if the response time exceeds the threshold, the storage device, the I/O processing server, or both can be considered overloaded.
If an error occurs in an I/O processing server, the I/O processing request issued by the access server results in an error. It is assumed that the access server can then detect that the I/O processing server has become abnormal. On detecting the abnormality, the access server executes reselection processing and reissues the I/O processing request to another I/O processing server. This makes it possible to continue the service even when an abnormality occurs in an I/O processing server.
In the reselection of the I/O processing server to which the I/O processing request is issued in step S40, the management table of the I/O processing servers is referenced, and the I/O processing server with the smallest threshold level (described later) that satisfies the condition that the I/O size and I/O frequency are within the thresholds is selected as the server to which the I/O processing request is reissued. This allows the I/O processing request to be issued to an I/O processing server judged to have a lower load.
The access server manages, for each I/O processing server, a threshold on the size of I/O processing requests that can be issued and a threshold on the I/O frequency. When issuing an I/O processing request, a request whose size exceeds this threshold (or whose frequency exceeds the threshold) is sent to an I/O processing server other than the one that has handled the I/O so far. When an overload response is received from an I/O processing server, the threshold is gradually decreased (when no overload response is received, the threshold size is gradually increased). This reduces the processing load on the I/O processing server.
FIG. 8 shows the management table. The management table is a collection of tables in which a plurality of threshold definition tables are arranged as a list. One threshold definition table of the management table is provided per I/O processing server. Each threshold definition table holds, in association with each other, a threshold level, a threshold on the size of I/O processing requests, a threshold on the frequency of I/O processing requests, and a threshold counter definition value. Each per-server threshold definition table also has a counter value holding unit and a level value setting unit. The level value setting unit specifies which of the threshold levels in the threshold definition table the I/O processing server is currently at. The threshold counter definition value is the value at which, when the counter value holding unit reaches it, the threshold level set in the level value setting unit is changed. The counter 35 of FIG. 4 increments its count by one each time there is no overload response; one counter is provided per I/O processing server. The counted value is held by the counter value holding unit, and its value is updated each time the counter counts up. The counter value holding unit thus holds the value counted up by the counter while no overload state exists. When the value of the counter value holding unit reaches the threshold counter definition value, the threshold level of the level value setting unit is changed, the counter 35 is reset, and the counter value of the counter value holding unit is initialized to 0.
The processing of FIG. 10 is executed every time a new I/O processing request occurs.
Following FIG. 10, in step S10 — for example at the start of operation — the access server refers to the management table (described later) of the I/O processing servers connected to the storage device to be accessed and checks the threshold level of the relevant I/O processing server. Each threshold definition table in the management table has a level value setting unit that holds the current threshold level of that I/O processing server, and this value is checked. In step S11, it is determined whether the size of the I/O processing request is less than the threshold corresponding to this threshold level. If the determination in step S11 is No, the process proceeds to step S13; if Yes, to step S12. In step S12, it is determined whether the frequency of I/O processing requests is less than the threshold corresponding to this threshold level. The frequency is, for example, the number of I/O processing requests issued per second; the access server obtains it by counting the number of I/O processing requests it issues to the I/O processing server per second. If the determination in step S12 is No, the process proceeds to step S13; if Yes, to step S14.
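The size and frequency gate of steps S11 and S12 can be sketched as follows. This is an illustrative reading: the threshold values are invented, and the boolean result stands in for proceeding to step S14 (issue the request) versus step S13 (reselect another server).

```python
def may_send(io_size, io_per_sec, size_threshold, freq_threshold):
    """Steps S11-S12: the request goes to the current I/O processing
    server only if both size and frequency are under its thresholds."""
    if io_size >= size_threshold:      # S11 is No: size not under the threshold
        return False                   # -> S13: reselect another server
    if io_per_sec >= freq_threshold:   # S12 is No: frequency not under the threshold
        return False                   # -> S13: reselect another server
    return True                        # -> S14: issue the I/O processing request

print(may_send(4, 100, 64, 500))    # True: both checks pass
print(may_send(128, 100, 64, 500))  # False: the request is too large
```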
11, 13, 15, 23, 24 Network
12 Load distribution device
14-1 to 14-3, 21-1 to 21-m I/O processing server
16-1 to 16-4, 22-1 to 22-N Storage device
30 I/O acceptance unit
31 I/O issuing unit
32 I/O time monitoring unit
33 Management table storage unit
34 Management table monitoring unit
40 CPU
41 ROM
42 RAM
43 Communication interface
44 Network
45 Information provider
46 Storage device
47 Medium reading device
48 Portable recording medium
49 Input/output device
50 Bus
Claims (10)
- A load distribution system comprising: a plurality of storage devices that accept an input processing request or an output processing request and return a processing result;
a plurality of I/O processing servers that transmit the input processing request or output processing request to one of the plurality of storage devices, receive the processing result, and, when a response time from transmission of the input processing request or output processing request to completion of the processing exceeds a threshold, send out an overload response indicating that processing of the input processing request or output processing request is in an overload state; and
a plurality of access servers that, based on the overload response from the I/O processing servers, transmit an input processing request or output processing request from a user to an I/O processing server that is not in an overload state.
- The load distribution system according to claim 1, wherein the access server transmits a dummy input processing request or output processing request to the I/O processing server in an overload state and confirms whether the overload state has been resolved.
- The load distribution system according to claim 1, wherein the access server holds, for each I/O processing server, a threshold level associated with a threshold for judging the load state, the threshold becoming smaller as the level increases, and, when an overload state is indicated, increases the value of the threshold level of the corresponding I/O processing server.
- The load distribution system according to claim 3, wherein the value of the threshold level is decreased when the overload state is resolved.
- The load distribution system according to claim 4, wherein the access server lists the I/O processing servers in a list in order from the lightest load based on the threshold level, and allocates the input processing requests or output processing requests in order from the I/O processing server registered at the top of the list.
- The load distribution system according to claim 1, wherein the I/O processing server holds, for each storage device, a threshold level associated with a threshold for judging the load state, the threshold becoming larger as the level increases, and, when the response time indicates an overload state, increases the value of the threshold level of the corresponding storage device.
- The load distribution system according to claim 6, wherein the value of the threshold level is decreased when the overload state is resolved.
- The load distribution system according to claim 7, wherein the I/O processing server notifies the access server of an overload state when the response time is greater than the threshold for judging the load state that corresponds to the threshold level.
- A load distribution method for a load distribution system comprising a plurality of storage devices that accept an input processing request or an output processing request and return a processing result, a plurality of I/O processing servers that transmit the input processing request or output processing request to one of the plurality of storage devices and receive the processing result, and a plurality of access servers that transmit an input processing request or output processing request from a user to the I/O processing servers, wherein:
the I/O processing server sends out, when a response time from transmission of the input processing request or output processing request to completion of the processing exceeds a threshold, an overload response indicating that processing of the input processing request or output processing request is in an overload state; and
the access server transmits the input processing request or output processing request, based on the overload response from the I/O processing server, to an I/O processing server that is not in an overload state.
- A program for a load distribution system comprising a plurality of storage devices that accept an input processing request or an output processing request and return a processing result, a plurality of I/O processing servers that transmit the input processing request or output processing request to one of the plurality of storage devices and receive the processing result, and a plurality of access servers that, based on an overload response from the I/O processing servers, transmit an input processing request or output processing request from a user to an I/O processing server that is not in an overload state,
the program causing the I/O processing server to
send the overload response, indicating that processing of the input processing request or output processing request is in an overload state, to the access server when a response time from transmission of the input processing request or output processing request to completion of the processing exceeds a threshold.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP11878253.1A EP2797005B1 (en) | 2011-12-19 | 2011-12-19 | Load distribution system |
JP2013549985A JP5825359B2 (ja) | 2011-12-19 | 2011-12-19 | 負荷分散システム |
PCT/JP2011/079425 WO2013094007A1 (ja) | 2011-12-19 | 2011-12-19 | 負荷分散システム |
US14/302,486 US20140297728A1 (en) | 2011-12-19 | 2014-06-12 | Load distribution system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2011/079425 WO2013094007A1 (ja) | 2011-12-19 | 2011-12-19 | 負荷分散システム |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/302,486 Continuation US20140297728A1 (en) | 2011-12-19 | 2014-06-12 | Load distribution system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013094007A1 true WO2013094007A1 (ja) | 2013-06-27 |
Family
ID=48667937
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2011/079425 WO2013094007A1 (ja) | 2011-12-19 | 2011-12-19 | 負荷分散システム |
Country Status (4)
Country | Link |
---|---|
US (1) | US20140297728A1 (ja) |
EP (1) | EP2797005B1 (ja) |
JP (1) | JP5825359B2 (ja) |
WO (1) | WO2013094007A1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2021522579A (ja) * | 2018-04-23 | 2021-08-30 | マイクロン テクノロジー,インク. | ホストの論理対物理情報の更新 |
US11216345B2 (en) | 2016-06-01 | 2022-01-04 | Seagate Technology Llc | Technologies for limiting performance variation in a storage device |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013145222A1 (ja) * | 2012-03-29 | 2013-10-03 | 富士通株式会社 | 情報処理装置およびデータ保存処理プログラム |
US9323588B2 (en) * | 2013-03-14 | 2016-04-26 | Comcast Cable Communications, Llc | Service platform architecture |
JP5929946B2 (ja) * | 2014-02-27 | 2016-06-08 | コニカミノルタ株式会社 | 画像形成システム、中継サーバー、通信制御方法及びプログラム |
US10355965B1 (en) * | 2014-07-14 | 2019-07-16 | Sprint Communications Company L.P. | Leveraging a capacity indicator of a mobility management entity |
JP2017162257A (ja) * | 2016-03-10 | 2017-09-14 | 富士通株式会社 | 負荷監視プログラム、負荷監視方法、情報処理装置、および情報処理システム |
US10523744B2 (en) * | 2017-10-09 | 2019-12-31 | Level 3 Communications, Llc | Predictive load mitigation and control in a content delivery network (CDN) |
CN109842665B (zh) * | 2017-11-29 | 2022-02-22 | 北京京东尚科信息技术有限公司 | 用于任务分配服务器的任务处理方法和装置 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007233783A (ja) | 2006-03-02 | 2007-09-13 | Hitachi Ltd | Storage management method and storage management server |
JP2010218235A (ja) * | 2009-03-17 | 2010-09-30 | Fujitsu Ltd | Archive device, distributed management device, and distributed management program |
JP2010218344A (ja) * | 2009-03-18 | 2010-09-30 | Hitachi Ltd | Service cooperation device, program, service cooperation method, and service providing system |
JP2011215677A (ja) * | 2010-03-31 | 2011-10-27 | Hitachi Ltd | Storage system, load distribution management method therefor, and program |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07244642A (ja) * | 1994-03-04 | 1995-09-19 | Sanyo Electric Co Ltd | Parallel processing computer |
US6871347B2 (en) * | 2001-04-13 | 2005-03-22 | Interland, Inc. | Method and apparatus for facilitating load balancing across name servers |
JP4232357B2 (ja) * | 2001-06-14 | 2009-03-04 | Hitachi Ltd | Computer system |
JP4318914B2 (ja) * | 2002-12-26 | 2009-08-26 | Fujitsu Ltd | Storage system and dynamic load management method therefor |
JP4512192B2 (ja) * | 2005-02-09 | 2010-07-28 | Hitachi Ltd | Congestion control device and network congestion control method |
US8539075B2 (en) * | 2006-04-21 | 2013-09-17 | International Business Machines Corporation | On-demand global server load balancing system and method of use |
US7818445B2 (en) * | 2008-10-15 | 2010-10-19 | Patentvc Ltd. | Methods and devices for obtaining a broadcast-like streaming content |
JP2011003154A (ja) * | 2009-06-22 | 2011-01-06 | Nippon Telegr & Teleph Corp <Ntt> | Information data collection management device, transmission cycle estimation method therefor, and program |
JP2011197804A (ja) * | 2010-03-17 | 2011-10-06 | Fujitsu Ltd | Load analysis program, load analysis method, and load analysis device |
US9058252B2 (en) * | 2010-03-24 | 2015-06-16 | Microsoft Technology Licensing, Llc | Request-based server health modeling |
US8533337B2 (en) * | 2010-05-06 | 2013-09-10 | Citrix Systems, Inc. | Continuous upgrading of computers in a load balanced environment |
2011
- 2011-12-19 JP JP2013549985A patent/JP5825359B2/ja active Active
- 2011-12-19 EP EP11878253.1A patent/EP2797005B1/en active Active
- 2011-12-19 WO PCT/JP2011/079425 patent/WO2013094007A1/ja active Application Filing
2014
- 2014-06-12 US US14/302,486 patent/US20140297728A1/en not_active Abandoned
Non-Patent Citations (1)
Title |
---|
See also references of EP2797005A4 |
Also Published As
Publication number | Publication date |
---|---|
EP2797005A1 (en) | 2014-10-29 |
US20140297728A1 (en) | 2014-10-02 |
EP2797005B1 (en) | 2017-08-16 |
JP5825359B2 (ja) | 2015-12-02 |
EP2797005A4 (en) | 2015-01-28 |
JPWO2013094007A1 (ja) | 2015-04-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5825359B2 (ja) | Load distribution system | |
US8694749B2 (en) | Control method of device in storage system for virtualization | |
JP5039951B2 (ja) | Optimizing storage device port selection | |
JP4188602B2 (ja) | Cluster-type disk control device and control method therefor | |
US7962567B1 (en) | Systems and methods for disabling an array port for an enterprise | |
JP6607783B2 (ja) | Distributed cache cluster management | |
US7937481B1 (en) | System and methods for enterprise path management | |
US8683025B2 (en) | Method for managing storage system | |
US8788702B2 (en) | Storage area network multi-pathing | |
JP6040612B2 (ja) | Storage device, information processing device, information processing system, access control method, and access control program | |
CN101276366A (zh) | Computer system for preventing duplicate file storage | |
US20050262386A1 (en) | Storage system and a method for dissolving fault of a storage system | |
JP2015510296A (ja) | System, apparatus, and method for identifying stored data accessible by a host entity and providing data management services | |
US20090328151A1 (en) | Program, apparatus, and method for access control | |
CN101471830B (zh) | Method for multipath access to remote logical devices under the Linux system | |
JP2014026529A (ja) | Storage system and control method therefor | |
US20150205531A1 (en) | Adding Storage Capacity to an Object Storage System | |
US20170017601A1 (en) | Systems, devices, apparatus, and methods for identifying stored data by a device located in a path between virtual fibre channel switches and performing a data management service | |
US10019182B2 (en) | Management system and management method of computer system | |
JP5531278B2 (ja) | Server configuration management system | |
US20190050158A1 (en) | Information processing apparatus, method and non-transitory computer-readable storage medium | |
JP2019003586A (ja) | Storage control device and path switching control program | |
JP4595274B2 (ja) | Load distribution method and load distribution device | |
KR101943899B1 (ko) | System for providing data movement services using volumes in a SAN network environment | |
JP6007620B2 (ja) | Relay device, storage system, and relay device control method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 11878253; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2013549985; Country of ref document: JP; Kind code of ref document: A |
| REEP | Request for entry into the european phase | Ref document number: 2011878253; Country of ref document: EP |
| WWE | Wipo information: entry into national phase | Ref document number: 2011878253; Country of ref document: EP |
| NENP | Non-entry into the national phase | Ref country code: DE |