JP2009294758A - Virtual computer system and driver program for host bus adapter - Google Patents

Virtual computer system and driver program for host bus adapter

Info

Publication number
JP2009294758A
JP2009294758A JP2008145748A JP2008145748A JP2009294758A JP 2009294758 A JP2009294758 A JP 2009294758A JP 2008145748 A JP2008145748 A JP 2008145748A JP 2008145748 A JP2008145748 A JP 2008145748A JP 2009294758 A JP2009294758 A JP 2009294758A
Authority
JP
Japan
Prior art keywords
logical
failure
path
information
physical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2008145748A
Other languages
Japanese (ja)
Inventor
Tetsuhiro Goto
Masaru Tagawa
哲弘 後藤
大 田川
Original Assignee
Hitachi Ltd
株式会社日立製作所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd (株式会社日立製作所)
Priority to JP2008145748A
Publication of JP2009294758A
Current legal status: Pending

Abstract

To provide a virtual computer system that keeps a failure or error caused by an I/O request from one logical partition from affecting the other logical partitions that share the same physical path.
A driver program 170 that manages an HBA 140 maintains failure information 171 (failure counts and thresholds) for each of the logical paths 190 to 192 formed in a physical path. When a physical-path failure is detected, the driver program determines which logical path carried the I/O request on which the failure was detected and updates the corresponding failure information. It then checks whether the failure count has reached the predetermined threshold and, if so, blocks the corresponding logical path, which reduces the impact on the logical partitions other than the one using the failed logical path.
[Selected drawing] Figure 1

Description

  The present invention relates to a virtual computer system and a host bus adapter driver program, and in particular to a virtual computer system in which an OS operates on each of a plurality of logical partitions and to a host bus adapter driver program that concurrently executes processing from the OS of each of those logical partitions.

  In recent years, as virtualization technology has improved, open systems have come to support virtual computer systems that run an OS on each of a plurality of logical partitions constructed in a system device serving as a single physical computer.

  Further, in such a virtual computer system, a plurality of logical IDs are assigned to one physical host bus adapter mounted on the system device, and a plurality of logical paths, one per logical ID, are formed in the physical path between the physical host bus adapter and the I/O device. A technique for sharing the physical path by having the OS of each logical partition access its own logical path is described in, for example, Patent Document 1.
Patent Document 1: JP 2006-85400 A

  The conventional technique described in Patent Document 1, in which a plurality of logical partitions share a physical path, has the following problem when a failure or error occurs in part of the hardware constituting the physical path.

  In general, failures that occur in a physical path can be classified into failures that affect all logical paths and failures that affect only some of the logical paths formed on the physical path.

  Normally, if an I/O request from one logical partition causes a failure that affects only some of the logical paths, only the I/O requests from that logical partition should be affected.

  However, when failures that affect only some of the logical paths occur frequently, the effect may spread to other logical paths that should not be affected. For example, if the failure involves a timeout, the host bus adapter driver program continues to hold host bus adapter resources until it detects the timeout. I/O requests from other logical partitions are therefore also affected, and processing in the OSs on those partitions may slow down or stop.

  In view of the above, an object of the present invention is to provide a virtual computer system and a host bus adapter driver program that prevent the influence of a failure or error caused by an I/O request from one logical partition from spreading to the other logical partitions sharing the physical path.

  According to the present invention, the object is achieved by a virtual computer system in which a system device is divided into a plurality of logical partitions, an OS operates on each of the plurality of logical partitions, a plurality of logical IDs are assigned to one physical host bus adapter mounted on the system device, a plurality of logical paths are formed in units of logical IDs in the physical path formed between the physical host bus adapter and the input/output device, and the OS of each logical partition accesses each of the logical paths. The virtual computer system has control means for controlling the physical host bus adapter. The control means holds physical-path failure information in units of logical IDs and, as that failure information, manages information on the number of failure occurrences, threshold information on the number of failure occurrences, and information indicating the blocking state of the logical path for each type of failure that can occur.

  The object is also achieved by a host bus adapter driver program for controlling the physical host bus adapter in a virtual computer system in which a system device is divided into a plurality of logical partitions, an OS operates on each of the plurality of logical partitions, a plurality of logical IDs are assigned to one physical host bus adapter mounted on the system device, a plurality of logical paths are formed in units of logical IDs in the physical path formed between the physical host bus adapter and the input/output device, and the OS of each logical partition accesses each of the logical paths. As failure information of the physical path held in units of logical IDs, the program is provided with information on the number of failure occurrences, threshold information on the number of failure occurrences, and information indicating the blocking state of the logical path for each type of failure that can occur. The program causes the CPU to execute: a step of detecting a physical-path failure; a step of determining, when a physical-path failure is detected, which of the plurality of logical paths sharing the failed physical path carried the input/output request on which the failure was detected; a step of updating the failure information corresponding to the logical ID and the type of failure based on the detection result; a step of determining, when the failure information has been updated, whether the number of failures has reached the threshold; and a step of blocking the logical path corresponding to the logical ID when the number of failures has reached the threshold.

  Note that the host bus adapter driver program of the present invention may be applied to a host bus adapter that controls any protocol. For example, the virtual computer system may be built on Fibre Channel or Ethernet (registered trademark).

  According to the present invention, in a virtual computer system in which the OSs on a plurality of logical partitions constructed on a physical computer share one physical host bus adapter, the influence on the other logical partitions sharing the physical path can be suppressed even when a failure is detected on an I/O request from one logical partition.

  Embodiments of a virtual computer system and a host bus adapter driver program according to the present invention will be described below in detail with reference to the drawings.

  FIG. 1 is a block diagram showing the configuration of a virtual computer system according to an embodiment of the present invention. In the example shown in FIG. 1, the OSs on three logical partitions share one physical host bus adapter, but the present invention can also be configured with a larger number of logical partitions.

  The virtual computer system according to the embodiment of the present invention shown in FIG. 1 is configured by connecting a system device 100, which is a single physical computer, to a RAID device 150. Although not shown in FIG. 1, the system device 100 includes hardware such as a CPU and a memory; the various functions required by the present invention, described later, are built by loading a program stored in the RAID device 150 into the memory and having the CPU execute it.

  The system device 100 includes a logical partition #1 (120), a logical partition #2 (121), and a logical partition #3 (122) under the control of the logical partition control program 110 and the host OS #A (111), and OS #1 (130), OS #2 (131), and OS #3 (132) operate in the respective logical partitions. The system device 100 also includes one physical host bus adapter (HBA) 140 and is connected to the RAID device 150 via this HBA. Three logical IDs, logical ID #A (160), logical ID #B (161), and logical ID #C (162), are assigned to the HBA 140. The host OS #A (111) contains the driver program 170, which is the driver that actually controls the HBA 140. The driver program 170 manages the failure information 171.

  The RAID device 150 has logical units LU #1 (180), LU #2 (181), and LU #3 (182) defined in it and is provided with a LUN mapping table 183 that associates each LU with a logical ID of the HBA. The LUN mapping table 183 forms the logical paths 190, 191, and 192 in the physical path connecting the system device 100 and the RAID device 150.

  OS #1 (130), OS #2 (131), and OS #3 (132) access the logical paths 190, 191, and 192, respectively, and thereby issue I/O requests to LU #1 (180), LU #2 (181), and LU #3 (182).

  In the embodiment of the present invention shown in FIG. 1, the driver program 170 and the failure information 171 are shown as residing in the host OS #A (111), but they may instead reside in each of OS #1 (130), OS #2 (131), and OS #3 (132).

  FIG. 2 is a diagram showing the configuration of the LUN mapping table 183 provided in the RAID device 150. The LUN mapping table 183 stores a plurality of records, each associating an HBA logical ID with the LU to be mapped to it in the RAID device. The HBA logical ID is the logical ID of an HBA connected to the RAID device, and the LU to be mapped is the LU in the RAID device that is assigned to that logical ID. Through this LUN mapping table, each logical ID is associated with an LU, and a plurality of logical paths are formed in the physical path between the HBA and the RAID device.
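The mapping described for FIG. 2 can be pictured with a small sketch. The Python below is illustrative only: `LUN_MAPPING`, `PATH_NUMBERS`, `LogicalPath`, and `form_logical_paths` are hypothetical names that do not appear in the patent. The pairing of logical ID #A with logical path 190 comes from the description; pairing #B and #C with paths 191 and 192, and the three logical IDs with LU #1 to LU #3 in order, is an assumption made by analogy.

```python
from dataclasses import dataclass

# Hypothetical in-memory picture of LUN mapping table 183:
# HBA logical ID -> LU assigned to that logical ID (pairing assumed).
LUN_MAPPING = {"A": "LU#1", "B": "LU#2", "C": "LU#3"}

# Reference numerals of the logical paths per logical ID. Path 190 for
# logical ID #A is stated in the description; 191 and 192 are assumed.
PATH_NUMBERS = {"A": 190, "B": 191, "C": 192}


@dataclass(frozen=True)
class LogicalPath:
    number: int      # reference numeral of the logical path
    logical_id: str  # HBA logical ID that the path belongs to
    lu: str          # LU reached through the path


def form_logical_paths(mapping):
    """Build one logical path per mapping record, all on one physical path."""
    return {
        logical_id: LogicalPath(PATH_NUMBERS[logical_id], logical_id, lu)
        for logical_id, lu in mapping.items()
    }


paths = form_logical_paths(LUN_MAPPING)
print(paths["A"])
```

Keying the paths by logical ID reflects the point of the table: the logical ID, not the physical adapter, is what distinguishes one partition's traffic from another's.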

  FIG. 3 is a diagram showing the configuration of the failure information 171 managed by the driver program 170. The driver program 170 manages one piece of failure information per logical ID; in this embodiment it manages three pieces of failure information, 300 to 302. Each of the failure information 300 to 302 holds blocking information 310 indicating the blocking state of the logical path corresponding to the logical ID, information 320 indicating the types of failure that can occur in the physical path, and, for each type of failure, failure count information 330 and threshold information 340.

  The value 311 of the blocking information 310 indicates that the logical path is not blocked when it is 0 and that the logical path is blocked when it is 1. Two types of failure are managed, each with its own failure count and threshold: interface failures 321, which are data-transfer abnormalities (such as data corruption during transfer), and timeout failures 322. The initial values 331 and 332 of the failure counts for interface failures and timeout failures are both 0, and each is incremented every time a failure of that type occurs. The threshold is a value set by a user such as an administrator; the logical path is blocked when the failure count reaches it. In the failure information 300 shown in the figure, the interface failure threshold 341 is set to 2 and the timeout failure threshold 342 is set to 1.
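As a rough illustration of the layout described for FIG. 3, the failure information could be modelled as below. This is a minimal sketch, not the driver's actual data structure: the class and function names are hypothetical, and the thresholds used for failure information 301 and 302 are placeholders, since the text gives values only for failure information 300.

```python
from dataclasses import dataclass, field


@dataclass
class FailureCounter:
    count: int = 0      # number of failures so far (331 / 332), starts at 0
    threshold: int = 1  # administrator-set threshold (341 / 342)


@dataclass
class FailureInfo:
    blocked: int = 0    # blocking information 310: 0 = not blocked, 1 = blocked
    counters: dict = field(default_factory=dict)  # failure type -> FailureCounter


def new_failure_info(interface_threshold, timeout_threshold):
    """Create one failure-information record with both failure types."""
    return FailureInfo(counters={
        "interface": FailureCounter(threshold=interface_threshold),
        "timeout": FailureCounter(threshold=timeout_threshold),
    })


# Failure information 300 (logical ID #A) with the thresholds given in the text.
info_300 = new_failure_info(interface_threshold=2, timeout_threshold=1)

# One record per logical ID. The thresholds for #B and #C are placeholders.
failure_info = {
    "A": info_300,
    "B": new_failure_info(2, 1),
    "C": new_failure_info(2, 1),
}
```

Keeping a separate counter per failure type lets a path tolerate, for example, one interface failure while being blocked on the first timeout.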

  FIG. 4 is a flowchart explaining the processing from when the driver program 170 detects a physical-path failure until it blocks the logical path corresponding to the logical ID; this processing is described next. The example covers the cases in which an interface failure or a timeout failure is detected on an I/O request from OS #1 (130) on logical partition #1 (120).

(1) On detecting that a failure has occurred in the physical path, the driver program 170 determines which logical path carried the I/O request on which the failure was detected. Here, it determines that the request is an I/O request from OS #1 (130) on logical partition #1 (120) via the logical path 190 (steps 400, 410, and 420).

(2) Next, the driver program 170 determines whether the failure detected on the I/O request via the logical path 190 is an interface failure or a timeout failure (steps 440, 450, and 451).

(3) If the failure determined in step 440 is an interface failure, the driver program 170 increments, from 0 to 1, the failure count information 331 for interface failures in the failure information 300, which corresponds to the logical ID (= logical ID #A) of the logical path 190 (step 460).

(4) Next, the driver program 170 checks whether the updated failure count information 331 has reached the threshold 341. If it has not, the driver program 170 ends the processing here. In the example of the failure information 300 shown in FIG. 3, the failure count 331 = 1 and the threshold 341 = 2, so it is determined that the threshold has not been reached and the processing ends. If, on the other hand, the failure count information 331 has reached the threshold 341, the value 311 of the blocking information is updated from 0 to 1 and the process of blocking the logical path 190 corresponding to the logical ID #A is executed (steps 470 and 480).

(5) If the failure determined in step 440 is a timeout failure, the driver program 170 increments, from 0 to 1, the failure count information 332 for timeout failures in the failure information 300, which corresponds to the logical ID (= logical ID #A) of the logical path 190 (step 461).

(6) Next, the driver program 170 checks whether the updated failure count information 332 has reached the threshold 342. In the example of the failure information 300 shown in FIG. 3, the failure count 332 = 1 and the threshold 342 = 1, so it is determined that the threshold has been reached; the value 311 of the blocking information is updated from 0 to 1 and the process of blocking the logical path 190 corresponding to the logical ID #A is executed. If, on the other hand, the failure count information 332 has not reached the threshold 342, the processing ends here (steps 471 and 480).
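The numbered steps above can be sketched as a single routine. This is a simplified model under stated assumptions, not the patented driver code: the function name `handle_physical_path_failure`, the dictionary layout, and the string keys `"interface"` and `"timeout"` are all hypothetical, the caller is assumed to have already identified the logical ID and failure type (steps 400 to 451), and "reached the threshold" is read as count >= threshold.

```python
def new_info(interface_threshold=2, timeout_threshold=1):
    # One record per logical ID: blocking flag plus per-type count and threshold.
    return {
        "blocked": 0,
        "interface": {"count": 0, "threshold": interface_threshold},
        "timeout": {"count": 0, "threshold": timeout_threshold},
    }


def handle_physical_path_failure(table, logical_id, failure_type):
    """Update the failure information for one logical ID and block its
    logical path if the count has reached the threshold.

    Returns True if the logical path is blocked after this call.
    """
    info = table[logical_id]       # record for the path that carried the request
    entry = info[failure_type]     # interface or timeout failure
    entry["count"] += 1            # steps 460 / 461: count up
    if entry["count"] >= entry["threshold"]:   # steps 470 / 471: threshold check
        info["blocked"] = 1        # step 480: block the logical path
    return info["blocked"] == 1


table = {"A": new_info(), "B": new_info(), "C": new_info()}

first = handle_physical_path_failure(table, "A", "interface")   # count 1, threshold 2
second = handle_physical_path_failure(table, "A", "interface")  # count 2, threshold 2
timeout_hit = handle_physical_path_failure(table, "B", "timeout")  # count 1, threshold 1

print(first, second, timeout_hit, table["C"]["blocked"])
```

Because every counter lives in the record of one logical ID, failures on one logical path never move another path's counters, which is what keeps the blocking decision local to the offending partition.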

  After the above processing, the driver program 170 responds to all subsequent I/O requests from logical partition #1 (120) via the blocked logical path 190. This keeps a failure occurring on the logical path 190 from affecting the other logical partitions #2 (121) and #3 (122), which share the physical path.
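The text does not spell out how the driver answers requests on a blocked path. The sketch below assumes one plausible reading: the driver returns an error immediately, without handing the request to the HBA, so that no adapter resources are tied up. `BlockedPathError`, `submit_io`, and `fake_hba` are hypothetical names introduced only for this illustration.

```python
class BlockedPathError(Exception):
    """Hypothetical error returned for requests on a blocked logical path."""


def submit_io(blocked_flags, logical_id, request, send_to_hba):
    """Forward a request to the HBA unless its logical path is blocked."""
    if blocked_flags.get(logical_id, 0) == 1:
        # Assumed behaviour: answer at once, so no HBA resources are used.
        raise BlockedPathError(f"logical path for ID #{logical_id} is blocked")
    return send_to_hba(logical_id, request)


sent = []


def fake_hba(logical_id, request):
    # Stand-in for the real adapter; it only records what was issued.
    sent.append((logical_id, request))
    return "ok"


blocked_flags = {"A": 1, "B": 0, "C": 0}  # logical path 190 (ID #A) is blocked

result_b = submit_io(blocked_flags, "B", "read LU#2", fake_hba)
try:
    submit_io(blocked_flags, "A", "read LU#1", fake_hba)
    rejected = False
except BlockedPathError:
    rejected = True
```

In this model the request on logical ID #B still reaches the adapter while the one on #A is turned away before it can occupy shared resources.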

  The processing example above assumes that the failure was detected in step 410 on an I/O request via the logical path 190. When a failure is instead detected in step 410 on an I/O request via the logical path 191 or the logical path 192, that logical path can be blocked by performing the same processing from step 440 onward.

  The processing in the above-described embodiment of the present invention is implemented as programs that can be executed by the CPU of the virtual computer system. These programs can be provided stored on a recording medium such as an FD, CD-ROM, or DVD, or as digital information over a network.

FIG. 1 is a block diagram showing the configuration of a virtual computer system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of the LUN mapping table provided in the RAID device.
FIG. 3 is a diagram showing the configuration of the failure information managed by the driver program.
FIG. 4 is a flowchart explaining the processing from when the driver program detects a physical-path failure until the logical path corresponding to the logical ID is blocked.

Explanation of symbols

100 System device
110 Logical partition control program
111 Host OS
120 to 122 Logical partitions
130 to 132 Operating system (OS)
140 Host bus adapter (HBA)
150 RAID device
160 to 162 Logical ID
170 Host bus adapter driver program
171 Failure information
180 to 182 LU
190 to 192 Logical path
300 to 302 Failure information

Claims (4)

  1. A virtual computer system in which a system device is divided into a plurality of logical partitions, an OS operates on each of the plurality of logical partitions, a plurality of logical IDs are assigned to one physical host bus adapter mounted on the system device, a plurality of logical paths are formed in units of logical IDs in a physical path formed between the physical host bus adapter and an input/output device, and the OS of each logical partition accesses each of the logical paths, the virtual computer system comprising:
    control means for controlling the physical host bus adapter, wherein the control means holds physical-path failure information in units of the logical IDs and manages, as the failure information of the physical path held in units of the logical IDs, information on the number of occurrences of a failure, threshold information on the number of occurrences of a failure, and information indicating a blocked state of a logical path for each type of failure that can occur.
  2. The virtual computer system according to claim 1, wherein the control means has a failure detection function that, when a failure in the physical path is detected, determines which of the plurality of logical paths sharing the failed physical path carried the input/output request on which the failure was detected, and updates the failure information corresponding to the logical ID and the type of failure based on the detection result.
  3. The virtual computer system according to claim 2, wherein the control means has a function of determining, when the failure information is updated, whether the number of failures has reached the threshold, and blocks the logical path corresponding to the logical ID when the threshold is reached.
  4. A host bus adapter driver program for controlling a physical host bus adapter in a virtual computer system in which a system device is divided into a plurality of logical partitions, an OS operates on each of the plurality of logical partitions, a plurality of logical IDs are assigned to one physical host bus adapter mounted on the system device, a plurality of logical paths are formed in units of logical IDs in a physical path formed between the physical host bus adapter and an input/output device, and the OS of each logical partition accesses each of the logical paths, wherein
    as failure information of the physical path held in units of the logical IDs, information on the number of occurrences of a failure, threshold information on the number of occurrences of a failure, and information indicating a blocked state of a logical path are provided for each type of failure that can occur, and
    the program causes a CPU to execute: a step of detecting a failure in the physical path; a step of determining, when a failure in the physical path is detected, which of the plurality of logical paths sharing the failed physical path carried the input/output request on which the failure was detected; a step of updating the failure information corresponding to the logical ID and the type of failure according to the detection result; a step of determining, when the failure information is updated, whether the number of failures has reached the threshold; and a step of blocking the logical path corresponding to the logical ID when the number of failures has reached the threshold.
JP2008145748A 2008-06-03 2008-06-03 Virtual computer system and driver program for host bus adapter Pending JP2009294758A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2008145748A JP2009294758A (en) 2008-06-03 2008-06-03 Virtual computer system and driver program for host bus adapter

Publications (1)

Publication Number Publication Date
JP2009294758A true JP2009294758A (en) 2009-12-17

Family

ID=41542929

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2008145748A Pending JP2009294758A (en) 2008-06-03 2008-06-03 Virtual computer system and driver program for host bus adapter

Country Status (1)

Country Link
JP (1) JP2009294758A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013140526A (en) * 2012-01-05 2013-07-18 Hitachi Ltd Computer system and failure processing method
US10055279B2 (en) 2014-04-02 2018-08-21 Hitachi, Ltd. Semiconductor integrated circuit for communication, storage apparatus, and method for managing failure in storage apparatus

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10289248A (en) * 1997-04-16 1998-10-27 Nippon Telegr & Teleph Corp <Ntt> Method and system for collecting information resource
JP2000148655A (en) * 1998-11-13 2000-05-30 Hitachi Ltd Method for controlling information processing system
JP2006085400A (en) * 2004-09-16 2006-03-30 Hitachi Ltd Data processing system
JP2006107151A (en) * 2004-10-06 2006-04-20 Hitachi Ltd Storage system and communication path control method for storage system
JP2006268515A (en) * 2005-03-24 2006-10-05 Nec Corp Pci card trouble management system
JP2007265243A (en) * 2006-03-29 2007-10-11 Hitachi Ltd Computer system and logical path switching method
JP2007282153A (en) * 2006-04-12 2007-10-25 Hitachi Communication Technologies Ltd Network system and communications apparatus

Legal Events

Date Code Title Description
2010-06-08 A621 Written request for application examination (Free format text: JAPANESE INTERMEDIATE CODE: A621)
2012-04-17 A131 Notification of reasons for refusal (Free format text: JAPANESE INTERMEDIATE CODE: A131)
2012-10-09 A02 Decision of refusal (Free format text: JAPANESE INTERMEDIATE CODE: A02)