WO2015079528A1 - Computer system and method for controlling computer system - Google Patents
Computer system and method for controlling computer system
- Publication number
- WO2015079528A1 (PCT/JP2013/082006)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- request
- processor
- information
- controller
- server
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/16—Handling requests for interconnection or transfer for access to memory bus
- G06F13/1605—Handling requests for interconnection or transfer for access to memory bus based on arbitration
- G06F13/1642—Handling requests for interconnection or transfer for access to memory bus based on arbitration with request queuing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/10—Program control for peripheral devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0613—Improving I/O performance in relation to throughput
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0629—Configuration or reconfiguration of storage systems
- G06F3/0635—Configuration or reconfiguration of storage systems by changing the path, e.g. traffic rerouting, path reconfiguration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0662—Virtualisation aspects
- G06F3/0665—Virtualisation aspects at area level, e.g. provisioning of virtual or logical volumes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
- G06F3/0689—Disk arrays, e.g. RAID, JBOD
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
Definitions
- The present invention relates to a method of distributing host computer I/O requests in a computer system composed of a host computer and a storage device.
- In a storage apparatus with a plurality of controllers, the controller responsible for processing access requests for each volume of the storage apparatus is uniquely determined in advance.
- When the controller in charge of processing access requests for a certain volume A is controller 1, we say that "controller 1 has ownership of volume A".
- When a controller that does not have ownership receives an access to volume A (for example, a read request) from a host computer connected to the storage system, it first transfers the access request to the controller that has ownership.
- Patent Document 1 discloses a storage system that includes dedicated hardware (LR: local router) for distributing access requests to the controller having ownership.
- The LR, provided in the host (channel) interface (I/F) that receives volume access commands from the host, identifies the controller having ownership and transfers the command to that controller. This makes it possible to appropriately assign processing to a plurality of controllers.
- The present invention provides a computer system comprising a host computer and a storage device, in which the host computer acquires ownership information from the storage device and, based on the acquired ownership information, determines the controller to which commands are issued.
- When the host computer issues a volume access command to the storage device, it issues a request to the storage device to acquire information on the controller having ownership of the access target volume, and then transmits the command to the controller having ownership based on the ownership information returned from the storage apparatus in response to the request.
- The host computer can also issue a second request for acquiring the ownership information of an access target volume before receiving the response to a first such request from the storage device.
- This prevents the host computer from issuing an I/O request to a storage controller that does not have ownership, and improves access performance.
- FIG. 10 is a diagram showing the processing flow in the storage apparatus when an I/O command is transmitted to the representative MP. Another drawing shows the flow of processing when the distribution module receives multiple I/O commands.
- Other drawings show: the flow of processing performed by the storage apparatus when one of the controllers is stopped; the contents of the index table; the components of the computer system according to Embodiment 2 of the present invention; the configuration of the server blade and storage controller module according to Embodiment 2; a conceptual view of the command queue held by the storage controller module according to Embodiment 2; and an outline of …
- FIG. 1 shows a configuration of a computer system 1 according to the first embodiment of the present invention.
- the computer system 1 includes a storage device 2, a server 3, and a management terminal 4.
- The storage device 2 is connected to the server 3 via the I/O bus 7.
- As the I/O bus, for example, PCI-Express is used.
- the storage device 2 is connected to the management terminal 4 via the LAN 6.
- The storage device 2 is composed of a plurality of storage controllers 21a and 21b (abbreviated as "CTL" in the drawings; the storage controller may also be abbreviated as "controller") and a plurality of HDDs 22, storage media that store data (the storage controllers 21a and 21b may be collectively referred to as "controller 21").
- The controller 21a includes an MPU 23a that controls the storage device 2, a memory 24a that stores programs executed by the MPU 23a and control information, a disk interface (disk I/F) 25a for connecting the HDD 22, and a port 26a, a connector for connecting to the server 3 via the I/O bus (the controller 21b has the same components as the controller 21a, so its description is omitted).
- A part of each of the memories 24a and 24b is also used as a disk cache.
- the controllers 21a and 21b are interconnected by an inter-controller connection path (I path) 27.
- The controllers 21a and 21b also have a NIC (Network Interface Controller) for connecting to the management terminal 4.
- The HDD 22 is a magnetic disk device; however, a semiconductor storage device such as an SSD (Solid State Drive) may also be used.
- The number of each element (MPU 23, disk I/F 25, etc.) in the controller 21 is not limited to that shown in FIG. 1; the present invention is applicable even if a plurality of MPUs 23 and disk I/Fs 25 exist in the controller 21.
- the server 3 has a configuration in which an MPU 31, a memory 32, and a distribution module 33 are connected to an interconnection switch 34 (abbreviated as “SW” in the drawing).
- The MPU 31, the memory 32, the distribution module 33, and the interconnection switch 34 are connected by an I/O bus such as PCI-Express.
- The distribution module 33 selectively transfers a command (an I/O request such as a read or write) sent from the MPU 31 toward the storage apparatus 2 to one of the controllers 21a and 21b of the storage apparatus 2.
- a configuration in which a plurality of virtual machines operate on the server 3 may be employed.
- the number of servers 3 is not limited to one, and a plurality of servers 3 may exist.
- the management terminal 4 is a terminal for performing management operations of the storage device 2.
- The management terminal 4 includes an MPU, a memory, a NIC for connecting to the LAN 6, and input/output units 234 such as a keyboard and a display, as provided in a known personal computer.
- The management operation is, for example, an operation such as defining a volume to be provided to the server 3.
- the storage apparatus 2 forms one or more logical volumes (also referred to as LDEVs) from one or more HDDs 22.
- Each logical volume is managed by being given a number unique within the storage apparatus 2, called the logical volume number (LDEV#).
- The server 3 specifies an access target volume by an S_ID, which is information that can uniquely identify a virtual machine (or the server), and a LUN (logical unit number).
- The server 3 uniquely identifies the access target volume by including the S_ID and LUN in the command parameters of the I/O command; the server 3 does not use LDEV# when specifying a volume. Therefore, the storage apparatus 2 holds information (the logical volume management table 200) for managing the correspondence between LDEV# and LUN, and uses it to convert the pair of S_ID and LUN specified by an I/O command from the server 3 into an LDEV#.
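As an illustrative sketch (the table contents and helper name are hypothetical, extrapolated from the FIG. 2 example of S_ID AAA), this conversion can be pictured as a lookup keyed by the (S_ID, LUN) pair:

```python
# Hypothetical sketch of the LDEV management table 200 lookup: the storage
# apparatus resolves the (S_ID, LUN) pair of an I/O command to an internal
# LDEV#. The rows below are assumed values in the spirit of FIG. 2.
ldev_management_table = {
    # (S_ID, LUN): LDEV#
    ("AAA", 0): 1,
    ("AAA", 1): 2,
    ("AAA", 2): 3,
}

def to_ldev(s_id: str, lun: int) -> int:
    """Convert the (S_ID, LUN) pair specified by an I/O command to an LDEV#."""
    return ldev_management_table[(s_id, lun)]
```

The server never sees the LDEV# on the right-hand side; it only ever addresses volumes by S_ID and LUN.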
- The logical volume management table 200 (also called the "LDEV management table 200") shown in FIG. 2 manages the correspondence between LDEV# and LUN; the same table is held in each of the memories 24a and 24b in the controllers 21a and 21b.
- The S_ID 200-1 and LUN 200-2 columns store the S_ID and LUN associated with the logical volume specified by LDEV# 200-3.
- MP# 200-4 is a column for storing ownership information, described next.
- In the storage apparatus 2, the controller (21a, 21b) responsible for processing access requests is uniquely determined for each logical volume.
- The controller (21a, 21b) (or processor 23a, 23b) in charge of processing requests for a logical volume is called the "controller (or processor) with ownership", and the information identifying the controller (or processor) with ownership is called "ownership information".
- For entries in which 0 is stored in the MP# 200-4 column, the MPU 23a of the controller 21a has ownership of the logical volume; for entries in which 1 is stored, the MPU 23b of the controller 21b has ownership.
- For example, for the logical volume with LDEV# 1, MP# 200-4 is 0, that is, the MPU 23a of the controller 21a has ownership.
- In this embodiment, each controller (21a, 21b) has only one processor (23a, 23b), so the expressions "the controller 21a has ownership" and "the processor (MPU) 23a has ownership" are substantially synonymous.
- A controller 21 may receive from the server 3 an access request for a volume for which that controller 21 does not have ownership.
- For example, the controller 21a has ownership of the logical volume with LDEV# 1, but when the controller 21b receives a read request for that logical volume from the server 3, the MPU 23b of the controller 21b, not having the ownership, transfers the read request to the MPU 23a of the controller 21a via the inter-controller connection path (I path) 27.
- The MPU 23a reads the read data from the HDD 22 and stores it in its own cache memory (in the memory 24a).
- The read data is then returned to the server 3 via the inter-controller connection path (I path) 27 and the controller 21b.
- To avoid this overhead, the storage apparatus 2 has a mechanism for providing the ownership information of each volume to the server 3.
- Next, the functions of the server 3 will be described.
- FIG. 3 shows an overview of processing when the server 3 transmits an I/O request to the storage device 2.
- S1 is a process performed only at the initial setting after the computer system 1 is started.
- In S1, the storage controller 21a or 21b generates the distribution tables 241a and 241b, and notifies the distribution module 33 of the server 3 of the distribution table read destination information and the distribution table base address information.
- The distribution table 241 is a table storing ownership information; its contents will be described later. The generation process of the distribution table 241a (or 241b) in S1 secures a storage area for the distribution table 241 on the memory and initializes its contents (for example, by writing 0 to all areas of the table).
- The distribution table 241 (241a or 241b) is stored in the memory 24 of either one of the controllers 21a and 21b.
- The distribution module 33 stores the distribution table read destination information, which indicates which controller's memory 24 should be accessed in order to read the distribution table.
- The distribution table base address information is needed when the distribution module 33 accesses the distribution table 241; it will also be described in detail later.
- Upon receiving them, the distribution module 33 stores the read destination information and the distribution table base address information within the distribution module 33 (S2).
- The present invention is also effective when the distribution table 241 with identical contents is stored in both of the memories 24a and 24b.
- Next, the flow when the server 3 accesses a volume of the storage device 2 will be described.
- First, the MPU 31 generates an I/O command in S3.
- The I/O command includes the S_ID, which identifies the transmission source server 3, and the LUN of the access target volume.
- When it receives the I/O command from the MPU 31, the distribution module 33 extracts the S_ID and LUN in the I/O command and calculates the access address in the distribution table 241 from them (S4). Details of this processing will be described later.
- The distribution module 33 can refer to the data at an address by issuing an access request designating that address to the memory 24 of the storage apparatus 2. In S6, the distribution table 241 of the controller 21 is accessed using the address calculated in S4; which of the controllers 21a and 21b is accessed is determined by the table read destination information stored in S2 (FIG. 3 describes the case of accessing the distribution table 241a).
- From the contents of the distribution table 241, it is determined which of the controllers 21a and 21b has ownership of the access target volume.
- The I/O command (received in S3) is then transferred to either the controller 21a or the controller 21b accordingly.
- FIG. 3 shows an example in which the controller 21b has ownership.
- The controller 21 (21b in this example) that has received the I/O command performs the processing within itself, returns the response to the server 3 (MPU 31) (S8), and ends the I/O processing. Thereafter, each time an I/O command is issued from the MPU 31, the processes of S3 to S8 are performed.
- the memory 24 of the storage controller 21 is a storage area having a 64-bit address space, and the distribution table 241 is stored in a continuous area in the memory 24.
- FIG. 4 shows the format of the address in the distribution table 241 calculated by the distribution module 33. This address is composed of a 42-bit distribution table base address, an 8-bit Index, a 12-bit LUN, and a 2-bit fixed value (00).
- the distribution table base address is information that the distribution module 33 receives from the controller 21 in S2 of FIG.
- The Index 402 is 8-bit information derived by the storage apparatus 2 from the information (S_ID) of the server 3 included in the I/O command; the derivation method will be described later (hereinafter, the information derived from the S_ID of the server 3 is called the "Index number").
- The controllers 21a and 21b maintain and manage the correspondence between S_ID and Index number as the index table 600 shown in FIG. 11 (the trigger and method for generating this information will also be described later).
- The LUN 403 is the logical unit number (LUN) of the access target LU (volume) included in the I/O command.
- The distribution module 33 of the server 3 generates an address according to the format of FIG. 4.
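The address construction can be expressed as bit packing. The sketch below is one reading of the FIG. 4 format (42-bit base, 8-bit Index, 12-bit LUN, 2-bit fixed value 00); the function name and the assumption that the base address is pre-aligned are illustrative, not text from the patent:

```python
def distribution_table_address(base: int, index: int, lun: int) -> int:
    """Build a 64-bit distribution table address from the FIG. 4 fields.

    Assumes `base` (the distribution table base address notified in S2)
    is aligned so that its low 22 bits are zero. `index` is the 8-bit
    Index number and `lun` the 12-bit LUN; the fixed low 2 bits (00)
    make each entry 4 bytes wide.
    """
    assert 0 <= index < (1 << 8) and 0 <= lun < (1 << 12)
    return base | (index << 14) | (lun << 2)
```

With base 0, Index number 0, and LUN 1, this yields address 0x4, i.e. the 4-byte entry at 0x0000 0000 0000 0004.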
- Each entry (row) in the distribution table 241 stores the ownership information and LDEV# of an LU accessed by the server 3, and each entry has an enable bit (denoted "En" in the figure).
- En 501 is 1 bit, MP# 502 is 7 bits, and LDEV# 503 is 24 bits of information; one entry is 32 bits (4 bytes) in total.
- En 501 indicates whether the entry is valid: if the value of En 501 is 1, the entry is valid; if 0, the entry is invalid, and the information stored in MP# 502 and LDEV# 503 is in that case invalid (unusable).
- Next, the contents of each entry in the distribution table 241 will be described.
- Here, the case where the distribution table base address is 0 will be described.
- The 4-byte area starting from address 0 (0x0000 0000 0000 0000) of the distribution table 241 stores the ownership information (and LDEV#) for the LU with LUN 0 accessed by the server 3 (or a virtual machine running on the server 3) with Index number 0.
- Likewise, the areas at addresses 0x0000 0000 0000 0004 to 0x0000 0000 0000 0007 and 0x0000 0000 0000 0008 to 0x0000 0000 0000 000F store the ownership information for the LU with LUN 1 and the LU with LUN 2, respectively.
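One 4-byte entry can be sketched as follows. The field widths (En 1 bit, MP# 7 bits, LDEV# 24 bits) are from the text above, but the exact bit positions within the 32-bit word are an assumption for illustration:

```python
def pack_entry(en: int, mp: int, ldev: int) -> int:
    """Pack En (1 bit), MP# (7 bits), and LDEV# (24 bits) into a 32-bit entry.

    Assumed layout: En in the top bit, MP# in the next 7 bits, LDEV# in
    the low 24 bits.
    """
    assert en in (0, 1) and 0 <= mp < (1 << 7) and 0 <= ldev < (1 << 24)
    return (en << 31) | (mp << 24) | ldev

def unpack_entry(entry: int) -> tuple[int, int, int]:
    """Return (En, MP#, LDEV#) from a 32-bit distribution table entry."""
    return (entry >> 31) & 0x1, (entry >> 24) & 0x7F, entry & 0xFF_FFFF
```

A valid entry for MP# 0 and LDEV# 1 would pack to 0x80000001 under this assumed layout.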
- Information necessary for the distribution unit 35 to perform I/O distribution processing includes a search data table 3010, distribution table base address information 3110, and distribution table read destination CTL# information 3120.
- The Index number 3011 column of the search data table 3010 stores the Index number corresponding to the S_ID stored in the S_ID 3012 column.
- The search data table 3010 is used to derive the Index number from the S_ID in the I/O command.
- The configuration of the search data table 3010 in FIG. 6 is only an example; the present invention is also effective with, for example, a table in which entries are stored in order from the top of the S_ID 3012 column.
- When the server 3 (or a virtual machine running on the server 3) first performs I/O on the storage apparatus 2, the storage apparatus 2 stores the corresponding information in S_ID 3012 of the search data table 3010. This process will be described later.
- The distribution table base address information 3110 is the distribution table base address used when calculating the storage address in the distribution table 241 described above. Immediately after the computer system 1 is started, this information is transmitted from the storage device 2 to the distribution unit 35; the distribution unit 35 stores it in its own memory and thereafter uses it when calculating the access destination address of the distribution table 241.
- The distribution table read destination CTL# information 3120 specifies which of the controllers 21a and 21b should be accessed when the distribution unit 35 accesses the distribution table 241.
- When the content of the distribution table read destination CTL# information 3120 is "0", the distribution unit 35 accesses the memory 24a of the controller 21a; when it is "1", the memory 24b of the controller 21b is accessed.
- The distribution table read destination CTL# information 3120 is transmitted from the storage apparatus 2 to the distribution unit 35 immediately after the computer system 1 is activated.
- When the distribution unit 35 receives an I/O command from the MPU 31 via the port 36, it extracts the S_ID of the server 3 (or the virtual machine on the server 3) and the LUN of the access target LU included in the I/O command (S41). Subsequently, the distribution unit 35 converts the extracted S_ID into an Index number using the search data table 3010 managed within the distribution unit 35: it refers to the S_ID 3012 column and searches for a row (entry) matching the S_ID extracted in S41.
- If a row whose S_ID matches the one extracted in S41 is found (S43: Yes), a distribution table access address is created using the contents of its Index# 3011 (S44), and the distribution table 241 is accessed to obtain the controller 21 to which the I/O request should be transmitted (the information stored in MP# 502 in FIG. 5) (S6). An I/O command is then transmitted to the controller 21 identified from the information obtained in S6 (S7).
- No value is initially stored in the S_ID 3012 column of the search data table 3010.
- When the server 3 (or a virtual machine on the server 3) accesses the storage apparatus 2 for the first time, the MPU 23 of the storage apparatus 2 determines an Index number and stores the S_ID of the server 3 (or the virtual machine) in the row of the search data table 3010 corresponding to that Index number. Therefore, when the server 3 (or a virtual machine on the server 3) first issues an I/O request to the storage device 2, its S_ID is not yet stored in S_ID 3012, and the Index number search fails.
- When the Index number search fails (S43: No), that is, when the S_ID of the server 3 is not stored in the search data table 3010, the distribution unit 35 transmits the I/O command to the MPU of a predetermined specific controller 21 (hereinafter, this MPU is called the "representative MP"). In this case, the distribution unit 35 generates a dummy address (S45) and accesses (for example, reads) the memory 24 by designating the dummy address (S6'). The dummy address is an address unrelated to the addresses of the distribution table 241. After S6', the distribution unit 35 transmits the I/O command to the representative MP (S7'). The reason for accessing the memory 24 with a dummy address will be described later.
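The decision flow of S41 through S7' can be summarized as below; the helper names, the stubbed memory read, and the choice of MP# 0 as the representative MP are assumptions for illustration:

```python
# Hypothetical sketch of the distribution unit's destination decision
# (S41-S45, S6/S6', S7/S7'). `search_data_table` maps S_ID -> Index number,
# and `read_table_entry` stands in for a read of the memory 24.

REPRESENTATIVE_MP = 0                     # assumed representative MP number
DUMMY_ADDRESS = 0xFFFF_FFFF_FFFF_FFF0     # any address outside the distribution table

def choose_destination(search_data_table, read_table_entry, s_id, lun, base=0):
    """Return the MP# to which the I/O command should be transmitted."""
    index = search_data_table.get(s_id)          # S43: look up the Index number
    if index is None:                            # S43: No (first I/O from this S_ID)
        read_table_entry(DUMMY_ADDRESS)          # S45/S6': dummy memory access
        return REPRESENTATIVE_MP                 # S7': send to the representative MP
    address = base | (index << 14) | (lun << 2)  # S44: FIG. 4 address format
    entry = read_table_entry(address)            # S6: read the 4-byte entry
    return (entry >> 24) & 0x7F                  # S7: MP# field of the entry
```

With a stubbed `read_table_entry` that returns an entry whose MP# field is 0, a known S_ID routes the command to MPU 23a, while an unknown S_ID falls back to the representative MP.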
- FIG. 8: Updating the distribution table
- The processing flow in the storage apparatus 2 that has received an I/O command transmitted to the representative MP will be explained with reference to FIG. 8.
- In this example, the MPU 23a of the controller 21a is the representative MP.
- The controller 21a extracts the S_ID and LUN included in the I/O command and, by referring to the LDEV management table 200, determines whether it has ownership of the access target LU (S11). If it has ownership, the subsequent processing is performed by the controller 21a; if not, the I/O command is transferred to the controller 21b.
- The controller 21 processes the received I/O request and returns the processing result to the server 3.
- Next, the controller 21 performs processing for associating the S_ID included in the I/O command processed up to S12 with an Index number.
- the controller 21 refers to the index table 600, searches for an index number that is not yet associated with any S_ID, and selects any one of the index numbers. Then, the S_ID included in the I / O command is registered in the S_ID 601 column of the row corresponding to the selected index number (Index # 602).
- Next, the controller 21 updates the distribution table 241: it selects from the LDEV management table 200 the entries whose S_ID (200-1) matches the S_ID included in the current I/O command, and registers the information of the selected entries in the distribution table 241.
- The registration method will be described taking as an example the case where the S_ID included in the current I/O command is AAA and the LDEV management table 200 holds the information shown in FIG. 2.
- In this case, the entries whose LDEV# (200-3) is 1, 2, and 3 (rows 201 to 203 in FIG. 2) are selected from the LDEV management table 200, and the information of these three entries is registered in the distribution table 241.
- Each piece of information is stored in the distribution table 241 according to the rules described for FIG. 5: given an Index number and a LUN, the position (address) in the distribution table 241 at which the ownership (the information stored in MP# 502) and the LDEV# (the information stored in LDEV# 503) should be registered is determined.
- Assume that the S_ID (AAA) included in the current I/O command has been associated with the Index number 01h.
- Then, from the address format, the information on the LDEV with Index number 1 and LUN 0 is stored in the 4-byte area starting from address 0x0000 0000 0000 4000 in the distribution table 241 of FIG. 5.
- Accordingly, MP# 200-4 of row 201 of the LDEV management table 200 ("0" in the example of FIG. 2) is stored in MP# 502 of the entry at address 0x0000 0000 0000 4000 in the distribution table 241, LDEV# 200-3 ("1" in the example of FIG. 2) is stored in LDEV# 503, and "1" is stored in En 501.
- Similarly, the information in rows 202 and 203 of FIG. 2 is stored in the distribution table 241 (at addresses 0x0000 0000 0000 4004 and 0x0000 0000 0000 4008), which completes the update of the distribution table 241.
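The registration just described can be traced numerically. The sketch below assumes the FIG. 4 address rule and a (hypothetical) entry layout of En in the top bit, MP# in the next 7 bits, and LDEV# in the low 24 bits:

```python
# Hypothetical trace of updating the distribution table 241 for S_ID "AAA"
# (associated with Index number 01h). A dict stands in for the memory 24;
# the LUNs of rows 201-203 are assumed to be 0, 1, 2 in order.
distribution_table = {}

def register(index: int, lun: int, mp: int, ldev: int, base: int = 0) -> int:
    address = base | (index << 14) | (lun << 2)                  # FIG. 4 rule
    distribution_table[address] = (1 << 31) | (mp << 24) | ldev  # En = 1
    return address

# Row 201 of the LDEV management table: MP# 0, LDEV# 1, LUN 0.
addr = register(index=0x01, lun=0, mp=0, ldev=1)
```

Registering the remaining two rows with LUN 1 and LUN 2 would fill the two adjacent 4-byte entries in the same way.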
- Since the distribution table 241 stores information on ownership, LUs, and LDEVs, registration or update of information also occurs when an LU is created or when ownership changes.
- the flow of information registration in the distribution table 241 will be described taking the case where an LU is generated as an example.
- When defining an LU from the management terminal 4, the information (S_ID) of the server 3, the LDEV# of the LDEV associated with the LU to be defined, and the LUN of the LU to be defined are specified.
- the management terminal 4 instructs the storage controller 21 (21a or 21b) to generate an LU.
- the controller 21 registers the specified information in the S_ID 200-1, LUN 200-2, and LDEV # 200-3 columns of the LDEV management table 200 in the memories 24a and 24b.
- ownership information of the volume is automatically determined by the controller 21 and registered in MP # 200-4.
- the administrator may designate the controller 21 (MPU 23) with ownership.
- After registering in the LDEV management table 200 through the LU definition work, the controller 21 updates the distribution table 241.
- the S_ID is converted into an Index number using the index table 600.
- Given the Index number and the LUN, the position (address) in the distribution table 241 at which the ownership information (stored in MP# 502) and the LDEV# information (stored in LDEV# 503) should be registered is determined. For example, if the S_ID is converted to an Index number of 0 and the LUN of the defined LU is 1, the entry at address 0x0000 0000 0000 0004 in the distribution table 241 of FIG. 5 is to be updated.
- the owner right information and LDEV # associated with the LU defined this time are stored in the MP # 502 and LDEV # 503 of the entry of the address 0x0000 0000 00000004 in the distribution table 241, and “1” is stored in the En501. Store. If the index number corresponding to the S_ID of the server 3 (or the virtual machine running on the server 3) has not been determined, registration to the distribution table 241 is not possible. 241 is not updated.
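The registration flow above can be sketched as follows. This is only an illustration: the 4-byte spacing between consecutive LUN entries follows the example addresses in FIG. 5, but the per-Index stride (`INDEX_STRIDE`) and the entry tuple layout are hypothetical stand-ins, not the patent's actual encoding.

```python
ENTRY_SIZE = 4       # consecutive LUN entries are 4 bytes apart (FIG. 5 example)
INDEX_STRIDE = 0x100 # hypothetical: bytes reserved per Index number

def entry_address(base, index, lun):
    """Locate the distribution-table entry for an (Index number, LUN) pair."""
    return base + index * INDEX_STRIDE + lun * ENTRY_SIZE

class DistributionTable:
    def __init__(self, base):
        self.base = base
        self.entries = {}  # address -> (En501, MP#502, LDEV#503)

    def register_lu(self, index, lun, mp, ldev):
        # Store ownership (MP#502) and LDEV#503, then set En501 to 1.
        addr = entry_address(self.base, index, lun)
        self.entries[addr] = (1, mp, ldev)
        return addr

table = DistributionTable(base=0x0)
# Index number 0, LUN 1 -> entry at address 0x0000 0000 0000 0004, as in the text.
addr = table.register_lu(index=0, lun=1, mp=0, ldev=1)
```

If the Index number for the issuing server's S_ID has not been determined, `register_lu` would simply not be called, mirroring the case where the distribution table 241 is not updated.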
- The distribution module 33 can receive a plurality of I/O commands at the same time and distribute them to the controller 21a or 21b. That is, it can receive a second command from the MPU 31 while it is still determining the transmission destination of a first command received earlier from the MPU 31.
- the processing flow in this case will be described with reference to FIG.
- When the MPU 31 generates I/O command (1) and transmits it to the distribution module (FIG. 9: S3), the distribution unit 35 performs the process of determining the transmission destination of I/O command (1), that is, the process of S4 in FIG. 3 (or S41 to S45 in FIG. 7) and the process of S6 (access to the distribution table 241).
- the process for determining the transmission destination of the I / O command (1) is referred to as “task (1)”.
- When the MPU 31 generates I/O command (2) and transmits it to the distribution module while task (1) is being processed (FIG. 9: S3′), the distribution unit 35 temporarily interrupts task (1) (task switching) and starts the process of determining the transmission destination of I/O command (2) (this process is called “task (2)”).
- Task (2) also performs access processing to the distribution table 241.
- At this point, the access request to the distribution table 241 issued by task (1) may not yet have completed. This is because, when the distribution module 33 accesses the memory 24 outside the server 3 (in the storage apparatus 2), the response time is longer than when accessing memory inside the distribution module 33. Nevertheless, task (2) can access the distribution table 241 without waiting for the completion of the access request to the distribution table 241 issued by task (1).
- When the response to the access request to the distribution table 241 issued by task (1) returns from the controller 21 to the distribution module 33, the distribution unit 35 again performs task switching (S5′), resumes task (1), and performs the process of transmitting I/O command (1) (FIG. 9: S7). Thereafter, when the response to the access request to the distribution table 241 issued by task (2) returns from the controller 21 to the distribution module 33, the distribution unit 35 again performs task switching (FIG. 9: S5″), executes task (2), and performs the transmission process of I/O command (2) (FIG. 9: S7′).
- In some cases, the Index number search fails and an access address for the distribution table 241 cannot be generated.
- In that case, the distribution unit accesses the memory 24 by specifying a dummy address. When the Index number search fails, there is no option other than sending the I/O command to the representative MP, so there is no inherent need to access the memory 24; nevertheless, the memory 24 is accessed with a dummy address for the following reason.
- Suppose that the Index number search fails in task (2) of FIG. 9.
- If the I/O command were transmitted directly to the representative MP (without accessing the memory 24) when the Index number search fails, the access request to the distribution table 241 issued by task (1) might take time, and task (2) might send its I/O command to the representative MP before the response to task (1)'s request returns from the controller 21 to the distribution module 33. An unfavorable situation would then occur in which the processing order of I/O command (1) and I/O command (2) is reversed. Therefore, in the distribution unit 35 according to the first embodiment of the present invention, the process of accessing the memory 24 is performed even when the Index number search fails.
- This works because, when the distribution module 33 issues a plurality of access requests to the memory 24, the responses corresponding to the access requests are returned in the order in which the requests were issued (the order is guaranteed).
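The ordering argument above can be illustrated with a minimal simulation. Assumptions for illustration only: outstanding requests are modeled as a FIFO, and the dummy address value `0x0` is hypothetical; the point is that issuing a read even on an Index-search miss keeps every command in the same in-order response stream.

```python
from collections import deque

class DistributionModule:
    def __init__(self):
        self.inflight = deque()  # access requests awaiting a response

    def issue_table_access(self, task, index_found):
        # Even when the Index search fails, a read is still issued
        # (to a dummy address), so the task joins the FIFO of
        # outstanding requests instead of bypassing it.
        addr = 0x4 if index_found else 0x0  # 0x0: hypothetical dummy address
        self.inflight.append(task)
        return addr

    def next_response(self):
        # Memory responses return in the order requests were issued.
        return self.inflight.popleft()

m = DistributionModule()
m.issue_table_access("I/O command (1)", index_found=True)
m.issue_table_access("I/O command (2)", index_found=False)  # dummy access
completion_order = [m.next_response(), m.next_response()]
```

Because command (2) also waits for its (dummy) response, it cannot overtake command (1), which is exactly the guarantee the text relies on.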
- The process of accessing a dummy address on the memory 24 is only one method for guaranteeing the order of the I/O commands, and other methods can be adopted. For example, even if an I/O command issue destination (for example, the representative MP) has been determined by task (2), the distribution module 33 may perform control such that task (2) waits to issue its I/O command (waits to execute S6 in FIG. 7) until the I/O command issue destination of task (1) is determined, or until task (1) has issued its I/O command to the storage apparatus 2.
- a process performed by the storage apparatus 2 when one of the plurality of controllers 21 is stopped will be described with reference to FIG.
- This process is started by the controller 21 that detects the stop, when any of the controllers 21 in the storage apparatus 2 detects that another controller 21 has stopped.
- Here, the case where a failure occurs in the controller 21a, the controller 21a stops, and the controller 21b detects that the controller 21a has stopped will be described.
- For the volumes whose ownership is held by the stopped controller 21a, the ownership is changed to the other controller 21 (controller 21b) (S110). Specifically, the ownership information managed in the LDEV management table 200 is changed.
- The controller 21b then creates the distribution table 241b using the LDEV management table 200 and the index table 600 (S130), transmits the base address of the distribution table 241b and information on the table read destination controller (controller 21b) to the server 3 (the distribution module 33) (S140), and ends the process.
- As a result, the setting is changed so that the server 3 accesses the distribution table 241b in the controller 21b thereafter.
- When the controller 21b is already managing the distribution table 241b, the server 3 does not need to change the access destination of the distribution table 241. However, since the distribution table 241 includes ownership information and this information needs to be updated, the controller 21b updates the distribution table 241b based on the information in the LDEV management table 200 and the index table 600 (S150) and ends the process.
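The failover steps S110 to S150 can be sketched as follows. The dictionary-based table layouts are simplified stand-ins for the LDEV management table 200 and the distribution table 241b; only the flow (move ownership, rebuild the survivor's table, notify the new read destination) is taken from the text.

```python
def handle_controller_stop(ldev_table, stopped_ctl, surviving_ctl):
    # S110: change ownership of the stopped controller's volumes.
    for row in ldev_table:
        if row["MP#"] == stopped_ctl:
            row["MP#"] = surviving_ctl
    # S130/S150: (re)build the surviving controller's distribution table
    # from the updated LDEV management table.
    dist_table = {(r["S_ID"], r["LUN"]): (r["MP#"], r["LDEV#"])
                  for r in ldev_table}
    # S140: information transmitted to the server's distribution module.
    notice = {"table_read_destination": surviving_ctl}
    return dist_table, notice

# Hypothetical LDEV management table rows; controller 0 is about to stop.
ldev = [
    {"S_ID": 0x0A, "LUN": 0, "LDEV#": 1, "MP#": 0},
    {"S_ID": 0x0A, "LUN": 1, "LDEV#": 2, "MP#": 1},
]
dist, notice = handle_controller_stop(ldev, stopped_ctl=0, surviving_ctl=1)
```

After the call, every volume is owned by the surviving controller and the server is told to read the distribution table from it.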
- FIG. 12 is a diagram showing main components of the computer system 1000 according to the second embodiment of the present invention and their connection relations.
- The main components of the computer system 1000 are a storage controller module 1001 (sometimes abbreviated as “controller 1001”), server blades (abbreviated as “blade” in the drawing) 1002, a host I/F module 1003, a disk I/F module 1004, an SC module 1005, and HDDs 1007.
- the host I / F module 1003 and the disk I / F module 1004 may be collectively referred to as “I / O module”.
- the set of the controller 1001 and the disk I / F module 1004 has the same function as the storage controller 21 of the storage apparatus 2 in the first embodiment.
- the server blade 1002 has the same function as the server 3 in the first embodiment.
- a plurality of storage controller modules 1001, server blades 1002, host I / F modules 1003, disk I / F modules 1004, and SC modules 1005 may exist in the computer system 1000.
- In the second embodiment, a configuration in which two storage controller modules 1001 exist will be described. When the two storage controller modules 1001 need to be distinguished, they are referred to as “storage controller module 1001-1” (or “controller 1001-1”) and “storage controller module 1001-2” (or “controller 1001-2”).
- A configuration in which there are eight server blades 1002 will be described. When the server blades 1002 need to be distinguished, they are referred to as server blades 1002-1, 1002-2, ... 1002-8.
- PCIe (Peripheral Component Interconnect Express) is a serial I/O interface, a type of expansion bus, and is used to interconnect the components.
- the controller 1001 provides a logical unit (LU) to the server blade 1002 and processes an I / O request from the server blade 1002.
- the controllers 1001-1 and 1001-2 have the same configuration, and each includes an MPU 1011a, an MPU 1011b, and storage memories 1012a and 1012b.
- The MPUs 1011a and 1011b in each controller 1001 are interconnected by a QPI (QuickPath Interconnect) link, an inter-chip connection technology of Intel, and the MPUs 1011a of the controllers 1001-1 and 1001-2, as well as the MPUs 1011b of the controllers 1001-1 and 1001-2, are connected to each other via an NTB (Non-Transparent Bridge).
- Each controller 1001 also has a NIC for connecting to a LAN, like the storage controller 21 of the first embodiment, and can communicate with a management terminal (not shown) via the LAN.
- the host I / F module 1003 is a module having an interface for connecting the host 1008 existing outside the computer system 1000 to the controller 1001.
- The host I/F module 1003 has a TBA (Target Bus Adapter) for connecting to the HBA (Host Bus Adapter) of the host 1008.
- the disk I / F module 1004 includes a SAS controller 10040 for connecting a plurality of hard disks (HDD) 1007 to the controller 1001.
- When the controller 1001 receives write data from the server blade 1002 or the host 1008, the data is stored in the plurality of HDDs 1007 connected to the disk I/F module 1004. That is, the set of the controller 1001, the host I/F module 1003, the disk I/F module 1004, and the plurality of HDDs 1007 corresponds to the storage apparatus 2 in the first embodiment.
- As the HDD 1007, a semiconductor storage device such as an SSD can be used in addition to a magnetic disk device such as a hard disk.
- the server blade 1002 includes one or more MPUs 1021 and a memory 1022 and a mezzanine card 1023 on which an ASIC 1024 is mounted.
- the ASIC 1024 corresponds to the distribution module 33 mounted on the server 3 in the first embodiment, and details will be described later.
- the MPU 1021 may be a so-called multi-core processor having a plurality of processor cores.
- the SC module 1005 is a module equipped with a Signal Conditioner (SC) that is a transmission signal repeater, and is provided to prevent deterioration of signals flowing between the controller 1001 and the server blade 1002.
- FIG. 18 shows an example of a front view when the computer system 1000 is mounted on a rack such as a 19-inch rack.
- the components other than the HDD 1007 are stored in a single casing called a CPF chassis 1009.
- the HDD 1007 is stored in a housing called HDD Box 1010.
- The CPF chassis 1009 and the HDD Box 1010 are mounted on a rack such as a 19-inch rack, and HDDs 1007 (and HDD Boxes 1010) are added, for example, as the amount of data handled by the computer system 1000 increases.
- the CPF chassis 1009 is installed at the bottom of the rack, and the HDD Box 1010 is installed on the CPF chassis 1009.
- FIG. 20 is a cross-sectional view taken along line A-A ′ shown in FIG.
- the controller 1001, the SC module 1005, and the server blade 1002 are mounted on the front surface of the CPF chassis 1009, and the connectors on the back surface of the controller 1001 and the server blade 1002 are connected to the backplane 1006.
- An I / O module (disk I / F module) 1004 is mounted on the back surface of the CPF chassis 1009, and is also connected to the backplane 1006 like the controller 1001.
- the backplane 1006 is a circuit board provided with a connector for interconnecting each component of the computer system 1000 such as the server blade 1002 and the controller 1001.
- the I / O module (host I / F module) 1003 is also mounted on the back surface of the CPF chassis 1009 and connected to the backplane 1006, similar to the disk I / F module 1004.
- FIG. 19 shows an example of a rear view of the computer system 1000.
- the host I / F module 1003 and the disk I / F module 1004 are both mounted on the back of the CPF chassis 1009.
- A fan, a LAN connector, and the like are mounted in the space below the I/O modules 1003 and 1004, but since these are not indispensable components for the description of the present invention, their description is omitted.
- the server blade 1002 and the controller 1001 are connected via a communication line conforming to the PCIe standard with the SC module 1005 interposed therebetween, and the I / O modules 1003 and 1004 and the controller 1001 are also connected via a communication line conforming to the PCIe standard.
- Controllers 1001-1 and 1001-2 are also interconnected via NTB.
- The HDD Box 1010 arranged on the CPF chassis 1009 is connected to the I/O module 1004 by a SAS cable wired on the back of the casing.
- In order to arrange the controller 1001 and the I/O module 1004 in close proximity, the controller 1001 is mounted in the upper part of the CPF chassis 1009 and the server blade 1002 is mounted in the lower part of the CPF chassis 1009. As a result, the length of the communication line between the lowermost server blade 1002 and the uppermost controller 1001 in particular becomes long, so the SC module 1005, which prevents deterioration of the signal flowing between the two, is inserted between the server blade 1002 and the controller 1001.
- the server blade 1002 includes an ASIC 1024 that is a device for distributing an I / O request (read, write command) to either of the controllers 1001-1 and 1001-2.
- As the communication method between the ASIC 1024 and the controller 1001, PCIe is used, as in the communication between the controller 1001 and the server blade 1002.
- The MPU 1021 of the server blade 1002 incorporates a root complex (Root Complex; abbreviated as “RC” in the figure) 10211 for connecting the MPU 1021 to external devices, and the ASIC 1024 incorporates an endpoint (Endpoint; abbreviated as “EP” in the figure) 10241, a PCIe tree termination device connected to the root complex 10211.
- the controller 1001 uses PCIe as a communication standard between the MPU 1011 in the controller 1001 and a device such as an I / O module.
- the MPU 1011 has a root complex 10112, and each I / O module (1003, 1004) incorporates an endpoint connected to the root complex 10112.
- The ASIC 1024 includes two more endpoints (10242 and 10243) in addition to the endpoint 10241 described above. Unlike the endpoint 10241, these two endpoints are connected to the root complexes 10112 of the MPUs 1011 in the storage controllers 1001.
- One of the two endpoints (for example, the endpoint 10242) is connected to the root complex 10112 of the MPU 1011 in the storage controller 1001-1.
- The other endpoint (for example, the endpoint 10243) is connected to the root complex 10112 of the MPU 1011 in the storage controller 1001-2. That is, the PCIe domain including the root complex 10211 and the endpoint 10241 is different from the PCIe domain including the root complex 10112 in the controller 1001-1 and the endpoint 10242.
- The domain including the root complex 10112 in the controller 1001-2 and the endpoint 10243 is also a PCIe domain different from the other domains.
- In addition to the endpoints 10241, 10242, and 10243 described above, the ASIC 1024 includes an LRP 10244, a processor that executes the distribution process described later, a DMA controller (DMAC) 10245 that executes data transfer processing between the server blade 1002 and the storage controller 1001, and an internal RAM 10246.
- The functional block 10240, composed of the LRP 10244, the DMAC 10245, and the internal RAM 10246, operates as a PCIe master device and is called the PCIe master block 10240.
- The MPU 1021 of the server blade 1002 cannot directly access the controller 1001 (such as the storage memory 1012); conversely, the MPU 1011 of the controller 1001 cannot access the server memory 1022 of the server blade 1002.
- However, the components of the PCIe master block 10240 (the LRP 10244 and the DMAC 10245) can access (read and write) both the storage memory 1012 of the controller 1001 and the server memory 1022 of the server blade 1002.
- The ASIC 1024 is provided with a server MMIO space 10247, which is an MMIO space accessible by the MPU 1021 of the server blade 1002, a CTL1 MMIO space 10248, which is an MMIO space accessible by the MPU 1011 (processor core 10111) of the controller 1001-1 (CTL1), and a CTL2 MMIO space 10249, which is an MMIO space accessible by the MPU 1011 (processor core 10111) of the controller 1001-2 (CTL2).
- The MPU 1011 (processor core 10111) and the MPU 1021 can instruct the LRP 10244, the DMAC 10245, and so on to perform data transfers and the like by reading and writing control information in these MMIO spaces.
- Since the PCIe domain including the root complex 10112 in the controller 1001-1 and the endpoint 10242 and the domain including the root complex 10112 in the controller 1001-2 and the endpoint 10243 are different PCIe domains, and the MPUs 1011a of the controllers 1001-1 and 1001-2 and the MPUs 1011b of the controllers 1001-1 and 1001-2 are connected to each other via NTB, data can be written (transferred) from the controller 1001-1 (the MPU 1011 or the like) to the storage memories (1012a, 1012b) of the controller 1001-2. Conversely, data can be written (transferred) from the controller 1001-2 (the MPU 1011 or the like) to the storage memories (1012a, 1012b) of the controller 1001-1.
- each controller 1001 has two MPUs 1011 (MPUs 1011a and 1011b), and each of the MPUs 1011a and 1011b has four processor cores 10111 as an example.
- Each processor core 10111 processes read and write command requests for volumes that come from the server blade 1002.
- storage memories 1012a and 1012b are connected to the MPUs 1011a and 1011b, respectively.
- Although the MPUs 1011a and 1011b are physically independent of each other, they are interconnected by the QPI link as described above, so the MPUs 1011a and 1011b (and the processor cores 10111 in them) can access either of the storage memories 1012a and 1012b (the storage memories 1012a and 1012b can be accessed as a single memory space).
- the controller 1001-1 can be regarded as a configuration in which one MPU 1011-1 and one storage memory 1012-1 exist substantially.
- the controller 1001-2 can be regarded as a configuration in which one MPU 1011-2 and one storage memory 1012-2 exist substantially.
- The endpoint 10242 on the ASIC 1024 may be connected to the root complex 10112 of either of the two MPUs (1011a, 1011b) on the controller 1001-1.
- Similarly, the endpoint 10243 may be connected to the root complex 10112 of either MPU (1011a, 1011b) on the controller 1001-2.
- Hereinafter, when the MPUs 1011a and 1011b and the storage memories 1012a and 1012b in the controller 1001-1 are not distinguished, the MPU in the controller 1001-1 is referred to as “MPU 1011-1” and the storage memory as “storage memory 1012-1”.
- Similarly, the MPU in the controller 1001-2 is referred to as “MPU 1011-2” and the storage memory as “storage memory 1012-2”.
- Since the MPUs 1011a and 1011b each have four processor cores 10111, the MPUs 1011-1 and 1011-2 can each be regarded as an MPU having eight processor cores.
- LDEV management table: Next, the management information held by the storage controller 1001 according to the second embodiment of the present invention will be described. First, the management information regarding the logical volumes (LUs) that the storage controller 1001 provides to the server blade 1002 and the host 1008 will be described.
- the controller 1001 in the second embodiment also has the same LDEV management table 200 as the LDEV management table 200 included in the controller 21 of the first embodiment.
- However, the content stored in MP # 200-4 is slightly different from that of the LDEV management table 200 according to the first embodiment. As described above, eight processor cores exist per controller 1001; that is, the total number of processor cores in the controller 1001-1 and the controller 1001-2 is 16.
- In the second embodiment, each processor core has an identification number from 0x00 to 0x0F; it is assumed that the controller 1001-1 has the processor cores with identification numbers 0x00 to 0x07 and the controller 1001-2 has the processor cores with identification numbers 0x08 to 0x0F.
- a processor core having an identification number N (N is a value from 0x00 to 0x0F) may be referred to as “core N”.
- In the first embodiment, a value of either 0 or 1 was stored in the MP # 200-4 column of the LDEV management table 200 (the column storing information on the processor having LU ownership).
- the MP # 200-4 column of the LDEV management table 200 in the second embodiment stores the identification number of the processor core having ownership (values from 0x00 to 0x0F).
- the storage memories 1012-1 and 1012-2 are provided with FIFO type areas for storing I / O commands issued by the server blade 1002 to the controller 1001. In the second embodiment, these are designated as command queues.
- FIG. 14 shows an example of the command queues provided in the storage memory 1012-1. As illustrated in FIG. 14, a command queue is provided for each server blade 1002 and for each processor core of the controller 1001. For example, when the server blade 1002-1 issues an I/O command to an LU whose ownership is held by the processor core with identification number 0x01 (core 0x01), the command is stored in the queue for core 0x01 in the command queue set 10131-1 for the server blade 1002-1.
- Command queues for each server blade are also provided in the storage memory 1012-2, but they differ from the command queues in the storage memory 1012-1 in that they store commands for the processor cores of the MPU 1011-2, that is, the cores with identification numbers 0x08 to 0x0F.
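The per-blade, per-core queue structure of FIG. 14 can be sketched as follows. The split of cores 0x00-0x07 and 0x08-0x0F between the two storage memories follows the text; the class and queue naming are illustrative only.

```python
from collections import deque

class StorageMemory:
    """FIFO command queues: one queue per (server blade, processor core)."""
    def __init__(self, core_ids, num_blades=8):
        self.queues = {(blade, core): deque()
                       for blade in range(1, num_blades + 1)
                       for core in core_ids}

    def enqueue(self, blade, core, command):
        self.queues[(blade, core)].append(command)

mem_1012_1 = StorageMemory(core_ids=range(0x00, 0x08))  # cores 0x00-0x07
mem_1012_2 = StorageMemory(core_ids=range(0x08, 0x10))  # cores 0x08-0x0F

# Server blade 1002-1 issues an I/O command to an LU owned by core 0x01:
mem_1012_1.enqueue(blade=1, core=0x01, command="read LUN 0")
```

With eight blades and eight cores per storage memory, each StorageMemory holds 64 independent FIFOs, so commands from different blades or to different owning cores never interfere with each other's ordering.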
- the controller 1001 in the second embodiment also has a distribution table 241.
- the contents of the distribution table 241 are the same as those described in the first embodiment (FIG. 5).
- However, the identification number of the processor core having ownership (that is, a value from 0x00 to 0x0F) is stored in MP # 502; the other points are the same as the distribution table in the first embodiment.
- The controller 1001 in the second embodiment holds the same number of distribution tables as the number of server blades 1002 (for example, when the server blades 1002-1 and 1002-2 are connected, the controller 1001 has a total of two distribution tables: one for the server blade 1002-1 and one for the server blade 1002-2).
- The controller 1001 creates a distribution table 241 (secures a storage area for the distribution table 241 in the storage memory 1012 and initializes its contents) and notifies the server blade 1002 (assumed here to be the server blade 1002-1) of the base address of the distribution table. At that time, the controller generates the base address from the top address on the storage memory 1012 at which the distribution table to be accessed by the server blade 1002-1 is stored, among the plurality of distribution tables, and notifies the server blade of the generated base address.
- As a result, when determining the issue destination of an I/O command, each of the server blades 1002-1 to 1002-8 can access, among the eight distribution tables in the controller 1001, the distribution table that it should access.
- the storage position of the distribution table 241 on the storage memory 1012 may be fixed in advance, or may be determined dynamically by the controller 1001 when the distribution table is generated.
- In the first embodiment, the storage controller 21 derives an 8-bit Index number based on the information (S_ID) of the server 3 (or the virtual machine running on the server 3) included in the I/O command, and the server 3 uses the Index number to determine the access destination in the distribution table.
- the controller 21 manages the information on the correspondence relationship between the S_ID and the Index number in the index table 600.
- the controller 1001 in the second embodiment also holds an index table 600 and manages information on the correspondence between S_IDs and Index numbers.
- the controller 1001 in the second embodiment manages the index table 600 for each server blade 1002 connected to the controller 1001. Therefore, the same number of index tables 600 as the number of server blades 1002 are provided.
- Server blade side management information: The information maintained and managed by the server blade 1002 of the second embodiment of the present invention in order to perform the I/O distribution process is the same as the information of the server 3 (distribution unit 35) of the first embodiment (the search data table 3010, the distribution table base address information 3110, and the distribution table read destination CTL # information 3120). In the server blade 1002 according to the second embodiment, these pieces of information are stored in the internal RAM 10246 of the ASIC 1024.
- the MPU 1021 of the server blade 1002 generates an I / O command (S1001).
- the parameters of the I / O command include S_ID, which is information that can identify the transmission source server blade 1002, and the LUN of the access target LU.
- the parameter of the I / O command includes an address on the memory 1022 where the read data is to be stored.
- the MPU 1021 stores the parameters of the generated I / O command in the memory 1022.
- After storing the parameters of the I/O command in the memory 1022, the MPU 1021 notifies the ASIC 1024 that the storage of the I/O command has been completed (S1002). This notification is made by the MPU 1021 writing information to a predetermined address in the server MMIO space 10247.
- the processor (LRP 10244) of the ASIC 1024 that has received the command storage completion notification from the MPU 1021 reads the parameter of the I / O command from the memory 1022, stores it in the internal RAM 10246 of the ASIC 1024 (S1004), and processes the parameter (S1005).
- The parameter format of the command differs between the server blade 1002 side and the storage controller module 1001 side (for example, the command parameters created by the server blade 1002 include the memory address at which read data is to be stored, which is unnecessary for the storage controller module 1001), so processing such as removing the information unnecessary for the storage controller module 1001 is performed.
- The LRP 10244 of the ASIC 1024 then calculates the access address of the distribution table 241 (S1006). This process is the same as S4 (S41 to S45) described with FIG. 3 and FIG. 7 of the first embodiment: the LRP 10244 obtains the Index number corresponding to the S_ID included in the I/O command from the search data table 3010 and calculates the access address based on it. As in the first embodiment, the Index number search may fail and the access address calculation may fail; in this case, the LRP 10244 generates a dummy address, as in the first embodiment.
- the LRP 10244 reads information on a predetermined address (access address of the distribution table 241 calculated in S1006) of the distribution table 241 of the controller 1001 (1001-1 or 1001-2) specified by the table read destination CTL # 3120. As a result, the processor (processor core) having the ownership of the LU to be accessed is identified.
- S1008 is the same processing as S7 (FIG. 3) of the first embodiment.
- the LRP 10244 writes the command parameter processed in S1005 to the storage memory 1012.
- FIG. 15 shows only the example in which the controller 1001 that is the distribution table read destination in S1007 and the controller 1001 to which the command parameters are written in S1008 are the same. However, the controller 1001 to which the processor core having ownership of the access target LU found in S1007 belongs may differ from the controller 1001 that is the read destination of the distribution table. In that case, the command parameters are written to the storage memory 1012 on the controller 1001 to which the processor core having ownership of the access target LU belongs.
- Whether the write destination is the controller 1001-1 or the controller 1001-2 is determined by whether the identification number of the processor core having ownership of the access target LU found in S1007 is in the range 0x00 to 0x07 or in the range 0x08 to 0x0F. If the identification number is in the range 0x00 to 0x07, the command parameters are written to the command queue provided on the storage memory 1012-1 of the controller 1001-1; if it is in the range 0x08 to 0x0F, the command parameters are written to the command queue provided on the storage memory 1012-2 of the controller 1001-2.
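The write-destination decision above is a simple range check on the owning core's identification number. A minimal sketch (the return strings are illustrative labels, not values used by the hardware):

```python
def write_destination(owner_core):
    """Select the controller whose command queue receives the parameters,
    based on the identification number (0x00-0x0F) of the owning core."""
    if 0x00 <= owner_core <= 0x07:
        return "controller 1001-1"  # queue on storage memory 1012-1
    if 0x08 <= owner_core <= 0x0F:
        return "controller 1001-2"  # queue on storage memory 1012-2
    raise ValueError("invalid processor core identification number")
```

For example, ownership by core 0x01 routes the parameters to the controller 1001-1, while ownership by core 0x0C routes them to the controller 1001-2, regardless of which controller served the distribution table read in S1007.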
- For example, when the core with identification number 0x01 has ownership, the LRP 10244 stores the command parameters in the command queue for core 0x01 among the eight command queues for the server blade 1002-1 provided on the storage memory 1012. After storing the command parameters, the LRP 10244 notifies the processor core 10111 of the storage controller module 1001 (the processor core having ownership of the access target LU) that the storage of the command parameters has been completed.
- When the access address could not be calculated (the Index number search failed), the LRP 10244 transmits the I/O command to a predetermined specific processor core (as in the first embodiment, this processor core is called the “representative MP”). That is, the command parameters are stored in the command queue for the representative MP, and after the command parameters are stored, the representative MP is notified that the storage of the command parameters has been completed.
- The processor core 10111 of the storage controller module 1001 acquires the I/O command parameters from the command queue and prepares read data based on the acquired parameters; specifically, it reads the data from the HDD 1007 and stores it in the cache area of the storage memory 1012 (S1009). In S1010, the processor core 10111 generates DMA transfer parameters for transferring the read data stored in the cache area and stores them in its own storage memory 1012. When the storage of the DMA transfer parameters is completed, the processor core 10111 notifies the LRP 10244 of the ASIC 1024 that the storage has been completed (S1010). Specifically, this notification is realized by writing information to a predetermined address in the MMIO space (10248 or 10249) for the controller 1001.
- Next, the LRP 10244 reads the DMA transfer parameters from the storage memory 1012. It also reads out the I/O command parameters from the server blade 1002 that were stored in S1004.
- The DMA transfer parameters read in S1011 include the transfer source memory address at which the read data is stored (an address on the storage memory 1012), and the I/O command parameters from the server blade 1002 include the transfer destination memory address of the read data (an address on the memory 1022 of the server blade 1002). Using these pieces of information, the LRP 10244 generates a DMA transfer list for transferring the read data on the storage memory 1012 to the memory 1022 of the server blade 1002 and stores it in the internal RAM 10246.
- the DMA controller 10245 notifies the LRP 10244 that the data transfer is completed (S1016).
- the LRP 10244 creates status information indicating completion of the I / O command, and writes the status information to the memory 1022 of the server blade 1002 and the storage memory 1012 of the storage controller module 1001 (S1017).
- the MPU 1021 of the server blade 1002 and the processor core 10111 of the storage controller module 1001 are notified of the completion of the processing, and the read processing is completed.
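The pairing of the two parameter sets in S1011 and S1012 can be sketched as follows. This is a minimal illustration, not the patent's implementation; the field names (`cache_addr`, `server_buf_addr`, `length`) are assumptions — the description only states which side supplies the source and destination addresses.

```python
def build_dma_descriptor(dma_param: dict, io_cmd_param: dict) -> dict:
    """Combine the two parameter sets the LRP reads in S1011-S1012:
    the controller-side DMA transfer parameter supplies the source
    address (the cache area on the storage memory 1012), and the
    server's I/O command parameter supplies the destination address
    (on the memory 1022 of the server blade 1002)."""
    return {
        "src": dma_param["cache_addr"],          # transfer source (S1011)
        "dst": io_cmd_param["server_buf_addr"],  # transfer destination (S1012)
        "len": io_cmd_param["length"],
    }
```

The descriptor is what a DMA transfer list entry would carry before being handed to the DMA controller 10245.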
- when the representative MP receives the I/O command (corresponding to S1008 in FIG. 15), it refers to the S_ID and LUN included in the I/O command and to the LDEV management table 200, and determines whether it itself has ownership of the access target LU (S11). If it has ownership, it performs the processing of the next step S12 itself. If it does not have ownership, the representative MP transfers the I/O command to the processor core having ownership, and that processor core receives the I/O command from the representative MP (S11'). When the representative MP transfers the I/O command, it also sends information identifying the server blade 1002 that issued the I/O command (which of the server blades 1002-1 to 1002-8 it is).
- the processor core processes the received I/O request and returns the processing result to the server.
- if the processor core that has received the I/O command has ownership, the processes of S1009 to S1017 described in FIGS. 15 and 16 are performed. If the processor core that has received the I/O command does not have ownership, the processor core to which the I/O command has been transferred (the processor core that has ownership) performs the processing of S1009, the data is transferred to the controller 1001 where the representative MP exists, and the processing after S1010 is performed by the representative MP.
- the processing after S13 ' is the same as the processing after S13 (FIG. 8) in the first embodiment.
- in the controller 1001 of the second embodiment, if the processor core having ownership of the volume specified by the I/O command received in S1008 is different from the processor core that received the I/O command, the processor core having ownership performs the processing after S13'. FIG. 17 describes the processing flow in that case.
- when associating the S_ID included in the I/O command processed up to S12 with an Index number, the processor core refers to the index table 600 for the server blade 1002 that issued the command, searches for an Index number not yet associated with any S_ID, and selects one of those Index numbers.
- the processor core that performs the processing of S13' is the processor core (representative MP) that received the I/O command in S11', and it has received the information specifying the server blade 1002 that issued the command. It registers the S_ID included in the I/O command in the S_ID 601 column of the row corresponding to the selected index number (Index# 602).
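The selection and registration step above (S13') can be sketched as a scan for a free row. This is an illustrative model, assuming each row of the index table 600 holds an Index# and an S_ID column that is empty until associated; the actual table layout is per FIG. of the patent, not this dict form.

```python
def assign_free_index(index_table: list, s_id: str) -> int:
    """Sketch of S13': search the index table 600 of the issuing
    server blade for an Index# not yet associated with any S_ID,
    select one, and register the command's S_ID in its S_ID column."""
    for row in index_table:
        if row["S_ID"] is None:      # not yet associated with any S_ID
            row["S_ID"] = s_id       # register in the S_ID 601 column
            return row["Index#"]
    raise RuntimeError("no free Index# available")
```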
- the processing of S14' is the same as S14 (FIG. 8) of the first embodiment, except that, since a distribution table 241 exists for each server blade 1002, the distribution table 241 for the server blade 1002 that issued the command is updated. This point differs from the first embodiment.
- the processor core writes the index number information associated with the S_ID in S13' to the search data table 3010 in the ASIC 1024 of the server blade 1002 that issued the command. Specifically, the processor core reflects the S_ID information in the search data table 3010 by writing to a predetermined address in the CTL1 MMIO space 10248 (or the CTL2 MMIO space 10249).
- as described in the first embodiment, the distribution module 33 can receive a second command from the MPU 31 of the server 3 while it is performing the process of determining the transmission destination of a first command received from the MPU 31. The ASIC 1024 of the second embodiment can likewise process a plurality of commands concurrently; this processing is the same as the processing of FIG. 9 of the first embodiment.
- in the computer system of the second embodiment, the LU generation processing and the failure handling described in the first embodiment are performed in the same manner. Since the processing flow is the same as that described in the first embodiment, detailed description is omitted. Note that ownership information is determined in the course of these processes; however, since in the computer system of the second embodiment a processor core holds the ownership of an LU, the controller 1001 selects one of the processor cores 10111 in the controller 1001, rather than the MPU 1011, when determining ownership. This point differs from the processing in the first embodiment.
- in the first embodiment, when a failure occurred (for example, when the controller 21a stopped due to a failure), no other controller in the storage apparatus 2 could take over the processing, so the ownership of every volume owned by the controller 21a (MPU 23a) was changed to the controller 21b.
- in the second embodiment, when one controller (for example, the controller 1001-1) stops, a plurality of processor cores can take charge of the processing of each volume (any of the eight processor cores 10111 in the controller 1001-2 can be in charge). Therefore, when one controller (for example, the controller 1001-1) stops, the remaining controller (the controller 1001-2) changes the ownership of each volume to one of its eight processor cores 10111. The other processes are the same as those described in the first embodiment.
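The failover described above can be sketched as a reassignment pass over the ownership map. The round-robin selection policy here is an assumption — the patent only states that ownership moves to one of the surviving controller's eight cores, not how the core is chosen.

```python
def fail_over_ownership(ldev_owners: dict, failed_cores: set,
                        surviving_cores: list) -> dict:
    """Sketch of the second embodiment's failover: every volume owned
    by a core of the stopped controller (e.g. cores 0x00-0x07 of
    controller 1001-1) is reassigned to one of the surviving
    controller's cores (e.g. 0x08-0x0F of controller 1001-2)."""
    new_owners = dict(ldev_owners)
    i = 0
    for ldev, owner in ldev_owners.items():
        if owner in failed_cores:
            # assumed policy: spread the orphaned volumes round-robin
            new_owners[ldev] = surviving_cores[i % len(surviving_cores)]
            i += 1
    return new_owners
```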
- while an embodiment of the present invention has been described above, it is an example for explaining the present invention and is not intended to limit the present invention to that embodiment; the present invention can be implemented in various other forms.
- the number of controllers 21, ports 26, and disk I/Fs 215 in the storage device 2 is not limited to the number described in FIG. 1; configurations with three or more controllers 21, disk I/Fs 215, or host I/Fs are also possible.
- the present invention is also effective when the HDD 22 is replaced with a storage medium such as an SSD.
- in the embodiments above, the distribution table 241 is stored in the memory of the storage device 2, but the distribution table may instead be provided in the distribution module 33 (or the ASIC 1024). In that case, each time the distribution table must be updated (as explained in the above embodiments, for example when the first I/O access is issued from the server to the storage device, or when an LU is defined in the storage device), the controller creates an updated distribution table in the storage device, and the updated result is reflected from the storage device into the distribution module 33 (or ASIC 1024).
- the distribution module 33 in the first embodiment may be implemented as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array); alternatively, a general-purpose processor may be mounted in the distribution module 33, and much of the processing performed by the distribution module 33 may be realized by a program executed by that general-purpose processor.
- ASIC: Application Specific Integrated Circuit
- FPGA: Field Programmable Gate Array
- 1: Computer system 2: Storage device 3: Server 4: Management terminal 6: LAN 7: I/O bus 21: Storage controller 22: HDD 23: MPU 24: Memory 25: Disk interface 26: Port 27: Inter-controller connection path 31: MPU 32: Memory 33: Distribution module 34: Interconnection switch 35: Distribution unit 36, 37: Port
Abstract
Description
The storage apparatus 2 in the first embodiment of the present invention forms one or more logical volumes (also called LDEVs) from one or more HDDs 22. Each logical volume is managed with a number that is unique within the storage apparatus 2, called the logical volume number (LDEV#). When the server 3 specifies an access target volume, for example when issuing an I/O command, it uses an S_ID, which is information that can uniquely identify the server 3 within the computer system 1 (or, in an environment where virtual machines run on the server 3, information that can uniquely identify a virtual machine), together with a logical unit number (LUN). That is, the server 3 uniquely identifies the access target volume by including the S_ID and LUN in the command parameters of the I/O command; the server 3 does not use the LDEV# used inside the storage apparatus 2 when specifying a volume. The storage apparatus 2 therefore holds information (the logical volume management table 200) that manages the correspondence between LDEV# and LUN, and uses it to convert the pair of S_ID and LUN specified in an I/O command from the server 3 into an LDEV#. The logical volume management table 200 shown in FIG. 2 (also called the "LDEV management table 200") is a table that manages the correspondence between LDEV# and LUN, and the same table is stored in each of the memories 24a and 24b in the controllers 21a and 21b. The S_ID 200-1 and LUN 200-2 columns store the S_ID and LUN of the server 3 associated with the logical volume identified by the LDEV#. MP# 200-4 is a column that stores information called ownership, which is described below.
FIG. 3 shows an overview of the processing when the server 3 transmits an I/O request to the storage apparatus 2. First, S1 is processing performed only at initial setup after the computer system 1 starts: the storage controller 21a or 21b generates the distribution tables 241a and 241b, and then notifies the distribution module 33 of the server 3 of the read destination information of the distribution table and the distribution table base address information. The distribution table 241 is a table that stores ownership information; its contents are described later. The generation of the distribution table 241a (or 241b) in S1 is processing that secures a storage area for the distribution table 241 in memory and initializes its contents (for example, by writing 0 to the entire table area).
Next, the access address of the distribution table 241 calculated by the distribution module 33 in S4 of FIG. 3 and the contents of the distribution table 241 are described with reference to FIGS. 4 and 5. The memory 24 of the storage controller 21 is a storage area with a 64-bit address space, and the distribution table 241 is stored in a contiguous area in the memory 24. FIG. 4 shows the format of the address information within the distribution table 241 calculated by the distribution module 33. This address information consists of a 42-bit distribution table base address, an 8-bit Index, a 12-bit LUN, and a 2-bit fixed value (00). The distribution table base address is the information that the distribution module 33 receives from the controller 21 in S2 of FIG. 3.
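The FIG. 4 address format can be sketched as a bit-field composition. The placement of the base address in the upper bits is an assumption consistent with the table occupying a contiguous memory region; the fixed 2-bit 00 field implies each entry occupies 4 bytes.

```python
def distribution_table_address(base: int, index: int, lun: int) -> int:
    """Compose the 64-bit access address per the FIG. 4 layout:
    a 42-bit distribution table base address, an 8-bit Index,
    a 12-bit LUN, and a fixed 2-bit 00 field (so entries are
    4 bytes apart, with 4096 LUN entries per Index)."""
    assert 0 <= base < 1 << 42 and 0 <= index < 1 << 8 and 0 <= lun < 1 << 12
    return (base << 22) | (index << 14) | (lun << 2)
```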
Next, the details of the processing performed by the distribution unit 35 of the server 3 (the processing corresponding to S4 and S6 in FIG. 3) are described, but first the information that the distribution unit 35 holds in its own memory is described with reference to FIG. 6. The information that the distribution unit 35 needs in order to perform I/O distribution processing comprises the search data table 3010, the distribution table base address information 3110, and the distribution table read destination CTL# information 3120. Index# 3011 of the search data table 3010 stores the Index number corresponding to the S_ID stored in the S_ID 3012 column; when an I/O command is received from the server 3, the search data table 3010 is used to derive the Index number from the S_ID in the I/O command. The configuration of the search data table 3010 in FIG. 6 is only an example; the present invention is also effective with configurations other than that shown in FIG. 6, for example a table having only the S_ID 3012 column, in which the S_IDs for Index numbers 0, 1, 2, ... are stored in order from the top of the S_ID 3012 column.
The details of the processing performed by the distribution unit 35 of the server 3 (the processing corresponding to S4 and S6 in FIG. 3) are described with reference to FIG. 7. When the distribution unit 35 receives an I/O command from the MPU 31 via the port 36, it extracts the S_ID of the server 3 (or of a virtual machine on the server 3) and the LUN of the access target LU included in the I/O command (S41). The distribution unit 35 then performs the process of converting the extracted S_ID into an Index number, using the search data table 3010 managed within the distribution unit 35: it refers to S_ID 3012 of the search data table 3010 and searches for a row (entry) matching the S_ID extracted in S41.
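The S41 lookup can be sketched as a linear scan of the search data table 3010. The dict-row representation is an assumption for illustration; a `None` result models the search-failure path that sends the command to the representative MP.

```python
def lookup_index_number(search_data_table: list, s_id: str):
    """Sketch of the FIG. 7 lookup: scan the search data table 3010
    for the row whose S_ID column matches the S_ID extracted from
    the I/O command, and return its Index#. Returning None models
    the miss case (No at S43), which routes to the representative MP."""
    for row in search_data_table:
        if row["S_ID"] == s_id:
            return row["Index#"]
    return None
```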
Next, the flow of processing in the storage apparatus 2 when the Index number search fails (No in the determination of S43) and the I/O command has been transmitted to the representative MP is described with reference to FIG. 8. When the representative MP (here, the case where the MPU 23a of the controller 21a is the representative MP is taken as an example) receives the I/O command, the controller 21a refers to the S_ID and LUN included in the I/O command and to the LDEV management table 200, and determines whether it itself has ownership of the access target LU (S11). If it has ownership, the controller 21a performs the subsequent processing; if it does not, it transfers the I/O command to the controller 21b. The subsequent processing is performed by either the controller 21a or 21b, and since there is no significant difference in the processing whichever controller performs it, the description treats "the controller 21" as performing it.
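The S11 ownership check can be sketched as a table lookup followed by a process-or-forward decision. Keying the LDEV management table 200 by the (S_ID, LUN) pair is an assumption made for illustration, based on how the table is described above.

```python
def route_io_command(ldev_table: dict, s_id: str, lun: int, self_id: str):
    """Sketch of S11: the receiving controller (or representative MP)
    looks up the command's (S_ID, LUN) in the LDEV management table
    200 and checks the ownership (MP#) column; if it is not the
    owner, the command is forwarded to the owning controller."""
    owner = ldev_table[(s_id, lun)]   # ownership recorded per volume
    return ("process", owner) if owner == self_id else ("forward", owner)
```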
Since the distribution table 241 stores information about ownership, LUs, and LDEVs, registration and updating of information also occur when an LU is created or when ownership changes. Here, taking the case where an LU is created as an example, the flow of registering information in the distribution table 241 is described.
The distribution module 33 of the first embodiment of the present invention can receive a plurality of I/O commands at the same time and distribute them to the controller 21a or 21b. That is, it can receive a second command from the MPU 31 while it is determining the transmission destination of a first command received from the MPU 31. The flow of processing in this case is described with reference to FIG. 9.
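The concurrent behavior above, combined with the ordering rule claimed later (a later command is not forwarded before an earlier command's destination is decided), can be sketched as follows. This is an illustrative model of the FIG. 9 behavior, not the hardware design; command identifiers and the callback shape are assumptions.

```python
from collections import OrderedDict

class DistributionModule:
    """Sketch: destination lookups for several commands may be in
    flight at once, but commands are forwarded in receive order, so
    a second command waits until the first one's destination is known."""
    def __init__(self):
        self._pending = OrderedDict()   # cmd_id -> destination or None

    def receive(self, cmd_id):
        self._pending[cmd_id] = None    # lookup started, not yet decided

    def destination_decided(self, cmd_id, dest):
        self._pending[cmd_id] = dest
        sent = []
        # forward from the head of the queue while destinations are known
        while self._pending and next(iter(self._pending.values())) is not None:
            cid, d = self._pending.popitem(last=False)
            sent.append((cid, d))
        return sent
```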
Next, the processing when a failure occurs in the storage apparatus 2 of the first embodiment, in particular when one of the plurality of controllers 21 stops, is described. When one controller 21 stops and the stopped controller 21 held the distribution table 241, the server 3 can no longer access the distribution table 241 from that point on; it is therefore necessary to move (re-create) the distribution table 241 to another controller 21 and to change the information about the access destination controller 21 that the distribution module uses when accessing the distribution table 241. The ownership of volumes owned by the stopped controller 21 must also be changed.
Next, the management information held by the storage controller 1001 in the second embodiment of the present invention is described, beginning with the management information about the logical volumes (LUs) that the storage controller 1001 provides to the server blades 1002 and the host 1008.
The storage memories 1012-1 and 1012-2 are provided with FIFO-type areas that store the I/O commands issued by the server blades 1002 to the controller 1001; in the second embodiment these are called command queues. FIG. 14 shows an example of the command queues provided in the storage memory 1012-1. As shown in FIG. 14, a command queue is provided for each server blade 1002 and for each processor core of the controller 1001. For example, when the server blade 1002-1 issues an I/O command for an LU whose ownership is held by the processor core with identification number 0x01 (core 0x01), the server blade 1002-1 stores the command in the queue for core 0x01 in the set of command queues 10131-1 for the server blade 1002-1. The storage memory 1012-2 is likewise provided with command queues for each server blade, but it differs from the command queues in the storage memory 1012-1 in that its queues store commands for the processor cores of the MPU 1011-2, that is, the processor cores with identification numbers 0x08 to 0x0F.
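The FIG. 14 layout can be sketched as one FIFO per (server blade, processor core) pair. A single dict stands in for both storage memories here; in the described system the queues for cores 0x00-0x07 live on storage memory 1012-1 and those for 0x08-0x0F on 1012-2.

```python
from collections import deque

class CommandQueues:
    """Sketch of the per-blade, per-core FIFO command queues of FIG. 14."""
    def __init__(self, blades, cores):
        self._q = {(b, c): deque() for b in blades for c in cores}

    def enqueue(self, blade, core, cmd):
        # the server blade stores the command in the owning core's queue
        self._q[(blade, core)].append(cmd)

    def dequeue(self, blade, core):
        # the processor core acquires the oldest command parameters
        return self._q[(blade, core)].popleft()
```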
Like the controller 21 of the first embodiment, the controller 1001 of the second embodiment has a distribution table 241, whose contents are the same as those described in the first embodiment (FIG. 5). The difference is that in the distribution table 241 of the second embodiment, MPU# 502 stores a processor core identification number (that is, 0x00 to 0x0F); in all other respects it is the same as the distribution table of the first embodiment.
The storage controller 21 of the first embodiment derived an 8-bit Index number from the information (S_ID) of the server 3 (or of a virtual machine running on the server 3) included in the I/O command, and the server 3 used the Index number to determine the access destination within the distribution table. The controller 21 managed the correspondence between S_ID and Index number in the index table 600. The controller 1001 of the second embodiment likewise holds an index table 600 and manages the correspondence between S_ID and Index number.
The information maintained by the blade server 1002 of the second embodiment of the present invention for I/O distribution processing is the same as the information held by (the distribution unit 35 of) the server 3 of the first embodiment (the search data table 3010, the distribution table base address information 3110, and the distribution table read destination CTL# information 3120). In the blade server 1002 of the second embodiment, this information is stored in the internal RAM 10246 of the ASIC 1024.
Next, an overview of the processing when a server blade 1002 transmits an I/O request (taking a read request as an example) to the storage controller module 1001 is described with reference to FIGS. 15 and 16. The flow of this processing is the same as the flow described in FIG. 3 of the first embodiment. In the computer system 1000 of the second embodiment, the processing of S1 and S2 in FIG. 3 (generation of the distribution table, and transmission of the distribution table read destination and the distribution table base address information) is likewise performed at initial setup, but it is omitted in FIGS. 15 and 16.
Next, the processing when the Index number search fails (for example, when a server blade 1002 (or a virtual machine running on a server blade 1002) issues an I/O request to the controller 1001 for the first time) is described with reference to FIG. 17. This processing is the same as the processing of FIG. 8 in the first embodiment.
In the first embodiment, it was explained that the distribution module 33 can receive and process a second command from the MPU 31 while receiving a first command from the MPU 31 of the server 3 and determining the transmission destination of the first command. The ASIC 1024 of the second embodiment can likewise process a plurality of commands concurrently; this processing is the same as the processing of FIG. 9 of the first embodiment.
In the computer system of the second embodiment as well, the processing at LU creation and at failure occurrence described in the first embodiment is performed in the same manner. Since the processing flow is the same as that described in the first embodiment, detailed description is omitted. Ownership information is determined in the course of these processes; however, since in the computer system of the second embodiment a processor core holds the ownership of an LU, the controller 1001 selects one of the processor cores 10111 in the controller 1001, rather than an MPU 1011, when determining ownership, which differs from the processing in the first embodiment.
2: Storage apparatus
3: Server
4: Management terminal
6: LAN
7: I/O bus
21: Storage controller
22: HDD
23: MPU
24: Memory
25: Disk interface
26: Port
27: Inter-controller connection path
31: MPU
32: Memory
33: Distribution module
34: Interconnection switch
35: Distribution unit
36, 37: Port
Claims (14)
- 1. A computer system comprising one or more servers and a storage apparatus, wherein:
the storage apparatus comprises one or more storage media, a first controller having a first processor and a first memory, and a second controller having a second processor and a second memory, the first controller and the second controller both being connected to the server;
the server comprises a third processor, a third memory, and a distribution module that transmits an I/O request issued by the third processor to the storage apparatus to either the first processor or the second processor; and
the distribution module:
when the third processor issues a first I/O request, starts a process of determining, based on distribution information provided by the storage apparatus, which of the first processor and the second processor is to be the transmission destination of the first I/O request;
when it receives a second I/O request from the third processor before the transmission destination of the first I/O request is determined, starts a process of determining, based on the distribution information provided by the storage apparatus, which of the first processor and the second processor is to be the transmission destination of the second I/O request;
when the transmission destination of the first I/O request is determined, transmits the first I/O request to the determined transmission destination; and
does not transmit the second I/O request to its transmission destination until the transmission destination of the first I/O request is determined.
- 2. The computer system according to claim 1, wherein the storage apparatus has, in the first memory or the second memory, a distribution table storing information about the transmission destinations of the server's I/O requests, and
the distribution module, upon receiving an I/O request from the third processor, acquires the information stored in the distribution table and determines, based on the information, which of the first processor and the second processor is to be the transmission destination of the I/O request.
- 3. The computer system according to claim 2, wherein the storage apparatus provides the server with a plurality of volumes composed of the one or more storage media;
an I/O request issued by the third processor includes at least a unique identifier given to the server and a logical unit number (LUN) of a volume provided by the storage apparatus;
the distribution table stores, for each volume, information about the transmission destination of I/O requests; and
the distribution module:
has a search data table storing information about the correspondence between the identifier and an index number associated with the identifier;
upon receiving the first I/O request from the third processor, refers to the search data table and, when the identifier exists in the search data table, identifies the index number based on the identifier;
determines a reference address within the distribution table based on the identified index number and the LUN included in the first I/O request, and acquires the information about the transmission destination of the first I/O request by reading the information stored in the area on the first memory or the second memory identified by the reference address; and
determines, based on the acquired information, which of the first processor and the second processor is to be the transmission destination of the first I/O request.
- 4. The computer system according to claim 3, wherein representative processor information is defined in advance in the computer system as the transmission destination of an I/O request for the case where no index number associated with the identifier of the server exists in the search data table, and
the distribution module, upon receiving the second I/O request from the third processor, refers to the search data table and, when the identifier included in the second I/O request does not exist in the search data table, reads data from a predetermined area on the first memory or the second memory and then transmits the second I/O request to the transmission destination identified by the representative processor information.
- 5. The computer system according to claim 4, wherein the storage apparatus, after returning a response to the second I/O request to the server,
determines the index number to be associated with the identifier and stores the determined index number in the search data table in association with the identifier.
- 6. The computer system according to claim 3, wherein, in the storage apparatus, the processor in charge of processing I/O requests for each volume is predetermined for each volume, and
the information stored in the distribution table about the transmission destination of I/O requests for each volume is information about the processor in charge of I/O requests for each volume.
- 7. The computer system according to claim 2, wherein the first processor and the second processor each have a plurality of processor cores, and
the distribution module, upon receiving an I/O request from the third processor, acquires the information stored in the distribution table and determines, based on the information, which of the plurality of processor cores of the first processor or the second processor is to be the transmission destination of the I/O request.
- 8. A control method for a computer system comprising one or more servers and a storage apparatus, wherein:
the storage apparatus comprises one or more storage media, a first controller having a first processor and a first memory, and a second controller having a second processor and a second memory, the first controller and the second controller both being connected to the server;
the server comprises a third processor, a third memory, and a distribution module that transmits an I/O request issued by the third processor to the storage apparatus to either the first processor or the second processor; and
the distribution module:
when the third processor issues a first I/O request, starts a process of determining, based on distribution information provided by the storage apparatus, which of the first processor and the second processor is to be the transmission destination of the first I/O request;
when it receives a second I/O request from the third processor before the transmission destination of the first I/O request is determined, starts a process of determining, based on the distribution information provided by the storage apparatus, which of the first processor and the second processor is to be the transmission destination of the second I/O request;
when the transmission destination of the first I/O request is determined, transmits the first I/O request to the determined transmission destination; and
does not transmit the second I/O request to its transmission destination until the transmission destination of the first I/O request is determined.
- 9. The control method for a computer system according to claim 8, wherein the storage apparatus has, in the first memory or the second memory, a distribution table storing information about the transmission destinations of the server's I/O requests, and
the distribution module, upon receiving an I/O request from the third processor, acquires the information stored in the distribution table and determines, based on the information, which of the first processor and the second processor is to be the transmission destination of the I/O request.
- 10. The control method for a computer system according to claim 9, wherein the storage apparatus provides the server with a plurality of volumes composed of the one or more storage media;
an I/O request issued by the third processor includes at least a unique identifier given to the server and a logical unit number (LUN) of a volume provided by the storage apparatus;
the distribution table stores, for each volume, information about the transmission destination of I/O requests;
the distribution module has a search data table storing information about the correspondence between the identifier and an index number associated with the identifier; and
the distribution module:
upon receiving the first I/O request from the third processor, refers to the search data table and, when the identifier exists in the search data table, identifies the index number based on the identifier;
determines a reference address within the distribution table based on the identified index number and the LUN included in the first I/O request, and acquires the information about the transmission destination of the first I/O request by reading the information stored in the area on the first memory or the second memory identified by the reference address; and
determines, based on the acquired information, which of the first processor and the second processor is to be the transmission destination of the first I/O request.
- 11. The control method for a computer system according to claim 10, wherein representative processor information is defined in advance in the computer system as the transmission destination of an I/O request for the case where no index number associated with the identifier of the server exists in the search data table, and
the distribution module, upon receiving the second I/O request from the third processor, refers to the search data table and, when the identifier included in the second I/O request does not exist in the search data table, reads data from a predetermined area on the first memory or the second memory and then transmits the second I/O request to the transmission destination identified by the representative processor information.
- 12. The control method for a computer system according to claim 11, wherein the storage apparatus, after returning a response to the second I/O request to the server,
determines the index number to be associated with the identifier and stores the determined index number in the search data table in association with the identifier.
- 13. The control method for a computer system according to claim 10, wherein, in the storage apparatus, the processor in charge of processing I/O requests for each volume is predetermined for each volume, and
the information stored in the distribution table about the transmission destination of I/O requests for each volume is information about the processor in charge of I/O requests for each volume.
- 14. The control method for a computer system according to claim 9, wherein the first processor and the second processor each have a plurality of processor cores, and
the distribution module, upon receiving an I/O request from the third processor, acquires the information stored in the distribution table and determines, based on the information, which of the plurality of processor cores of the first processor or the second processor is to be the transmission destination of the I/O request.
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2015550262A JP6068676B2 (ja) | 2013-11-28 | 2013-11-28 | 計算機システム及び計算機システムの制御方法 |
GB1515783.7A GB2536515A (en) | 2013-11-28 | 2013-11-28 | Computer system, and a computer system control method |
PCT/JP2013/082006 WO2015079528A1 (ja) | 2013-11-28 | 2013-11-28 | 計算機システム及び計算機システムの制御方法 |
CN201380073594.2A CN105009100A (zh) | 2013-11-28 | 2013-11-28 | 计算机系统及计算机系统的控制方法 |
DE112013006634.3T DE112013006634T5 (de) | 2013-11-28 | 2013-11-28 | Computersystem und Computersystemsteuerverfahren |
US14/773,886 US20160224479A1 (en) | 2013-11-28 | 2013-11-28 | Computer system, and computer system control method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2013/082006 WO2015079528A1 (ja) | 2013-11-28 | 2013-11-28 | 計算機システム及び計算機システムの制御方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015079528A1 true WO2015079528A1 (ja) | 2015-06-04 |
Family
ID=53198517
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/082006 WO2015079528A1 (ja) | 2013-11-28 | 2013-11-28 | 計算機システム及び計算機システムの制御方法 |
Country Status (6)
Country | Link |
---|---|
US (1) | US20160224479A1 (ja) |
JP (1) | JP6068676B2 (ja) |
CN (1) | CN105009100A (ja) |
DE (1) | DE112013006634T5 (ja) |
GB (1) | GB2536515A (ja) |
WO (1) | WO2015079528A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017072827A1 (ja) * | 2015-10-26 | 2017-05-04 | 株式会社日立製作所 | 計算機システム、及び、アクセス制御方法 |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104811473B (zh) * | 2015-03-18 | 2018-03-02 | 华为技术有限公司 | 一种创建虚拟非易失性存储介质的方法、系统及管理系统 |
US10277677B2 (en) * | 2016-09-12 | 2019-04-30 | Intel Corporation | Mechanism for disaggregated storage class memory over fabric |
CN106648851A (zh) * | 2016-11-07 | 2017-05-10 | 郑州云海信息技术有限公司 | 一种多控存储中io管理的方法和装置 |
KR102367359B1 (ko) * | 2017-04-17 | 2022-02-25 | 에스케이하이닉스 주식회사 | 직렬 시스템 버스 인터페이스 및 직접메모리액세스 컨트롤러를 갖는 전자 시스템 및 그 동작 방법 |
KR20210046348A (ko) * | 2019-10-18 | 2021-04-28 | 삼성전자주식회사 | 복수의 프로세서들에 유연하게 메모리를 할당하기 위한 메모리 시스템 및 그것의 동작 방법 |
WO2021174063A1 (en) * | 2020-02-28 | 2021-09-02 | Nebulon, Inc. | Cloud defined storage |
CN113297112B (zh) * | 2021-04-15 | 2022-05-17 | 上海安路信息科技股份有限公司 | PCIe总线的数据传输方法、系统及电子设备 |
CN114442955B (zh) * | 2022-01-29 | 2023-08-04 | 苏州浪潮智能科技有限公司 | 全闪存储阵列的数据存储空间管理方法及装置 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11338648A (ja) * | 1998-02-26 | 1999-12-10 | Nec Corp | ディスクアレイ装置、そのエラ―制御方法、ならびにその制御プログラムを記録した記録媒体 |
JP2004240949A (ja) * | 2002-11-26 | 2004-08-26 | Hitachi Ltd | クラスタ型ストレージシステム及びその管理方法 |
JP2013517537A (ja) * | 2010-04-21 | 2013-05-16 | 株式会社日立製作所 | ストレージシステム及びストレージシステムにおけるオーナ権制御方法 |
JP2013524334A (ja) * | 2010-09-09 | 2013-06-17 | 株式会社日立製作所 | コマンドの起動を制御するストレージ装置及びその方法 |
JP2013196176A (ja) * | 2012-03-16 | 2013-09-30 | Nec Corp | 排他制御システム、排他制御方法および排他制御プログラム |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4039794B2 (ja) * | 2000-08-18 | 2008-01-30 | 富士通株式会社 | マルチパス計算機システム |
CN100375080C (zh) * | 2005-04-15 | 2008-03-12 | 中国人民解放军国防科学技术大学 | 大规模分布共享系统中的输入输出分组节流方法 |
US7624262B2 (en) * | 2006-12-20 | 2009-11-24 | International Business Machines Corporation | Apparatus, system, and method for booting using an external disk through a virtual SCSI connection |
JP5072692B2 (ja) * | 2008-04-07 | 2012-11-14 | 株式会社日立製作所 | 複数のストレージシステムモジュールを備えたストレージシステム |
CN102112967B (zh) * | 2008-08-04 | 2014-04-30 | 富士通株式会社 | 多处理器系统、多处理器系统用管理装置以及方法 |
JP5282046B2 (ja) * | 2010-01-05 | 2013-09-04 | 株式会社日立製作所 | 計算機システム及びその可用化方法 |
JP5691306B2 (ja) * | 2010-09-03 | 2015-04-01 | 日本電気株式会社 | 情報処理システム |
JP5660986B2 (ja) * | 2011-07-14 | 2015-01-28 | 三菱電機株式会社 | データ処理システム及びデータ処理方法及びプログラム |
-
2013
- 2013-11-28 JP JP2015550262A patent/JP6068676B2/ja not_active Expired - Fee Related
- 2013-11-28 US US14/773,886 patent/US20160224479A1/en not_active Abandoned
- 2013-11-28 GB GB1515783.7A patent/GB2536515A/en not_active Withdrawn
- 2013-11-28 WO PCT/JP2013/082006 patent/WO2015079528A1/ja active Application Filing
- 2013-11-28 CN CN201380073594.2A patent/CN105009100A/zh active Pending
- 2013-11-28 DE DE112013006634.3T patent/DE112013006634T5/de not_active Withdrawn
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11338648A (ja) * | 1998-02-26 | 1999-12-10 | Nec Corp | ディスクアレイ装置、そのエラ―制御方法、ならびにその制御プログラムを記録した記録媒体 |
JP2004240949A (ja) * | 2002-11-26 | 2004-08-26 | Hitachi Ltd | クラスタ型ストレージシステム及びその管理方法 |
JP2013517537A (ja) * | 2010-04-21 | 2013-05-16 | 株式会社日立製作所 | ストレージシステム及びストレージシステムにおけるオーナ権制御方法 |
JP2013524334A (ja) * | 2010-09-09 | 2013-06-17 | 株式会社日立製作所 | コマンドの起動を制御するストレージ装置及びその方法 |
JP2013196176A (ja) * | 2012-03-16 | 2013-09-30 | Nec Corp | 排他制御システム、排他制御方法および排他制御プログラム |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017072827A1 (ja) * | 2015-10-26 | 2017-05-04 | 株式会社日立製作所 | 計算機システム、及び、アクセス制御方法 |
JPWO2017072827A1 (ja) * | 2015-10-26 | 2018-08-16 | 株式会社日立製作所 | 計算機システム、及び、アクセス制御方法 |
US10592274B2 (en) | 2015-10-26 | 2020-03-17 | Hitachi, Ltd. | Computer system and access control method |
Also Published As
Publication number | Publication date |
---|---|
JPWO2015079528A1 (ja) | 2017-03-16 |
JP6068676B2 (ja) | 2017-01-25 |
GB2536515A (en) | 2016-09-21 |
GB201515783D0 (en) | 2015-10-21 |
DE112013006634T5 (de) | 2015-10-29 |
CN105009100A (zh) | 2015-10-28 |
US20160224479A1 (en) | 2016-08-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6068676B2 (ja) | 計算機システム及び計算機システムの制御方法 | |
JP6074056B2 (ja) | 計算機システムおよびデータ制御方法 | |
US7269646B2 (en) | Method for coupling storage devices of cluster storage | |
US7865676B2 (en) | Load equalizing storage controller and control method for the same | |
US20180189109A1 (en) | Management system and management method for computer system | |
JP4462852B2 (ja) | ストレージシステム及びストレージシステムの接続方法 | |
JP7116381B2 (ja) | クラウド・ベースのランクを使用するデータの動的再配置 | |
US10585609B2 (en) | Transfer of storage operations between processors | |
US20170102874A1 (en) | Computer system | |
US20070067432A1 (en) | Computer system and I/O bridge | |
JP7135162B2 (ja) | 情報処理システム、ストレージシステム及びデータ転送方法 | |
JP2007207007A (ja) | ストレージシステム、ストレージコントローラ及び計算機システム | |
JP2005275525A (ja) | ストレージシステム | |
JP2004227558A (ja) | 仮想化制御装置およびデータ移行制御方法 | |
JP6703600B2 (ja) | 計算機システム及びサーバ | |
JP2007048323A (ja) | 仮想化制御装置およびデータ移行制御方法 | |
US9239681B2 (en) | Storage subsystem and method for controlling the storage subsystem | |
US11080192B2 (en) | Storage system and storage control method | |
US11989455B2 (en) | Storage system, path management method, and recording medium | |
WO2017072868A1 (ja) | ストレージ装置 | |
JP2006155640A (ja) | アクセスの設定方法 | |
US11016698B2 (en) | Storage system that copies write data to another storage system | |
US11201788B2 (en) | Distributed computing system and resource allocation method | |
US9529721B2 (en) | Control device, and storage system | |
JP7118108B2 (ja) | クラウドサーバ、ストレージシステム、及び計算機システム |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase |
Ref document number: 2015550262 Country of ref document: JP Kind code of ref document: A |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 13898100 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 112013006634 Country of ref document: DE Ref document number: 1120130066343 Country of ref document: DE |
|
ENP | Entry into the national phase |
Ref document number: 201515783 Country of ref document: GB Kind code of ref document: A Free format text: PCT FILING DATE = 20131128 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1515783.7 Country of ref document: GB |
|
WWE | Wipo information: entry into national phase |
Ref document number: 14773886 Country of ref document: US |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 13898100 Country of ref document: EP Kind code of ref document: A1 |