US10452557B2 - Storage apparatus, computer system, and method for improved read operation handling - Google Patents
Storage apparatus, computer system, and method for improved read operation handling
- Publication number
- US10452557B2 (application US15/540,042)
- Authority
- US
- United States
- Prior art keywords
- command
- address
- read
- data
- processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
- G06F12/0238—Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
- G06F12/0246—Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory in block erasable memory, e.g. flash memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0629—Configuration or reconfiguration of storage systems
- G06F3/0635—Configuration or reconfiguration of storage systems by changing the path, e.g. traffic rerouting, path reconfiguration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0659—Command handling arrangements, e.g. command buffers, queues, command scheduling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
- G06F3/0688—Non-volatile semiconductor memory arrays
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/65—Details of virtual memory and virtual address translation
- G06F2212/657—Virtual address space management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/72—Details relating to flash memory management
- G06F2212/7201—Logical to physical mapping or translation of blocks or pages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/72—Details relating to flash memory management
- G06F2212/7202—Allocation control and policies
Definitions
- the present invention relates to a storage apparatus.
- a storage apparatus includes a randomly accessible non-volatile storage medium and a storage controller which controls data transfer between a host computer coupled to the storage apparatus and the non-volatile storage medium inside the storage apparatus.
- Examples of a non-volatile storage medium included in a storage apparatus include a hard disk drive (HDD), a magnetic disk drive, an optical disk drive, and a flash drive (a non-volatile semiconductor memory). Performances of these non-volatile storage media are improving year after year. In particular, a storage apparatus including a flash drive is superior to a storage apparatus only including a hard disk drive in terms of lifetime, power saving, access time, and the like. Performances of flash drives are improving dramatically with advances in semiconductor technology.
- PTL 1 discloses a method of realizing, with dedicated hardware, a high-performance storage controller for controlling a storage apparatus.
- the technique according to PTL 1 enables the performance of a storage controller to be improved.
- functions for facilitating management of storage apparatuses and improving utilization efficiency of storage apparatuses cannot be provided in a sufficient manner.
- functions for facilitating management of storage apparatuses include thin provisioning, volume replication, and snapshot.
- functions for improving utilization efficiency of storage apparatuses include deduplication and data compression. In order to provide these functions, a program including the functions must be executed on a processor.
- the present invention uses both a processor and a control device (for example, dedicated hardware) to improve processing performance with respect to a physical storage device (for example, a non-volatile storage medium) while providing a high-performance storage apparatus.
- a part of command processing (for example, a write command process) is executed by the processor, and at least a part of a read command process is made executable by the control device without involving the processor in order to improve I/O performance.
- a storage apparatus includes: a controller coupled to a host computer; and a physical storage device coupled to the controller.
- the controller includes: a processor configured to provide the host computer with a logical volume based on the physical storage device; a first memory coupled to the processor; a control device coupled to the host computer, the physical storage device, and the processor; and a second memory coupled to the control device.
- Based on a command from the host computer, the control device is configured to write, into the second memory, address information that associates a logical address in the logical volume with a device address, which is an address in the physical storage device.
- the control device determines whether or not the command is a read command.
- When it is determined that the command is a read command, the control device identifies a first logical address designated by the command and determines whether or not the first logical address is included in the address information. When the first logical address is included in the address information, the control device specifies a first device address corresponding to the first logical address, reads read data stored in an area indicated by the first device address, and transmits the read data to the host computer. When the first logical address is not included in the address information, the control device transmits the command to the processor. When it is determined that the command is a command other than a read command, the control device transmits the command to the processor.
- the processor is configured to execute a process designated by the command received from the control device.
- the storage apparatus is capable of improving processing performance with respect to a physical storage device by executing a part of command processing (for example, a write command process) with a processor and making at least a part of a read command process executable by a control device without involving the processor.
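- The dispatch performed by the control device, as summarized above, can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `Command` type, the dictionary used as address information, and the `read_drive` and `forward_to_processor` callbacks are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Command:
    """Hypothetical host command: kind is "read", "write", or "management"."""
    kind: str
    lun: int = 0
    lba: int = 0


# Assumed layout of the address information held in the second memory:
# (LU number, logical address) -> (drive number, device address).
AddressInfo = Dict[Tuple[int, int], Tuple[int, int]]


def handle_command(
    cmd: Command,
    address_info: AddressInfo,
    read_drive: Callable[[int, int], object],
    forward_to_processor: Callable[[Command], object],
):
    """Serve a read directly when the address is registered; otherwise defer."""
    # Commands other than read commands always go to the processor.
    if cmd.kind != "read":
        return forward_to_processor(cmd)

    # A read whose logical address is not registered also goes to the processor.
    device = address_info.get((cmd.lun, cmd.lba))
    if device is None:
        return forward_to_processor(cmd)

    # Registered read: fetch the data from the drive without the processor.
    drive_no, device_addr = device
    return read_drive(drive_no, device_addr)
```

- In this sketch the processor is involved only on a miss or for a non-read command, which mirrors the division of work described above.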
- FIG. 1 is a configuration diagram of a computer system according to Embodiment 1.
- FIG. 2 shows a configuration of a memory 170 of an MP 140 .
- FIG. 3 shows a configuration of a memory 180 of a RAP 160 .
- FIG. 4 is a functional block diagram for explaining a command parallel processing program 320 .
- FIG. 5 is a functional block diagram for explaining a command processing unit 440 .
- FIG. 6 shows an example of a flow chart of a command sorting process.
- FIG. 7 shows an example of a flow chart of a command extracting process.
- FIG. 8 shows an example of a flow chart of an individual command process.
- FIG. 9 shows an example of a flow chart of a read command process executed by the RAP 160 .
- FIG. 10 shows an example of a flow chart of a read command process executed by the MP 140 .
- FIG. 11 shows an example of a flow chart of a write command process of the MP 140 .
- FIG. 12 shows an example of a flow chart of a write data transferring process of the MP 140 .
- FIG. 13 shows an example of a format of an entry registration command.
- FIG. 14 shows an example of a format of an entry registration response 1400 .
- FIG. 15 shows an example of a flow chart of a read command process of the MP 140 according to Embodiment 2.
- FIG. 16 shows an example of a flow chart of a staging process of the MP 140 according to Embodiment 3.
- FIG. 17 shows an example of a format of an entry deletion command.
- FIG. 18 shows a flow of a write data transferring process of the MP 140 according to Embodiment 4.
- FIG. 19 is a configuration diagram of a computer system according to Embodiment 5.
- each piece of information held by a computer system may be described using expressions such as a table and a list.
- a data structure of each piece of information is not limited and other data structures may be used. Since each piece of information is not dependent on a data structure, for example, a “kkk table” or a “kkk list” can also be referred to as “kkk information”.
- a processor executes a program and performs a process using, for example, a storage resource (such as a memory) and/or a communication interface device (such as a communication port). While various programs are sometimes considered processing entities in the following description, alternatively, a processor executing the programs may be considered a processing entity. Furthermore, a process with a processor as its processing entity can be interpreted to be performed by executing one or more programs. While a processor is typically a microprocessor such as a CPU (Central Processing Unit), the processor may include a hardware circuit which executes a part of a process (for example, encoding/decoding and compression/expansion).
- a storage apparatus (a storage system) includes: one or more physical storage devices; and a controller which controls the physical storage devices.
- the controller provides a host computer with a logical volume based on the one or more physical storage devices.
- the logical volume may be a real logical volume or a virtual logical volume.
- a real logical volume is a logical volume based on the one or more physical storage devices included in the storage apparatus. Examples of a virtual logical volume may include a TP (thin provisioning) logical volume and a snapshot logical volume.
- a flash drive that is a non-volatile semiconductor memory will be used as an example of a physical storage device.
- the flash drive is, for example, an SSD (Solid State Drive).
- the physical storage device need only be a non-volatile physical storage device and not necessarily a storage device using a non-volatile semiconductor and, for example, an HDD (Hard Disk Drive) may be used.
- a plurality of physical storage devices may constitute one or more RAID (Redundant Array of Independent (or Inexpensive) Disks) groups (a RAID group may also be referred to as a parity group).
- FIG. 1 is a configuration diagram of a computer system according to Embodiment 1.
- the computer system includes a host computer 10 , a management terminal 20 , and a storage apparatus 100 .
- the host computer 10 may be simply referred to as a host 10 .
- the host 10 is coupled to the storage apparatus 100 via a network 30 .
- the network 30 is a communication path for exchanging commands and data between the host 10 and the storage apparatus 100 and is constituted by, for example, a SAN (Storage Area Network).
- the storage apparatus 100 includes: storage controllers 110 and 120 ; and one or more flash drives 190 which are physical storage devices (non-volatile storage media).
- the storage controllers 110 and 120 may be simply referred to as a controller and the flash drives 190 may be simply referred to as a drive 190 .
- the controllers 110 and 120 are made redundant in FIG. 1 . However, the number of controllers is not limited to two and one or three or more controllers may be provided. Moreover, the controller 110 and the controller 120 basically share a same configuration. Therefore, unless otherwise necessary, a description of the controller 120 will be hereinafter omitted.
- the controller 110 includes: a frontend interface 130 ; a processor 140 ; a read accelerator processor 160 ; memories 170 and 180 ; and a switch (SW) 150 .
- the frontend interface 130 may be abbreviated as FE I/F 130 , the processor 140 as MP (Microprocessor) 140 , and the read accelerator processor 160 as RAP 160 .
- the RAP 160 is a control device and may be realized by, for example, dedicated hardware such as an ASIC (application specific integrated circuit) or an FPGA (field-programmable gate array). For example, processes described later to be executed by the RAP 160 can be processed with higher performance (at a higher speed) when executed by the RAP 160 mounted as hardware than when executed by the MP (Microprocessor) 140 .
- the switch 150 is coupled to the FE I/F 130 via a signal line 131 , coupled to the MP 140 via a signal line 141 , coupled to the RAP 160 via a signal line 161 , and coupled to the drive 190 via a signal line 191 .
- each of the signal lines 131 , 141 , 151 , 161 , and 191 is, for example but without limitation, a PCI Express bus used as an internal data transfer path of the controller.
- the MP 140 and the RAP 160 are coupled via a Non Transparent Bridge and are capable of communicating with each other.
- the coupling between the MP 140 and the RAP 160 is not limited to a Non Transparent Bridge.
- the switch 150 of the controller 110 and a switch (not shown) of the controller 120 are coupled to each other via the signal line 151 . Accordingly, the MP 140 in the controller 110 and a processor (not shown) in the controller 120 are capable of communicating with each other.
- the memory 170 is coupled to the MP 140 .
- the memory 170 is a main storage medium of the MP 140 and includes: an area for storing programs (a storage control program and the like) to be executed by the MP 140 and a management table to be referenced by the MP; and a cache area for temporarily storing data. Details will be provided later.
- the memory 180 is coupled to the RAP 160 .
- the memory 180 is a main storage medium of the RAP 160 and includes an area for storing a management table to be referenced by the RAP 160 and the like. Details will be provided later.
- the FE I/F 130 is coupled to the host 10 via the network 30 .
- the FE I/F 130 converts a data transfer protocol between the host 10 and the controller 110 into a data transfer protocol inside the controller 110 and vice versa.
- the MP 140 of the controller 110 provides the host 10 with a logical volume (LU: Logical Unit) based on one or a plurality of drives 190 .
- an LU will be described as a virtual logical volume.
- the management terminal 20 is respectively coupled to the controllers 110 and 120 .
- the management terminal 20 includes: an input apparatus to be used by a manager of the storage apparatus 100 to input configuration information to each of the controllers 110 and 120 ; and a display apparatus for displaying information on the storage apparatus 100 to the manager.
- FIG. 2 shows a configuration of the memory 170 for the MP 140 .
- the memory 170 includes a drive access program 210 , an IO processing program 220 , a storage management program 230 , a cache directory 240 , a cache area 250 , and a read buffer 260 .
- the drive access program 210 performs control for transferring (staging) data stored in the drive 190 to the cache area 250 , control for storing (destaging) data in the cache area 250 in the drive 190 , and the like.
- the IO processing program 220 processes commands such as a read command, a write command, and a management command, and returns a response to each command.
- the storage management program 230 is, for example, a program that provides functions for facilitating management, improving utilization efficiency, and the like of the storage apparatus 100 and provides, for example, functions such as thin provisioning, volume replication, snapshot, deduplication, and data compression.
- the read buffer 260 is an area for temporarily storing data when the MP 140 transfers data in the drive 190 to the FE I/F 130 .
- the cache directory 240 is information used by the MP 140 when retrieving data (cache data) stored in the cache area 250 .
- the cache directory 240 includes reference tables GRPP (GRouP Pointer), GRPT1 (GRouP Table 1), and GRPT2, and a slot control block (SLCB) as a management table.
- the reference tables (GRPP, GRPT1, and GRPT2) are tables referenced by the MP 140 when retrieving a cache segment (hereinafter, a segment) which is a minimum unit in the cache area 250 .
- the reference tables have a directory structure.
- the GRPP is positioned uppermost and the GRPT2 is positioned lowermost.
- An upper-level reference table contains a pointer to a next reference table.
- the GRPT2 contains a pointer to the SLCB.
- the SLCB is a table for managing control information related to a segment.
- the SLCB stores information indicating whether or not data designated by a read command is stored in the cache area 250 and, when the data is stored in the cache area 250 , an address (a cache address) of the data on the memory 170 , and the like.
- One or a plurality of segments can be associated with one slot. For example, one segment is capable of storing 64 KB of data.
- the cache area 250 may be managed in slot units.
- a transition between states of dirty data and clean data may be performed in slot units.
- the cache area 250 may be reserved and released in either slot units or segment units.
- dirty data refers to data stored in the cache area 250 in a state prior to being written into the drive 190 .
- clean data refers to data stored in the cache area 250 in a state after being written into the drive 190 .
- When the MP 140 receives a read command from the host 10 , the MP 140 sequentially searches the respective tables in the cache directory 240 based on an LU number (LUN: Logical Unit Number) and a logical block address (LBA: Logical Block Address) contained in the read command. Accordingly, the MP 140 is able to find out whether or not read data designated by the read command is stored in the cache area 250 and, when stored, an address (a cache address) of the read data in the cache area 250 . Moreover, since data retrieval in the cache area 250 using the cache directory 240 is well-known art, a detailed description thereof will be omitted herein.
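- The sequential search through the cache directory 240 (GRPP, then GRPT1, then GRPT2, then the SLCB) can be sketched as a chain of table lookups. The source does not specify how the LU number and LBA are split across the levels, so the key split below (`SLOT_LBAS`, `GRPT2_BITS`), the nested dictionaries, and the SLCB field names are assumptions made only for illustration.

```python
from typing import Dict, Optional

SLOT_LBAS = 128   # assumption: number of LBAs covered by one slot
GRPT2_BITS = 8    # assumption: low-order slot bits used to index GRPT2


def find_cache_address(grpp: Dict, lun: int, lba: int) -> Optional[int]:
    """Walk GRPP -> GRPT1 -> GRPT2 -> SLCB; return a cache address or None."""
    slot = lba // SLOT_LBAS

    # GRPP (uppermost level): assumed to be keyed by LU number.
    grpt1 = grpp.get(lun)
    if grpt1 is None:
        return None

    # GRPT1: assumed to be keyed by the high-order bits of the slot number.
    grpt2 = grpt1.get(slot >> GRPT2_BITS)
    if grpt2 is None:
        return None

    # GRPT2 (lowermost level): holds the pointer to the SLCB.
    slcb = grpt2.get(slot & ((1 << GRPT2_BITS) - 1))
    if slcb is None or not slcb["cached"]:
        return None

    # The SLCB records where the data sits in the cache area.
    return slcb["cache_address"]
```

- A `None` result in this sketch corresponds to a cache miss, in which case the data would have to be staged from the drive 190 .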
- FIG. 3 shows a configuration of the memory 180 for the RAP 160 .
- the memory 180 includes a command parallel processing program 320 , an address conversion table management program 330 , and an address conversion table 310 .
- the address conversion table 310 may be stored in the memory 170 .
- the command parallel processing program 320 and the address conversion table management program 330 are programs to be executed by the RAP 160 .
- the command parallel processing program 320 processes commands received by the controllers 110 and 120 . Details of this process will be described later.
- the address conversion table management program 330 performs a registration process of the address conversion table 310 in accordance with an instruction from the MP 140 .
- the address conversion table management program 330 performs a registration process or a registration deletion process of the address conversion table 310 in accordance with an instruction from the command parallel processing program 320 .
- the address conversion table 310 is a table which directly associates an LBA of an LU provided by the controllers 110 and 120 to the host 10 and an address (a device address) of a data block in the drive 190 with each other.
- the device address is an address provided by the drive 190 to the controllers 110 and 120 .
- Each entry of the address conversion table 310 includes items of, for example, an LU number 311 of an LU, an LBA 312 of the LU, a drive number 313 of the drive 190 , and an address 314 of a data block in the drive 190 .
- a set of the LU number 311 and the LBA 312 and a set of the drive number 313 and the address 314 each have unique values.
- the controllers 110 and 120 may manage the LBA 312 using a plurality of consecutive LBAs as one management unit. Specifically, for example, the controllers 110 and 120 manage the LBA 312 in a same unit as a slot of the cache area 250 .
- a configuration may be adopted in which a corresponding management unit is identified by high-order bits of an LBA in a range determined in advance and an LBA in the management unit is identified by low-order bits of an LBA in a range determined in advance.
- a first LBA (with 0 as lowest 7 bits) of the management unit is registered in the LBA 312 .
- the address conversion table 310 is searched using, as a search key, an LU number and a value in which the lowest 7 bits of an LBA are masked to 0, to obtain a drive number and a data block address of a drive corresponding to the management unit.
- a corresponding slot number may be stored and managed in the LBA 312 in place of an LBA. In this case, an LU number and the corresponding slot number are to be used as a search key for address conversion. Details of timings at which registration and registration deletion of the address conversion table 310 are performed will be provided later.
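- The masked-key lookup described above can be sketched as follows. The dictionary-based table and the function names are hypothetical; only the 7-bit mask (a management unit of 128 consecutive LBAs) and the entry fields come from the description.

```python
from typing import Dict, Optional, Tuple

UNIT_BITS = 7                       # lowest 7 bits select an LBA within a unit
UNIT_MASK = (1 << UNIT_BITS) - 1    # 0x7F

# Assumed table layout: (LU number, first LBA of the management unit)
# -> (drive number, data block address in the drive).
ConversionTable = Dict[Tuple[int, int], Tuple[int, int]]


def search_key(lun: int, lba: int) -> Tuple[int, int]:
    """Build the search key by masking the lowest 7 bits of the LBA to 0."""
    return (lun, lba & ~UNIT_MASK)


def convert(table: ConversionTable, lun: int, lba: int) -> Optional[Tuple[int, int]]:
    """Return (drive number, data block address) for the unit, or None."""
    return table.get(search_key(lun, lba))


table = {(1, 256): (2, 0x8000)}
print(convert(table, 1, 300))   # LBA 300 lies in the unit starting at LBA 256
print(convert(table, 1, 384))   # next unit is not registered
```

- Every LBA from 256 to 383 maps to the same entry in this sketch, so one entry covers a whole management unit.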
- FIG. 4 is a functional block diagram for explaining the command parallel processing program 320 .
- the command parallel processing program 320 includes a reception queue 410 , a command sorting unit 420 , a plurality of sorted queues 430 (sorted queue #0 to sorted queue #N), and a plurality of command processing units 440 (command processing unit #0 to command processing unit #N).
- the command parallel processing program 320 enqueues a command received by the FE I/F 130 from the host 10 to a tail end of the reception queue 410 .
- Examples of a command type of a command received by the controllers 110 and 120 from the host 10 include a read command, a write command, and a management command. While it is assumed that the controllers 110 and 120 receive commands of these three types in the following description, the controllers 110 and 120 may receive commands of other types.
- a read command may be denoted by “R”
- a write command may be denoted by “W”
- a management command may be denoted by “M”
- the command sorting unit 420 dequeues a command from a head of the reception queue 410 and enqueues the command to a tail end of one sorted queue 430 that satisfies conditions. Specifically, the command sorting unit 420 sorts commands that can be processed in parallel into different sorted queues 430 and sorts commands that cannot be processed in parallel into a same sorted queue 430 . Commands that are sorted into the same sorted queue 430 are, for example, commands in an order relationship.
- For example, when a read command and a write command designate a same set of an LU number and an LBA, the command sorting unit 420 sorts these commands into the same sorted queue 430 .
- Otherwise, the commands could be processed in an order different from the order in which they were received; for example, the write command may be processed before the read command.
- In that case, after the MP 140 executes a write process based on the write command and transmits a completion response thereof to the host 10 , pre-update data that was read before the write process may be inadvertently transmitted to the host as read data.
- Therefore, a read command and a write command designating a same set of an LU number and an LBA must be processed in series instead of in parallel.
- Each sorted queue 430 has a one-to-one correspondence with the command processing unit 440 .
- Each command processing unit 440 dequeues a command from a head of the sorted queue 430 corresponding to the command processing unit 440 and processes the command according to rules.
- An example of the rules of command processing in the command processing unit 440 will be described. For example, the following three rules are applied: (1) when processing a read command, a command other than a read command is not newly processed; (2) commands other than read commands are processed sequentially (prohibition of simultaneous processing of plurality of commands); and (3) when processing a command other than a read command, a read command is not newly processed.
- While each command processing unit 440 is capable of parallel processing of a plurality of read commands, commands other than read commands, such as a write command and a management command, must always be processed one at a time.
- the parallel processing of read commands enables read commands to be processed at high speed.
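- The three processing rules can be expressed as a single admission check. The function name and the two in-flight counters are assumptions introduced for this sketch; the source states only the rules themselves.

```python
def may_start(next_is_read: bool, reads_in_flight: int, others_in_flight: int) -> bool:
    """Decide whether the command at the head of a sorted queue may start now.

    reads_in_flight:  read commands currently being processed
    others_in_flight: non-read commands currently being processed (0 or 1)
    """
    if next_is_read:
        # Rule (3): no new read while a non-read command is being processed.
        # Reads may otherwise run in parallel with other reads.
        return others_in_flight == 0
    # Rule (1): no new non-read command while any read is being processed.
    # Rule (2): non-read commands are processed one at a time.
    return reads_in_flight == 0 and others_in_flight == 0
```

- Under these rules, several reads can overlap, whereas a write or management command waits until the unit has drained and then runs alone.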
- FIG. 5 is a functional block diagram for explaining the command processing unit 440 .
- the command processing unit 440 includes a command extracting unit 510 and a plurality of individual command processing units 520 (individual command processing unit #1 to individual command processing unit #M).
- the command extracting unit 510 extracts one command from a head of the sorted queue 430 in accordance with the rules described above and hands over the extracted command to a single idling individual command processing unit 520 .
- a maximum number of commands that can be processed by the individual command processing unit 520 at one time is one.
- a state where the individual command processing unit 520 is executing a command will be referred to as “executing” and a state where the individual command processing unit 520 is not executing a command will be referred to as “idling”.
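- The hand-over from the command extracting unit 510 to an idling individual command processing unit 520 can be sketched as below. The list-based representation of the units (`None` meaning "idling") is an assumption, and the rule check performed before extraction is deliberately left to the caller in this sketch.

```python
from collections import deque
from typing import Deque, List, Optional


def dispatch(sorted_queue: Deque[str], units: List[Optional[str]]) -> Optional[int]:
    """Hand the head command to one idling unit; return its index or None.

    Each element of `units` is None while that unit is idling, or the command
    it is executing. A unit executes at most one command at a time.
    """
    if not sorted_queue:
        return None
    for index, running in enumerate(units):
        if running is None:
            # Extract exactly one command from the head of the sorted queue.
            units[index] = sorted_queue.popleft()
            return index
    # Every unit is executing, so the command stays queued.
    return None
```

- When all units are executing, the command remains at the head of the queue in this sketch and its order relative to later commands is preserved.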
- FIG. 6 shows an example of a flow chart of a command sorting process.
- the command sorting process is a process executed by the command sorting unit 420 .
- the command sorting unit 420 dequeues one command from the head of the reception queue 410 .
- the command sorting unit 420 determines a command type of the extracted command. Specifically, for example, when a read command and a write command are considered IO commands, the command sorting unit 420 determines whether or not the extracted command is an IO command.
- An example of a command other than an IO command is a management command.
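- When the extracted command is an IO command, the command sorting unit 420 advances the process to S 603.
- When the extracted command is a command other than an IO command, the command sorting unit 420 advances the process to S 606.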
- the command sorting unit 420 acquires a set of an LU number and an LBA designated by the extracted command.
- the command sorting unit 420 calculates a hash value M from the acquired set of an LU number and an LBA.
- the command sorting unit 420 enqueues a command to a tail end of a sorted queue 430 corresponding to the hash value M and ends the process.
- a hash value M and a sorted queue #M having a same number as the hash value M may be associated with each other.
- a hash value and a sorted queue 430 may be associated with each other in any way.
- the hash value M may be calculated after performing a conversion which, for example, masks a part of the bits of the LBA with 0, so that LBAs within a same range yield a same hash value.
- the range is, for example, a management unit of the address conversion table 310.
- the command sorting unit 420 enqueues a command to a tail end of the sorted queue #0 and ends the process.
- the command sorting unit 420 uses the sorted queue #0 as a sorted queue for processing a management command.
- a command of a reception queue can be sorted into sorted queues 430 in accordance with rules.
- a management command can be sorted into one sorted queue 430 determined in advance. Since a plurality of management commands cannot be processed simultaneously, having only the command processing unit 440 corresponding to the sorted queue determined in advance execute management commands prevents processing performance of the other command processing units 440 from declining.
- an IO command is sorted into a sorted queue corresponding to the hash value M corresponding to a set of an LU number and an LBA. Accordingly, since a read command and a write command which share a same storage destination are sorted into a same sorted queue, processing of IO commands can be performed in an order in which the IO commands are received.
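- The command sorting process described above can be illustrated with a minimal sketch. The number of sorted queues, the width of the LBA mask, and the hash function itself are assumptions made only for illustration; the description does not specify them.

```python
from collections import deque

NUM_QUEUES = 4      # assumed number of sorted queues (#0 to #3)
LBA_MASK_BITS = 8   # assumed: the low 8 bits of the LBA are masked with 0


def queue_index(lu: int, lba: int) -> int:
    """Map a set of an LU number and an LBA to a sorted queue number."""
    masked_lba = lba & ~((1 << LBA_MASK_BITS) - 1)
    # Placeholder hash (an assumption); any deterministic function would do.
    return (lu * 1_000_003 + (masked_lba >> LBA_MASK_BITS)) % NUM_QUEUES


def sort_command(cmd: dict, sorted_queues: list) -> int:
    """Enqueue cmd at the tail of its sorted queue and return the queue number."""
    if cmd["type"] in ("read", "write"):   # IO command
        idx = queue_index(cmd["lu"], cmd["lba"])
    else:                                  # e.g. a management command
        idx = 0                            # sorted queue #0
    sorted_queues[idx].append(cmd)
    return idx


queues = [deque() for _ in range(NUM_QUEUES)]
r = sort_command({"type": "read", "lu": 1, "lba": 0x1234}, queues)
w = sort_command({"type": "write", "lu": 1, "lba": 0x12FF}, queues)
```

- Because both LBAs fall in the same masked range, the read command and the write command land in the same sorted queue and keep their reception order.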
- FIG. 7 shows an example of a flow chart of a command extracting process.
- the command extracting process is a process executed by each command extracting unit 510 .
- a process executed by the command extracting unit 510 of the command processing unit #0 will now be described as an example.
- the command extracting unit 510 determines whether or not the sorted queue 430 is empty. When the sorted queue 430 is empty (Yes in S 701 ), the command extracting unit 510 ends the process. On the other hand, when the sorted queue 430 is not empty (No in S 701 ), the command extracting unit 510 advances the process to S 702 .
- the command extracting unit 510 determines whether a command type of a head command of the sorted queue 430 is a read command or a command of another type. When the head command is a read command (Yes in S 702 ), the command extracting unit 510 advances the process to S 703 .
- the command extracting unit 510 confirms a state of all individual command processing units 520 in the command processing unit 440 .
- the command extracting unit 510 determines whether or not there is even one idling individual command processing unit 520 . When there is no idling individual command processing unit 520 (No in S 704 ), the command extracting unit 510 returns the process to S 703 . On the other hand, when there is even one idling individual command processing unit 520 (Yes in S 704 ), the command extracting unit 510 advances the process to S 705 .
- the command extracting unit 510 dequeues a read command from the head of the sorted queue 430 .
- the command extracting unit 510 hands over the extracted command to one idling individual command processing unit 520 and causes the individual command processing unit 520 to start an individual command process. After S 706 , the command extracting unit 510 returns the process to S 701 .
- the command extracting unit 510 stands by until all individual command processing units 520 in the command processing unit 440 enter an idle state.
- the command extracting unit 510 dequeues one command from the head of the sorted queue 430 .
- the command extracting unit 510 hands over the extracted command to one of the idling individual command processing units 520 and causes the individual command processing unit 520 to start an individual command process.
- the command extracting unit 510 stands by until all individual command processing units 520 in the command processing unit 440 enter an idle state (in other words, until all executing individual command processing units 520 end their processes), and returns the process to S 701 .
- the command processing unit 440 can perform processing in accordance with rules. Specifically, when a command at the head of a sorted queue 430 is a read command and there is even one individual command processing unit 520 in an idle state among the plurality of individual command processing units 520 in the command processing unit 440 to which the command extracting unit 510 belongs, the command extracting unit 510 can hand over the read command to the individual command processing unit 520 in an idle state and cause the individual command processing unit 520 to execute an individual command process. Accordingly, read commands can be processed in parallel in the command processing unit 440 . Therefore, read commands can be executed in an efficient manner.
- when a command at the head of a sorted queue 430 is a command other than a read command and all of the individual command processing units 520 in the command processing unit 440 to which the command extracting unit 510 belongs are in an idle state, the command extracting unit 510 can hand over the command other than a read command to one of the individual command processing units 520 and cause the individual command processing unit 520 to execute an individual command process. Accordingly, conditions of rule (1) can be satisfied.
- commands other than a read command such as a write command and a management command can be processed exclusively from other commands.
- the command extracting unit 510 does not extract a next command until the individual command process is finished. Accordingly, conditions of rules (2) and (3) can be satisfied. In other words, with respect to commands that are either management commands or write commands, one command is to be exclusively processed in the command processing unit 440 .
- a correspondence between an LBA and a device address in the drive 190 may be changed, and unless the change is appropriately reflected in the address conversion table 310 in the memory 180 , the RAP 160 may inadvertently return data stored at the device address prior to the change to the host. According to the processes described above, even when such a change is made, the MP 140 and the RAP 160 can be operated without any logical contradictions.
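- The dispatch decision made by the command extracting unit 510 under rules (1) to (3) can be sketched as a single predicate. The class and function names below are hypothetical, and the sketch models only the decision, not the actual queues or processing units.

```python
from dataclasses import dataclass


@dataclass
class UnitState:
    """Snapshot of one command processing unit (hypothetical model)."""
    total_units: int                  # number of individual command processing units
    executing_reads: int = 0          # units currently executing a read command
    non_read_executing: bool = False  # a write or management command is running


def may_dispatch(head_type: str, state: UnitState) -> bool:
    """Return True if the head command of the sorted queue may be handed over now."""
    if state.non_read_executing:
        # Rules (2) and (3): nothing new starts while a non-read command runs.
        return False
    if head_type == "read":
        # Reads run in parallel as long as at least one unit is idling.
        return state.executing_reads < state.total_units
    # Rule (1): a non-read command waits until every unit is idling.
    return state.executing_reads == 0
```

- For example, with two units and one read command executing, another read command may be dispatched, while a write command must wait.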
- FIG. 8 shows an example of a flow chart of an individual command process.
- the individual command process is a process executed by each individual command processing unit 520 .
- a process executed by one individual command processing unit 520 in the command processing unit #0 will now be described as an example.
- the individual command processing unit 520 determines whether a command type of a command received from the command extracting unit 510 is a read command or a command other than a read command.
- When the received command is a read command (Yes in S 801), the individual command processing unit 520 advances the process to S 802.
- When the received command is a command other than a read command (No in S 801), such as a write command or a management command, the individual command processing unit 520 advances the process to S 807.
- the individual command processing unit 520 searches the address conversion table 310 using a set of an LU number and an LBA designated by the read command as a search key.
- the set of an LU number and an LBA may be expressed by the hash value described earlier.
- a method of designating a search key in accordance with a management method of the address conversion table 310 is as described earlier.
- the individual command processing unit 520 determines whether or not the search in S 802 results in a hit.
- When the search results in a hit (Yes in S 803), the individual command processing unit 520 advances the process to S 804.
- When the search does not result in a hit (No in S 803), the individual command processing unit 520 advances the process to S 805.
- in processing of a write command, when an entry of a set of an LU number and an LBA designated by the write command is registered in the address conversion table 310, the entry is deleted from the address conversion table 310 (S 810). In a read command process, when dirty data does not exist in the cache area 250 (in a case other than a dirty hit), a set of an LU number and an LBA designated by the read command is registered in the address conversion table 310.
- the address conversion table 310 is searched upon receiving a read command (S 802 ), the RAP 160 or the MP 140 is determined as a processing entity of the read command process based on a search result, and a different read command process is executed depending on a result of the determination of a processing entity as shown in FIG. 9 or 10 . Therefore, before data of the drive 190 is updated by write data (dirty data) due to a write command, data prior to the update is prevented from being transmitted to the host 10 as read data by a read command subsequent to the write command.
- the individual command processing unit 520 performs a read command process ( FIG. 9 ) by the RAP 160 and ends the process. Details of the read command process by the RAP 160 will be described later.
- the individual command processing unit 520 causes the MP 140 to execute a read command process by transmitting a read command to the MP 140 . Subsequently, in S 806 , the individual command processing unit 520 stands by until a completion notification of the read command process is received from the MP 140 and then ends the process.
- the individual command processing unit 520 determines whether a command type of a command received from the command extracting unit 510 is a write command or a command other than a write command. When the received command is a write command (Yes in S 807 ), the individual command processing unit 520 advances the process to S 808 . When the received command is a command other than a write command (No in S 807 ), the individual command processing unit 520 advances the process to S 813 .
- the individual command processing unit 520 searches the address conversion table 310 using a set of an LU number and an LBA designated by the write command as a search key. Moreover, a method of designating a search key in accordance with a management method of the address conversion table is as described earlier. When a range of write data designated by the write command straddles a plurality of management units of the address conversion table, the process described above is repeated the required number of times to search the address conversion table 310 with respect to the data range designated by the write command.
- the individual command processing unit 520 determines whether or not the search in S 808 results in a hit.
- when the search results in a hit, the individual command processing unit 520 deletes an entry corresponding to a set of an LU number and an LBA having resulted in the hit from the address conversion table 310 (S 810), and advances the process to S 811.
- a hit may include a plurality of entries.
- the individual command processing unit 520 causes the MP 140 to execute a write command process by transmitting a write command to the MP 140 . Subsequently, in S 812 , the individual command processing unit 520 stands by until a completion notification of the write command process is received from the MP 140 , and ends the process after receiving the completion notification.
- the individual command processing unit 520 causes the MP 140 to execute a command process by transmitting a command to the MP 140 .
- An example of this command is a management command.
- the individual command processing unit 520 stands by until a completion notification of the command process is received from the MP 140 , and ends the process after receiving the completion notification.
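- The branching of the individual command process can be summarized in a short sketch. The address conversion table 310 is modeled here as a dictionary keyed by a set of an LU number and an LBA, which is an assumption for illustration; the function name is hypothetical.

```python
def individual_command_process(cmd: dict, table: dict) -> str:
    """Return the processing entity ('RAP' or 'MP') chosen for the command."""
    key = (cmd.get("lu"), cmd.get("lba"))
    if cmd["type"] == "read":
        # Hit in the address conversion table: the RAP processes the read itself.
        # Miss: the read command is transferred to the MP.
        return "RAP" if key in table else "MP"
    if cmd["type"] == "write":
        # Delete a matching entry before handing the write command to the MP,
        # so that a later read cannot be served from a stale device address.
        table.pop(key, None)
        return "MP"
    # Any other command, such as a management command, goes to the MP.
    return "MP"


table = {(0, 100): (2, 0x40)}  # (LU number, LBA) -> (drive number, address)
first_read = individual_command_process({"type": "read", "lu": 0, "lba": 100}, table)
write = individual_command_process({"type": "write", "lu": 0, "lba": 100}, table)
second_read = individual_command_process({"type": "read", "lu": 0, "lba": 100}, table)
```

- In this run the first read is handled by the RAP, the write removes the entry, and the second read is transferred to the MP.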
- the RAP 160 can execute a read command process without involving the MP 140. Accordingly, executing a read command process by the RAP 160, including when the RAP 160 is realized by hardware such as an ASIC or an FPGA, enables processing with higher performance (at a higher speed) and improves performances of the controllers 110 and 120 as compared to a case where the read command process is executed by the MP 140. In addition, a part of a read command process executed by the MP 140 can be borne by the RAP 160 to distribute a load applied to the MP 140 by the read command process.
- many virtual or logical volumes may be interposed between a logical volume provided to a host computer and a non-volatile storage medium, in which case a plurality of address conversions may be required to calculate an address of the non-volatile medium storing actual data from an address of a logical volume designated by an IO (Input or Output) command (a read command or a write command) transmitted from the host computer and a performance effect of a processor may not be sufficiently obtained due to an overhead imposed by the conversion process.
- the address conversion table 310 directly associates a data storage position in an LU provided to the host and a data storage position in the drive 190 .
- a read speed of read data can be increased without having to perform such address conversion.
- the performance of the storage apparatus 100 can be maintained even when the storage apparatus 100 is equipped with a function unique to the storage apparatus 100 .
- if the RAP 160 were to query the MP 140 for an address in the drive 190 corresponding to a set of an LU number and an LBA every time the RAP 160 processes a read command, a load applied by address conversion would cause a decline in the performance of the MP 140.
- communication between the RAP 160 and the MP 140 consumes processing time and causes a decline in the performance of the storage apparatus.
- the storage apparatus 100 is capable of executing a read command process without requiring the RAP 160 and the MP 140 to communicate with each other.
- the command is transferred by the individual command processing unit 520 to enable the MP 140 to process the command.
- a resource of the MP 140 can be used to process commands other than a read command and performance of the storage apparatus 100 as a whole can be improved.
- the RAP 160 may also delete an entry of the address conversion table 310 outside of an individual command process.
- An example of such a case is when the storage management program 230 running on the MP 140 changes a storage destination of data.
- the MP 140 may be configured to output a deletion command of an entry of the address conversion table 310 to the RAP 160 .
- examples of a case where a storage destination of data is changed include a case where same data is stored in a plurality of different data blocks of the drive 190 . Due to a deduplication process, the MP 140 changes the drive number 313 and the address 314 of a storage destination of data corresponding to an LBA so that a common data block is referenced by the data. Details including a format of the deletion command will be described later.
- FIG. 9 shows an example of a flow chart of a read command process executed by the RAP 160 .
- a read command process of the RAP 160 is executed by the individual command processing unit 520 of the RAP 160 . This process is the process of S 804 in FIG. 8 .
- the individual command processing unit 520 transmits a transfer instruction of read data designated by a read command to a drive storing the read data.
- the individual command processing unit 520 acquires a set of the drive number 313 and the address 314 corresponding to a set of the LU number 311 and the LBA 312 designated by the read command from the address conversion table 310 .
- the individual command processing unit 520 writes a transfer parameter indicating a transfer source address of a data block of the address 314 , a transfer destination address of the read buffer 260 of the memory 170 , and a data length designated by the read command into the memory 170 , and transmits a transfer instruction of read data to the drive 190 corresponding to the drive number 313 .
- a controller (not shown) of the drive 190 receives the transfer instruction of read data, references a transfer parameter in the memory 170 based on the transfer instruction, and transfers read data in a data block of the address 314 to the read buffer 260 based on the referenced transfer parameter.
- the controller of the drive 190 transmits a completion notification of the data transfer to the RAP 160 .
- the individual command processing unit 520 stands by until the completion notification of the data transfer is received from the drive 190 , and advances the process to S 903 after receiving the completion notification.
- the individual command processing unit 520 creates a read response (READ RSP) frame for notifying the host of completion of the read command process.
- the individual command processing unit 520 transfers the created response frame to the memory 170 .
- the individual command processing unit 520 transmits a transfer instruction of read data in the read buffer 260 to the FE I/F 130 . Specifically, for example, the individual command processing unit 520 writes a transfer parameter for transferring read data and a read response frame to the host 10 into the memory 170 and transmits a transfer instruction of the read data to the FE I/F 130 .
- the FE I/F 130 receives the transfer instruction, references a transfer parameter in the memory 170 based on the transfer instruction, and transfers the read data and the read response frame to the host 10 based on the referenced transfer parameter. Subsequently, the FE I/F 130 transmits a completion notification of the data transfer to the RAP 160 .
- the individual command processing unit 520 performs a freeing process of a secured read buffer after S 906 . While this freeing process is generally collectively performed with freeing processes of other read buffers (garbage collection), the freeing process may be performed immediately after S 906 .
- the RAP 160 can transmit read data to the host 10 without having the MP 140 perform a read command process by respectively transmitting transfer instructions to the drive 190 and the FE I/F 130 .
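- The read command process of the RAP 160 can be sketched as follows. The drive controller and the FE I/F 130 are replaced by plain dictionary operations, and the names `TransferParameter`, `rap_read`, and the memory keys are hypothetical stand-ins rather than the actual interfaces.

```python
from dataclasses import dataclass


@dataclass
class TransferParameter:
    source_address: int  # address of the data block in the drive
    destination: str     # hypothetical key naming the read buffer in memory
    length: int          # data length designated by the read command


def rap_read(lu, lba, length, table, drives, memory):
    """Serve a read from the address conversion table without involving the MP."""
    drive_no, address = table[(lu, lba)]  # direct LU/LBA -> drive/address lookup
    param = TransferParameter(address, "read_buffer", length)
    memory["transfer_parameter"] = param  # transfer parameter written into memory
    # Stand-in for the drive controller: copy the block into the read buffer.
    block = drives[drive_no][param.source_address]
    memory[param.destination] = block[:param.length]
    # Stand-in for the FE I/F: return the response frame and the read data.
    return {"frame": "READ RSP", "data": memory[param.destination]}


table = {(0, 100): (2, 0x40)}
drives = {2: {0x40: b"abcdefgh"}}
memory = {}
result = rap_read(0, 100, 4, table, drives, memory)
```

- A single table lookup yields the drive number and the address, which is what allows the RAP 160 to issue the transfer instructions on its own.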
- FIG. 10 shows an example of a flow chart of a read command process executed by the MP 140 .
- the read command process of the MP 140 is performed as the MP 140 executes the IO processing program 220 . This process is triggered by the reception of the read command transferred by the individual command processing unit 520 in S 805 shown in FIG. 8 . Moreover, as described earlier, the read command designates an LU number and an LBA of a storage destination of read data.
- the MP 140 receives a read command, references the cache directory 240 of the memory 170 , and searches whether read data based on the read command is cached in the cache area 250 .
- the MP 140 determines whether or not a result of the search is a dirty hit.
- the result of the search being a dirty hit represents a case where the read data is being cached and the read data is dirty data.
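- When the result of the search is a dirty hit (Yes in S 1002), the MP 140 advances the process to S 1003.
- When the result of the search is other than a dirty hit (No in S 1002), the MP 140 advances the process to S 1007.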
- Examples of a case where the search result is other than a dirty hit include a case where the read data is being cached and the read data is clean data (when the read data results in a clean hit) and a case where the read data is not being cached (when the read data results in a cache miss).
- the MP 140 creates a read response frame and stores the created read response frame in the memory 170 .
- the MP 140 writes a transfer parameter for transferring read data in the cache area 250 and the read response frame to the host 10 into the memory 170 and transmits a transfer instruction of the read data to the FE I/F 130 .
- the FE I/F 130 receives the transfer instruction of the read data, references the transfer parameter, transfers the read data and the read response frame to the host 10 , and transmits a completion notification of data transfer to the MP 140 .
- the MP 140 transmits the completion notification of the read command to the RAP 160 and ends the process.
- the MP 140 writes a transfer parameter for transferring read data stored in the drive 190 to the cache area 250 into the memory 170 and transmits a transfer instruction of the read data to the drive 190 .
- the MP 140 specifies a set of a drive number and a device address corresponding to a set of an LU number and an LBA designated by the read command.
- the MP 140 writes a transfer parameter indicating a transfer source address of a data block of the address and a transfer destination address of the read buffer 260 of the memory 170 into the memory 170 , and transmits a transfer instruction of read data to the drive 190 corresponding to the drive number.
- when a range of read data designated by the read command straddles slots of the cache area 250, the process described above is repeated the required number of times and a transfer instruction of read data is transmitted with respect to the data range designated by the read command. Therefore, there may be a plurality of transmission destination drives of the transfer instruction.
- when the storage apparatus 100 is equipped with virtualization technology unique to storage apparatuses such as thin provisioning, a virtual or logical volume is interposed between an LU and the drive 190.
- the MP 140 allocates the drive 190 to an LU and allocates the LU to a pool.
- the memory 170 includes conversion information for specifying a device address of the drive 190 from a set of an LU and an LBA. For example, based on an LU number and an LBA designated by a read command or a write command, the MP 140 specifies a drive number and a device address using, as conversion information, a conversion table from a virtual address in the memory 170 into a pool address, a conversion table from a pool address into a logical address, and a conversion table into an address of a logical drive.
- when receiving a read command or a write command, the MP 140 references a plurality of tables that are conversion information and performs a plurality of address conversions until a set of a drive number and a device address corresponding to the set of an LU number and an LBA designated by the read command or the write command is acquired.
- conversion information included in the memory 170 is not limited to the tables described above.
- a controller (not shown) of the drive 190 receives the transfer instruction of read data, references a transfer parameter in the memory 170 based on the transfer instruction, and transfers read data in a data block of the address 314 to the read buffer 260 based on the referenced transfer parameter.
- the controller of the drive 190 transmits a completion notification of the data transfer to the MP 140.
- the MP 140 stands by until the completion notification of the data transfer is received from the drive 190, and advances the process to the next step after receiving the completion notification.
- the MP 140 creates a read response frame for notifying the host 10 of completion of the read command process, and stores the created read response frame in the memory 170 .
- the MP 140 writes a transfer parameter for transferring read data transferred from the drive 190 and the read response frame to the host 10 into the memory 170 and instructs the FE I/F 130 to perform data transfer.
- the FE I/F 130 receives the transfer instruction of the read data, references the transfer parameter, transfers the read data and the read response frame to the host 10 , and transmits a completion notification of data transfer to the MP 140 .
- the MP 140 stands by until the completion notification of the data transfer is received from the FE I/F 130 , and advances the process to S 1012 after receiving the completion notification.
- the MP 140 creates an entry registration command for instructing registration of an entry based on the read command process and transmits the entry registration command to the RAP 160 .
- the entry registration command includes the set of an LU number and an LBA and the set of the drive number of the drive 190 and the device address of the data block specified in S 1007 . A description of the entry registration command will be provided later.
- the RAP 160 registers an entry in the address conversion table 310 in accordance with the entry registration command. In addition, the RAP 160 transmits, to the MP 140, an entry registration response 1400 notifying completion of the entry registration in the address conversion table 310. A description of the entry registration response 1400 will be provided later.
- the MP 140 stands by until receiving the entry registration response 1400 to an entry registration command 1300, and advances the process to S 1014 after receiving the entry registration response 1400.
- the MP 140 transmits a completion notification to the RAP 160 and ends the process.
- the RAP 160 can register these associations in the address conversion table 310 .
- the RAP 160 having received the read command designating the LU number and the LBA is capable of reading, based on the address conversion table 310 , the read data from the drive 190 at a higher speed than when the process is performed by the MP 140 .
- the read data in the cache area 250 can be transmitted to the host 10 based on the read command transferred from the RAP 160 .
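- The read command process of the MP 140 in Embodiment 1 can be sketched as follows. The cache directory, the chain of conversion tables, and the drives are modeled as dictionaries, and all names are hypothetical; the sketch only illustrates the dirty-hit branch, the multi-stage address conversion, and the entry registration.

```python
def mp_read(lu, lba, cache, conversion_chain, drives, rap_table):
    """Return read data; register an entry unless the read is a dirty hit."""
    entry = cache.get((lu, lba))
    if entry is not None and entry["dirty"]:
        # Dirty hit: serve the data from the cache area and register nothing,
        # because the drive does not yet hold the latest data.
        return entry["data"]
    # Other than a dirty hit: walk every conversion table in turn
    # (for example virtual -> pool -> logical -> drive).
    addr = (lu, lba)
    for conversion_table in conversion_chain:
        addr = conversion_table[addr]
    drive_no, device_addr = addr
    data = drives[drive_no][device_addr]
    # Stand-in for the entry registration command sent to the RAP.
    rap_table[(lu, lba)] = (drive_no, device_addr)
    return data


chain = [{(0, 5): "p1"}, {"p1": "l9"}, {"l9": (3, 0x80)}]
drives = {3: {0x80: b"old"}}
cache = {(0, 6): {"dirty": True, "data": b"new"}}
rap_table = {}
miss_data = mp_read(0, 5, cache, chain, drives, rap_table)
dirty_data = mp_read(0, 6, cache, chain, drives, rap_table)
```

- After the first call, the RAP-side table holds the direct association, so a later read of the same LBA no longer needs the three conversions.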
- FIG. 13 shows an example of a format of an entry registration command.
- the entry registration command 1300 includes a plurality of fields.
- the respective fields are: a command type 1301 indicating a type of the command; a sequence number 1302 of the command; an LU number 1303 of an LU that is a target of the command; an LBA 1304 that is a target of the command; a drive number 1305 of the drive 190 that is a target of the command; and an address 1306 of a data block that is a target of the command.
- a command type 1301 of “0” means that the command is an entry registration command.
- the sequence number 1302 is an identifier of the entry registration command and, at the same time, a number which links the entry registration command and a response thereto with each other. Every time the MP 140 issues an entry registration command, the MP 140 increments the sequence number 1302 by one. Moreover, when the sequence number 1302 exceeds a designated maximum value, the MP 140 may reset the sequence number to zero and reuse the sequence number.
- the LU number 1303 and the LBA 1304 are the LU number and the LBA designated by a read command.
- the drive number 1305 and the address 1306 represent a device address of the drive 190 corresponding to the set of the LU number 1303 and the LBA 1304 .
- FIG. 14 shows an example of a format of the entry registration response 1400 .
- the entry registration response 1400 includes a plurality of fields.
- the respective fields are: a command response type 1401 indicating a type of the response; and a sequence number 1402 of the response.
- a command response type 1401 of “1” means that a process based on the entry registration command 1300 has ended normally.
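- The two formats can be modeled as simple records. Field widths are not given in the description, so the maximum sequence number and the initial sequence number below are assumptions, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

MAX_SEQUENCE = 0xFFFF  # assumed designated maximum value of the sequence number


@dataclass(frozen=True)
class EntryRegistrationCommand:
    command_type: int     # 0: entry registration command
    sequence_number: int  # links the command and its response
    lu_number: int
    lba: int
    drive_number: int
    address: int


@dataclass(frozen=True)
class EntryRegistrationResponse:
    command_response_type: int  # 1: the process ended normally
    sequence_number: int


class SequenceCounter:
    """Issues sequence numbers, incrementing by one and wrapping to zero."""

    def __init__(self) -> None:
        self.value = 0  # assumed starting value

    def next(self) -> int:
        current = self.value
        self.value = 0 if self.value >= MAX_SEQUENCE else self.value + 1
        return current


def handle_registration(cmd: EntryRegistrationCommand, table: dict) -> EntryRegistrationResponse:
    """RAP side: register the entry and echo the sequence number back."""
    table[(cmd.lu_number, cmd.lba)] = (cmd.drive_number, cmd.address)
    return EntryRegistrationResponse(1, cmd.sequence_number)


counter = SequenceCounter()
table = {}
cmd = EntryRegistrationCommand(0, counter.next(), 0, 5, 3, 0x80)
resp = handle_registration(cmd, table)
```

- The MP 140 can match a response to its command by comparing the two sequence numbers.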
- FIG. 11 shows an example of a flow chart of a write command process of the MP 140 .
- the write command process of the MP 140 is performed as the MP 140 executes the IO processing program 220 .
- the MP 140 performs the write command process when receiving a write command transferred from the RAP 160 in S 811 of FIG. 8 .
- the write command designates an LU number and an LBA of a storage destination of write data.
- a preparation process for receiving write data is performed.
- the FE I/F 130 reserves a buffer area for receiving the write data and the MP 140 reserves an area for storing the write data in the cache area 250 .
- the MP 140 creates a transfer ready (XFER_RDY) frame and stores the created transfer ready frame in the memory 170 .
- the MP 140 writes a transfer parameter for transferring the transfer ready frame from the memory 170 to the host 10 into the memory 170 , and transmits a transfer instruction of the write data to the FE I/F 130 .
- the FE I/F 130 receives the transfer instruction of the write data, references the transfer parameter, and transfers the transfer ready frame to the host 10 .
- the MP 140 stands by until a completion notification of the transfer is received from the FE I/F 130 , and ends the process after receiving the completion notification.
- when the MP 140 receives a write command from the RAP 160, the MP 140 can perform preparation for receiving write data.
- a process performed when the MP 140 receives a management command from the RAP 160 based on S 813 in FIG. 8 will be omitted.
- the MP 140 performs a management command process by a conventionally known method.
- FIG. 12 shows an example of a flow chart of a write data transferring process of the MP 140 .
- After the write command process, the MP 140 successively performs a write data transferring process.
- When the host 10 receives a transfer ready frame from the FE I/F 130, the host 10 transmits write data to the FE I/F 130.
- Upon receiving the write data from the host 10, the FE I/F 130 transmits a write data reception notification to the MP 140.
- the MP 140 receives a write data reception notification from the FE I/F 130 .
- the MP 140 writes a transfer parameter for transferring write data from the host 10 to an area reserved in the cache area 250 of the controller 110 into the memory 170 , and transmits a transfer instruction of the write data to the FE I/F 130 .
- the FE I/F 130 receives the transfer instruction, references the transfer parameter, and transfers the write data to the reserved area from the host 10 . Once the transfer of the write data is completed, the FE I/F 130 transmits a transfer completion notification of the write data to the MP 140 .
- the MP 140 receives the transfer completion notification of the write data from the FE I/F 130 .
- the MP 140 transmits the write data to a cache area of the other controller 120 . Accordingly, the write data is duplicated in the cache area and reliability of the data can be ensured.
- the MP 140 creates a write response (Write RSP) frame and stores the created write response frame in the memory 170 .
- the MP 140 writes a transfer parameter for transferring the write response frame from the memory 170 to the host 10 into the memory 170 , and transmits a transfer instruction of the write response frame to the FE I/F 130 .
- the FE I/F 130 receives the transfer instruction, transfers the write response frame in the memory 170 to the host 10 , and transmits a completion notification of the transfer to the MP 140 .
- the MP 140 stands by until the completion notification of the transfer is received from the FE I/F 130 , and advances the process to S 1208 after receiving the completion notification.
- the MP 140 transmits the completion notification of the write command to the RAP 160 and ends the process.
- write data can be stored in the cache area 250 and a completion notification can be transmitted to the RAP 160 .
- the MP 140 writes the write data having been written into the cache area 250 into the drive 190 asynchronously with the write data transferring process. This control is performed as the MP 140 executes the drive access program 210 .
- the storage apparatus 100 may include a plurality of RAPs 160 .
- the plurality of RAPs 160 may be coupled to the switch 150 .
- when the MP 140 includes a plurality of cores, a part of the cores of the MP 140 may perform a process of the RAP 160 in place of the RAP 160 .
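The write data transferring process described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Controller` class, the dictionary-based cache and drive, and the `log` list are all assumptions introduced here to show the ordering of the steps (cache the write data, duplicate it in the other controller, respond, notify the RAP 160, and destage later).

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical stand-in for a controller and its cache area."""
    name: str
    cache: dict = field(default_factory=dict)  # (LU number, LBA) -> data
    dirty: set = field(default_factory=set)    # keys not yet written to the drive


def write_data_transfer(own, other, lu, lba, data, log):
    """Store write data in both cache areas, then report completion."""
    key = (lu, lba)
    own.cache[key] = data    # write data lands in the reserved cache area
    own.dirty.add(key)       # it is dirty until it reaches the drive
    other.cache[key] = data  # duplicate in the other controller's cache
    log.append("write response sent to host")
    log.append("completion notified to RAP")
    # The drive is not touched here; destaging runs separately.


def destage(own, drive):
    """Write dirty cache data to the drive, asynchronously with the transfer."""
    for key in sorted(own.dirty):
        drive[key] = own.cache[key]
    own.dirty.clear()
```

Calling `write_data_transfer` leaves the drive untouched and the data marked dirty; only a later `destage` call copies it to the drive, which mirrors the asynchronous behavior described in the text.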
- Embodiment 2 differs from Embodiment 1 in the read command process of the MP 140 . Moreover, a configuration of the computer system is similar to that of Embodiment 1. Hereinafter, differences in configurations and steps of processes from Embodiment 1 will be mainly described. Configurations and steps of processes that are similar to those of Embodiment 1 will be denoted with similar reference numerals and a description thereof will be omitted or abridged.
- FIG. 15 shows an example of a flow chart of a read command process of the MP 140 according to Embodiment 2.
- the read command process according to Embodiment 2 differs from that of Embodiment 1 in that read data based on a read command is cached in the cache area 250 and that a process has been added for a case where the read data is clean data (when the read data results in a clean hit).
- the MP 140 determines whether or not a search result indicates a dirty hit by read data.
- when the search result is a dirty hit (Yes in S 1002 ), the MP 140 advances the process to S 1003 .
- when the search result is other than a dirty hit (No in S 1002 ), the MP 140 advances the process to S 1507 .
- the MP 140 determines whether or not the read data results in a clean hit.
- when the read data does not result in a clean hit, the MP 140 advances the process to S 1007 .
- the MP 140 writes the read data from the drive 190 into the cache area 250 and instructs the RAP 160 to register an entry in the address conversion table 310 . Since this is a similar process to Embodiment 1, a description thereof will be omitted.
- when the read data results in a clean hit, the MP 140 advances the process to S 1508 .
- the MP 140 determines whether or not to register the address conversion table 310 .
- An example of the determination made in S 1508 will now be described.
- the MP 140 may observe its own load, and may determine not to register an entry in the address conversion table 310 when the load on the MP 140 is smaller than a threshold configured in advance but determine to register an entry in the address conversion table 310 when the load on the MP 140 is equal to or larger than the threshold configured in advance.
- when the MP 140 determines not to register an entry in the address conversion table 310 (No in S 1509 ), the MP 140 executes the process of S 1003 and thereafter. According to this process, read data in the cache area 250 is transmitted to the host 10 . Therefore, the MP 140 need not read the read data from the drive 190 and response time to a read command can be reduced.
- when the MP 140 determines to register an entry in the address conversion table 310 (Yes in S 1509 ), the MP 140 executes the process of S 1007 and thereafter. According to this process, read data read to the cache area 250 from the drive 190 is transmitted to the host 10 and, at the same time, the entry registration command 1300 of the address conversion table 310 is transmitted to the RAP 160 . Accordingly, the RAP 160 can be caused to execute a subsequent read command process, and a load applied to the MP 140 by the read command process can be offloaded to the RAP 160 .
- the MP 140 can determine, based on prescribed conditions, whether to make the MP 140 or the RAP 160 an executing entity of a read command process based on a subsequently-transmitted read command. For example, when the determination is made based on an observation result of the load on the MP 140 , read performance of the storage apparatus 100 can be improved by offloading the load on the MP 140 to the RAP 160 .
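The branching of the Embodiment 2 read command process can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Hit` enum, the function name `choose_read_path`, the path labels, and the numeric load scale are hypothetical, and the load-threshold rule is the one example of the S 1508 determination given in the text.

```python
from enum import Enum


class Hit(Enum):
    DIRTY = "dirty"
    CLEAN = "clean"
    MISS = "miss"


def choose_read_path(hit, mp_load, load_threshold):
    """Return (path, register_entry) for one read command handled by the MP."""
    if hit is Hit.DIRTY:
        # Dirty hit: serve from the cache area (S 1003 and thereafter).
        return ("serve_from_cache", False)
    if hit is Hit.MISS:
        # Not cached: read from the drive and register an entry
        # (S 1007 and thereafter).
        return ("stage_from_drive", True)
    # Clean hit: decide whether to register an entry (S 1508).
    if mp_load >= load_threshold:
        # Load at or above the threshold: register so that the RAP
        # can process subsequent read commands.
        return ("stage_from_drive", True)
    # Load below the threshold: the MP answers from the cache itself.
    return ("serve_from_cache", False)
```

With a threshold of 0.7, a clean hit at load 0.5 is served from the cache without registration, while a clean hit at load 0.7 takes the registration path so that later reads can be offloaded.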
- Embodiment 3 will be described with reference to FIGS. 16 and 17 .
- Embodiment 3 differs from Embodiments 1 and 2 in that a case where the MP 140 performs a sequential read is taken into consideration.
- a configuration of the computer system is similar to those of Embodiments 1 and 2.
- differences in configurations and steps of processes from Embodiments 1 and 2 will be mainly described. Configurations and steps of processes that are similar to those of Embodiments 1 and 2 will be denoted with similar reference numerals and a description thereof will be omitted or abridged.
- FIG. 16 shows an example of a flow chart of a staging process of the MP 140 according to Embodiment 3.
- the staging process is a process performed in the read command process ( FIGS. 10 and 15 ) of the MP 140 . Specifically, for example, after the MP 140 receives a completion notification of data transfer from the drive 190 (S 1008 ), the MP 140 advances the process to S 1603 .
- the MP 140 determines whether or not a look-ahead process during a sequential read is being executed based on the read command process. For example, this is a process in which, when the MP 140 determines that a read instruction indicates reading consecutive pieces of data, the MP 140 transfers data in an LBA consecutive to an LBA based on the read instruction to the cache area 250 in advance. Due to the look-ahead process, read performance can be improved.
- the MP 140 creates an entry deletion command for instructing deletion of a target entry of the address conversion table 310 and transmits the entry deletion command to the RAP 160 .
- the target entry is an entry containing a storage location of data to be read from the drive 190 by the sequential read.
- the RAP 160 receives the entry deletion command and deletes the target entry of the address conversion table 310 . In addition, the RAP 160 transmits a deletion completion notification of the target entry of the address conversion table 310 to the MP 140 .
- the MP 140 receives the entry deletion completion notification. In addition, the MP 140 executes S 1009 , S 1010 , S 1011 , and S 1014 in FIG. 10 and ends the process.
- An entry deletion command 1700 includes a plurality of fields.
- the respective fields are: a command type 1701 indicating a type of the command; a sequence number 1702 of the command; an LU number 1703 of an LU that is a target of the command; and an LBA 1704 that is a target of the command.
- a command type 1701 of “2” means that the command is a command to cause an entry of the address conversion table 310 to be deleted. Specifically, the command instructs deletion of an entry including a set of the LU number 1703 and the LBA 1704 .
- the sequence number 1702 is an identifier of the entry deletion command and, at the same time, links the entry deletion command and a response thereto with each other. The sequence number may be handled in the same manner as in the entry registration command 1300 .
- in Embodiment 3, by transferring data consecutive to data read from the drive 190 during a sequential read to the cache area 250 , a load on the MP 140 and a response time to the read command can sometimes be reduced even when the MP 140 performs a read command process.
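The entry deletion command 1700 and its handling by the RAP 160 can be sketched as follows. The four fields and the command type value of 2 come from the description above; the Python class, the dictionary used as the address conversion table, and the shape of the completion notification are assumptions made for illustration only.

```python
from dataclasses import dataclass

ENTRY_DELETION = 2  # command type 1701 value stated in the text


@dataclass(frozen=True)
class EntryDeletionCommand:
    command_type: int     # 1701: type of the command
    sequence_number: int  # 1702: links the command and its response
    lu_number: int        # 1703: target LU
    lba: int              # 1704: target LBA


def handle_entry_deletion(table, cmd):
    """Delete the (LU number, LBA) entry and return a completion notification.

    `table` is a hypothetical dict standing in for the address
    conversion table, keyed by (LU number, LBA).
    """
    if cmd.command_type != ENTRY_DELETION:
        raise ValueError("not an entry deletion command")
    table.pop((cmd.lu_number, cmd.lba), None)  # no error if already absent
    # Echo the sequence number so the MP can match the response.
    return {"sequence_number": cmd.sequence_number, "status": "deleted"}
```

Echoing the sequence number in the response is what lets the sender pair each completion notification with the command that caused it.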
- Embodiment 4 will be described with reference to FIG. 18 .
- Embodiment 4 differs from Embodiment 1 in a write data transferring process of the MP 140 .
- a configuration of the computer system is similar to those of Embodiments 1 to 3.
- differences in configurations and steps of processes from Embodiments 1 to 3 will be mainly described. Configurations and steps of processes that are similar to those of Embodiments 1 to 3 will be denoted with similar reference numerals and a description thereof will be omitted or abridged.
- FIG. 18 shows a flow of the write data transferring process of the MP 140 according to Embodiment 4.
- the MP 140 destages write data (dirty data) in the cache area 250 to the drive 190 .
- Processes of S 1205 and thereafter are similar to Embodiment 1.
- the MP 140 synchronizes the drive 190 with the cache area 250 .
- because the write data (dirty data) is destaged to the drive 190 , an address indicating the data can be registered in the address conversion table 310 .
- when the data results in a search hit in a search of the address conversion table (S 802 ) performed by the individual command processing unit 520 of the RAP 160 , the number of read commands processed by the RAP 160 can be increased.
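The effect of Embodiment 4 can be sketched as a write path that destages immediately and then makes the address available for lookup. This is a rough sketch under assumptions: the function names, the dictionaries standing in for the cache area, the drive, and the address conversion table, and the `(drive_number, drive_address)` location tuple are all hypothetical, and the point at which registration happens is inferred from the description rather than stated by it.

```python
def write_through(cache, drive, table, lu, lba, data, drive_number, drive_address):
    """Cache the write data, destage it, then register its drive location."""
    key = (lu, lba)
    location = (drive_number, drive_address)
    cache[key] = data       # write data arrives in the cache area as dirty data
    drive[location] = data  # destage so the drive matches the cache
    table[key] = location   # the address can now be registered for lookup


def rap_read(table, drive, lu, lba):
    """Return data on a table search hit (S 802), else None."""
    location = table.get((lu, lba))
    if location is None:
        return None  # search miss: the read command would go to the MP
    return drive[location]
```

After `write_through`, a `rap_read` for the same LU number and LBA hits the table and returns the data directly from the drive, which is how more read commands end up on the RAP side.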
- Embodiment 5 differs from Embodiments 1 to 4 in a configuration of a storage apparatus 1000 .
- FIG. 19 is a configuration diagram of a computer system according to Embodiment 5.
- the computer system includes the host 10 , the management terminal 20 , and the storage apparatus 1000 coupled to the host 10 via the network 30 .
- the storage apparatus 1000 includes controllers 1100 and 1200 . Since the controllers 1100 and 1200 are basically configured in a similar manner, the controller 1100 will be described hereinafter.
- the present embodiment differs from Embodiment 1 in that the controller 1100 includes an SAS (Serial Attached SCSI) controller 1900 .
- the SAS controller 1900 is coupled to the switch 150 via a signal line 1901 .
- An SAS port of the SAS controller 1900 and an SAS port of a drive 1910 are coupled to each other via an SAS link 1911 . Note that the SAS ports in the SAS controller 1900 and the drive 1910 have been omitted in the drawing.
- the SAS controller 1900 converts a data transfer protocol (an SAS protocol) between the drive 1910 and the controller 1100 into a data transfer protocol inside the controller 1100 and vice versa.
- the SAS controller 1900 operates as an SAS initiator device and the drive 1910 operates as an SAS target device.
- one or a plurality of SAS expanders may be coupled between the SAS controller 1900 and the drive 1910 .
- An SAS expander refers to a device having a function of a switch in an SAS network.
- the individual command processing unit 520 transmits a transfer instruction of read data designated by a read command to the SAS controller 1900 and the SAS controller 1900 reads read data in the drive 1910 .
- the MP 140 issues an instruction to the SAS controller 1900 instead of the drive 1910 and the SAS controller 1900 reads and writes data in the drive 1910 .
- the field of the drive number 1305 in the address conversion table 310 may store an SAS address corresponding to the drive 1910 instead of a drive number.
- the SAS protocol is used to couple the controllers 1100 and 1200 and the drive 1910 to each other.
- a larger number of drives 1910 can be coupled as compared to, for example, coupling provided by other transfer protocols such as PCI.
- the use of SAS expanders enables drives 1910 to be additionally installed or uninstalled without stopping the network.
- a command corresponds to a read command, a write command, a management command, or the like.
- a process designated by a command other than a read command corresponds to, for example, a write command process or a management command process.
Abstract
Description
- 10 Host computer
- 20 Management terminal
- 30 Network
- 100 Storage apparatus
- 110, 120 Storage controller
- 130 Frontend interface
- 140 Processor
- 160 Read accelerator processor
- 170 Memory
- 180 Memory
Claims (15)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2015/052313 WO2016121026A1 (en) | 2015-01-28 | 2015-01-28 | Storage apparatus, computer system, and method |
Publications (2)
Publication Number | Publication Date |
---|---|
US20180018272A1 (en) | 2018-01-18 |
US10452557B2 (en) | 2019-10-22 |
Family
ID=56542682
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/540,042 Active 2035-10-03 US10452557B2 (en) | 2015-01-28 | 2015-01-28 | Storage apparatus, computer system, and method for improved read operation handling |
Country Status (2)
Country | Link |
---|---|
US (1) | US10452557B2 (en) |
WO (1) | WO2016121026A1 (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10785300B2 (en) * | 2016-10-12 | 2020-09-22 | Dell Products L.P. | Storage rate limiting for information handling system with multiple storage controllers |
CN107003943B (en) * | 2016-12-05 | 2019-04-12 | 华为技术有限公司 | The control method of reading and writing data order, storage equipment and system in NVMe over Fabric framework |
KR102101622B1 (en) * | 2017-12-06 | 2020-04-17 | 주식회사 멤레이 | Memory controlling device and computing device including the same |
US11150834B1 (en) * | 2018-03-05 | 2021-10-19 | Pure Storage, Inc. | Determining storage consumption in a storage system |
JP6898393B2 (en) * | 2019-03-22 | 2021-07-07 | 株式会社日立製作所 | Storage system and data transfer method |
CN118779250A (en) * | 2020-02-27 | 2024-10-15 | 华为技术有限公司 | Data processing method, device and system for memory device |
CN113885779B (en) * | 2020-07-02 | 2024-03-12 | 慧荣科技股份有限公司 | Data processing method and corresponding data storage device |
US20230153023A1 (en) * | 2021-11-15 | 2023-05-18 | Samsung Electronics Co., Ltd. | Storage device and method performing processing operation requested by host |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5412787A (en) * | 1990-11-21 | 1995-05-02 | Hewlett-Packard Company | Two-level TLB having the second level TLB implemented in cache tag RAMs |
US5835962A (en) * | 1995-03-03 | 1998-11-10 | Fujitsu Limited | Parallel access micro-TLB to speed up address translation |
US6138225A (en) * | 1997-12-24 | 2000-10-24 | Intel Corporation | Address translation system having first and second translation look aside buffers |
US6470437B1 (en) * | 1999-12-17 | 2002-10-22 | Hewlett-Packard Company | Updating and invalidating store data and removing stale cache lines in a prevalidated tag cache design |
US20100325358A1 (en) * | 2009-06-22 | 2010-12-23 | Arm Limited | Data storage protocols to determine items stored and items overwritten in linked data stores |
US20130145088A1 (en) | 2010-09-24 | 2013-06-06 | Texas Memory Systems, Inc. | High-speed memory system |
US20130212319A1 (en) * | 2010-12-15 | 2013-08-15 | Kabushiki Kaisha Toshiba | Memory system and method of controlling memory system |
US20140173244A1 (en) * | 2012-12-14 | 2014-06-19 | Advanced Micro Devices, Inc. | Filtering requests for a translation lookaside buffer |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100268904A1 (en) * | 2009-04-15 | 2010-10-21 | Sheffield Robert L | Apparatus and methods for region lock management assist circuit in a storage system |
US9268695B2 (en) * | 2012-12-12 | 2016-02-23 | Avago Technologies General Ip (Singapore) Pte. Ltd. | Methods and structure for using region locks to divert I/O requests in a storage controller having multiple processing stacks |
-
2015
- 2015-01-28 WO PCT/JP2015/052313 patent/WO2016121026A1/en active Application Filing
- 2015-01-28 US US15/540,042 patent/US10452557B2/en active Active
Non-Patent Citations (1)
Title |
---|
International Search Report of PCT/JP2015/052313 dated Apr. 21, 2015. |
Also Published As
Publication number | Publication date |
---|---|
WO2016121026A1 (en) | 2016-08-04 |
US20180018272A1 (en) | 2018-01-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: HITACHI, LTD., JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:AKAIKE, HIROTOSHI;SHIMOZONO, NORIO;NAKAGAWA, KAZUSHI;REEL/FRAME:042821/0688 Effective date: 20170614 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: EX PARTE QUAYLE ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO EX PARTE QUAYLE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |
| | AS | Assignment | Owner name: HITACHI VANTARA, LTD., JAPAN Free format text: COMPANY SPLIT;ASSIGNOR:HITACHI, LTD.;REEL/FRAME:069518/0761 Effective date: 20240401 |