CN111722791A - Information processing system, storage system, and data transmission method - Google Patents


Info

Publication number
CN111722791A
Authority
CN
China
Prior art keywords
command
host
drive
data
storage
Prior art date
Legal status
Granted
Application number
CN201910777272.5A
Other languages
Chinese (zh)
Other versions
CN111722791B
Inventor
赤池洋俊
细木浩二
下园纪夫
杉本定广
横井伸浩
Current Assignee
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date
Filing date
Publication date
Application filed by Hitachi Ltd filed Critical Hitachi Ltd
Publication of CN111722791A
Application granted
Publication of CN111722791B
Legal status: Active

Classifications

    • G06F3/0659 Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • G06F3/0647 Migration mechanisms
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G06F12/0246 Memory management in non-volatile memory in block erasable memory, e.g. flash memory
    • G06F12/10 Address translation
    • G06F12/1009 Address translation using page tables, e.g. page table structures
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/061 Improving I/O performance
    • G06F3/064 Management of blocks
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G06F3/0658 Controller construction arrangements
    • G06F3/0665 Virtualisation aspects at area level, e.g. provisioning of virtual or logical volumes
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F2212/657 Virtual address space management
    • G06F2212/7201 Logical to physical mapping or translation of blocks or pages

Abstract

The invention provides a high-performance information processing system, storage system, and data transfer method in which data can be transferred directly from a drive enclosure to a host without passing through a storage controller. The storage system includes at least one drive enclosure connectable to at least one host via a first network, and a storage controller connected to the drive enclosure. The storage controller instructs the drive enclosure to create a logical volume; the drive enclosure creates the logical volume in accordance with the instruction from the storage controller, provides a storage area of the storage system to the host, and receives IO commands for the storage area of the storage system from the host.

Description

Information processing system, storage system, and data transmission method
Technical Field
The present invention relates to an information processing system, a storage system, and a data transfer method, and is suitable, for example, for application to a system equipped with flash drives.
Background
Conventionally, the drive box of a storage system has been a JBOD (Just a Bunch Of Disks), in which SAS/SATA drives are mounted in drive slots and a SAS (Serial Attached SCSI) network can be connected as the external I/F. SAS is a communication I/F that occupies the bus on a per-connection basis; it is suitable for connecting a large number of drives, but the high overhead of its connection processing makes it unsuitable for improving performance. In addition, in JBODs that support the NVMe protocol for high-speed SSDs, the connection between the storage controller of the storage system and the JBOD is a direct PCI Express (PCIe) connection, so the scalability of drive connection is low and a large number of drives cannot be connected. Recently, along with the increasing performance of flash drives such as SSDs (Solid State Drives), FBOF (Fabric-attached Bunch of Flash) with high-performance I/Fs has appeared as a successor to JBOD. FBOF can be connected to high-performance networks such as Ethernet and InfiniBand, and is characterized by supporting NVMe over Fabrics (NVMe-oF). NVMe-oF is a specification that enables the NVMe protocol to be used over a network. Against this background, in order to bring out the high performance of SSDs, a storage system requires high-speed data transfer and highly scalable drive connection based on network connection.
(Problem of data transfer bandwidth in conventional storage)
In conventional storage, the front end (hereinafter abbreviated as FE) of the storage is connected to the hosts, and the back end (hereinafter abbreviated as BE), a network independent of the front-end network, is connected to the drive boxes. FC (Fibre Channel) and Ethernet networks are the mainstream for the FE network, and SAS networks are the mainstream for the BE network. When the storage controller receives a command from the host, for example a read command, the storage controller reads the data from a drive in the drive box and transfers the data to the host. Replacing the BE network of the storage with a network that supports the high-performance I/F of FBOF has the advantage that the data transfer bandwidth of the BE network can be expanded compared with a SAS network. However, the data transfer path described above is unchanged from the conventional one, and the storage controller still transfers the data to the host. Consequently, even if a plurality of FBOFs are connected, the data transfer bandwidth of the storage controller becomes a bottleneck and the performance of the FBOFs cannot be brought out.
(Method for realizing high-speed data transfer)
In recent years, support for the NVMe over Fabrics specification has been advancing, mainly on the Ethernet side, as with FBOF. An FBOF has an I/F that can connect to an Ethernet network and supports the NVMe over Fabrics specification, so data can be transferred directly between a host and the FBOF over the FE network of the storage (hereinafter referred to as direct transfer) without passing through the storage controller. Direct transfer eliminates the performance bottleneck of the storage controller and realizes high-speed data transfer.
(Technical problems in realizing direct transfer)
Realizing direct transfer involves the following two technical problems.
(Technical problem 1) In a logical volume provided by the storage system, the address space seen from the host differs from the address space of the drives in the FBOF, so the host cannot identify at which address of which drive in the FBOF the desired data is stored.
(Technical problem 2) When the cache of the storage system is used to improve the performance of data access, the latest data may reside in the cache; in that case the data must be read from the cache, but the host cannot determine whether the latest data is cached.
To address these problems, for example, Patent Document 1 discloses the following invention: agent software running on the host queries the storage controller for the drive in the FBOF that holds the host's access-target data and for its address, and directly accesses the drive in the FBOF based on the obtained information.
Documents of the prior art
Patent document
Patent Document 1: Specification of US 9,800,661
Disclosure of Invention
Technical problem to be solved by the invention
In the invention disclosed in Patent Document 1, the host can directly access the drives in the FBOF, but the agent software has to perform the calculations for data protection such as RAID, so a computational load for this highly reliable processing is placed on the host side.
In addition, in order to avoid contention between the operation of program products (storage functions) such as Snapshot and Thin Provisioning running on the storage controller and the operation of the agent software, exclusive control over the network is required, which causes a problem of performance degradation.
The present invention has been made in view of the above problems, and an object of the present invention is to provide an information processing system, a storage system, and a data transfer method that realize high-speed data transfer by direct transfer from the FBOF without introducing special software for storage data processing, such as agent software, into the host.
Another object of the present invention is to provide an information processing system, a storage system, and a data transfer method that realize high-speed data transfer by direct transfer from the FBOF while providing data protection and program product functions by the storage device.
Technical solution for solving technical problem
To solve the above problems, an information processing system according to one aspect of the present invention includes: at least one host; at least one drive enclosure that is connected to the host via a first network and includes a storage device; and a storage controller connected to the drive enclosure. The drive enclosure creates a logical volume in accordance with an instruction from the storage controller and provides a storage area of the storage system to the host. The drive enclosure receives a read command issued by the host to the drive enclosure providing the storage area of the storage system and transfers the read command to the storage controller; when the data responding to the read command is not on the cache of the storage controller, the drive enclosure receives from the storage controller an offload command required for the data transfer, reads the data from the storage device in accordance with the offload command, and transfers the data to the host.
Further, to solve the above problems, a storage system according to one aspect of the present invention includes at least one drive enclosure connected to at least one host via a first network, and a storage controller connected to the drive enclosure. The storage controller instructs the drive enclosure to create a logical volume; the drive enclosure creates the logical volume in accordance with the instruction from the storage controller and provides a storage area of the storage system to the host, and the drive enclosure receives IO commands for the storage area of the storage system from the host.
Further, to solve the above problems, a data transfer method according to one aspect of the present invention is for an information processing system that includes: at least one host; at least one drive enclosure that is connected to the host via a first network and includes a storage device; and a storage controller connected to the drive enclosure. In the data transfer method, the drive enclosure creates a logical volume in accordance with an instruction from the storage controller and provides a storage area of the storage system to the host; the drive enclosure receives a read command issued by the host to the drive enclosure providing the storage area of the storage system and transfers the read command to the storage controller; when the data responding to the read command is not on the cache of the storage controller, the drive enclosure receives from the storage controller an offload command required for the data transfer, reads the data from the storage device in accordance with the offload command, and transfers the data to the host.
Effects of the invention
According to the present invention, a highly reliable, high-performance information processing system, storage system, and data transfer method can be realized.
Drawings
Fig. 1 is a configuration diagram of the information processing system of embodiment 1.
Fig. 2 is a configuration diagram of the drive enclosure of embodiment 1.
Fig. 3 is a configuration diagram of the programs of the host, the storage controller, and the drive enclosure according to embodiment 1.
Fig. 4 is a diagram showing the identifiers of the host and the NVM subsystems in NVMe over Fabrics.
Fig. 5 is a conceptual diagram of the address mapping of user data.
Fig. 6 is a flowchart showing the processing sequence of host commands in the storage controller of embodiment 1.
Fig. 7 is a flowchart showing the processing sequence of an offload command for data transfer in the drive enclosure of embodiment 1.
Fig. 8 is a diagram showing the data transfer conditions used for determining the transfer method.
Fig. 9A is a diagram showing the format of a host command.
Fig. 9B is a diagram showing the host information table of the storage controller.
Fig. 9C is a diagram showing the drive information table.
Fig. 9D is a diagram showing the format of an offload command.
Fig. 10 is a flowchart showing the processing sequence of host commands in the cacheless storage controller according to embodiment 2.
Fig. 11 is a flowchart showing host command (normal command) processing in the storage controller according to embodiment 1.
Fig. 12 is a flowchart showing the procedure of destage processing in the storage controller according to embodiment 1.
Fig. 13 is a flowchart showing the processing sequence of host commands in the cacheless storage controller according to embodiment 2.
Fig. 14 is a diagram showing the program configuration of the host, the storage controller, and the drive enclosure in a mode in which the drive enclosure of embodiment 3 operates as an NVMe over Fabrics target with respect to the host.
Fig. 15 is a diagram showing the identifiers of the host and the NVM subsystems in NVMe over Fabrics of embodiment 3.
Fig. 16 is a flowchart showing the processing sequence of a host command and an offload command in the drive enclosure of embodiment 3.
Fig. 17 is a flowchart showing the processing sequence of enclosure commands in the storage controller of embodiment 3.
Fig. 18 is a flowchart of the processing sequence of destage in the storage controller of embodiment 3.
Fig. 19 is a block diagram showing the configuration of the information processing system according to embodiment 4.
Fig. 20 is a flowchart showing the processing sequence of controller commands in the drive enclosure of embodiment 1.
Fig. 21 is a flowchart showing the processing sequence of controller commands in the drive enclosure of embodiment 3.
Fig. 22 is a diagram showing the host information table of the drive enclosure.
Fig. 23 is a diagram showing the format of an enclosure command.
Fig. 24 is a block diagram showing the programs of the host, the storage controller, and the drive enclosure according to embodiment 5.
Fig. 25 is a flowchart showing the processing sequence of host commands in the drive enclosure of embodiment 5.
Fig. 26 is a flowchart showing the processing sequence of controller commands in the drive enclosure of embodiment 5.
Fig. 27 is a flowchart showing the processing sequence of enclosure commands in the storage controller of embodiment 5.
Fig. 28 is a diagram showing the duplexed area and the parity-generated area in the drive enclosure of embodiment 5.
Fig. 29 is a diagram of the correspondence relationship of double writing in the drive enclosure of embodiment 5.
Fig. 30 is a block diagram showing the programs of the host, the storage controller, and the drive enclosure according to embodiment 7.
Fig. 31 is a flowchart showing the processing sequence of host commands in the drive enclosure of embodiment 7.
Fig. 32 is a flowchart showing the processing sequence of enclosure commands in the storage controller of embodiment 7.
Fig. 33 is a block diagram showing the programs of the host, the storage controller, and the drive enclosure according to embodiment 9.
Fig. 34 is a flowchart showing the processing sequence of host commands in the storage controller of embodiment 9.
Fig. 35 is a flowchart showing the processing sequence of an offload command in the drive enclosure of embodiment 9.
Fig. 36A is a diagram showing an example of the address translation table.
Fig. 36B is a diagram showing an example of the data protection drive group table.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the drawings. The following description and drawings are illustrative of the present invention and are omitted or simplified as appropriate for clarity of description. The present invention can be implemented in various other ways. Each constituent element may be singular or plural, unless otherwise specified.
In the following description, various information may be described by expressions such as "table", "list", and "queue", but various information may be described by data structures other than these. In the description of the identification information, expressions such as "identification information", "identifier", "name", "ID" and "number" are used, but these may be replaced with each other.
When there are a plurality of components having the same or similar functions, the description will be given with substantially the same reference numerals.
In the following description, processing may be described as being performed by a program. Because a program is executed by a processor (for example, a CPU) serving as a central processing unit and performs predetermined processing while appropriately using storage resources (for example, memory) and/or interface devices (for example, communication ports), the subject of the processing may also be regarded as the processor.
The program may be installed from a program source into a device such as a computer. The program source may be, for example, a program distribution server or a computer-readable storage medium. When the program source is a program distribution server, the program distribution server may include a processor and a storage resource for storing the program to be distributed, and the processor of the program distribution server may distribute the program to be distributed to another computer. In the following description, 2 or more programs may be implemented as 1 program, or 1 program may be implemented as 2 or more programs.
(summary of the invention)
The FBOF transfers (directly transfers) the data read from a drive of the storage system to the host based on data transfer information supplied from the storage controller. The data transfer information includes the drive in the FBOF and the address in that drive corresponding to the address of the logical volume specified by the read command from the host. The correspondence between the address of the logical volume and the address of the drive in the FBOF is derived by the storage device based on its configuration information. In the drawings, the storage controller is sometimes denoted CTL. Also, to distinguish it from the storage system that includes the drives, the storage controller is sometimes referred to as the storage device.
When the storage controller receives a command from the host, the data transfer information also includes information such as the address of the data storage destination in the host. In a storage controller with a cache, the storage controller determines whether a cache hit or a cache miss has occurred; for data with a cache hit the storage controller transfers the data to the host, and for data with a cache miss the FBOF transfers the data to the host.
According to the storage apparatus and the data transfer method of the present embodiment, data is transferred directly between the FBOF and the host without passing through the communication I/F, cache control, or buffers of the storage controller, so improvement of read IO performance and reduction of latency (improvement of response performance) can be expected. Furthermore, regarding read IO performance, performance scaling by adding FBOFs can be expected.
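A minimal sketch of this read path follows; every name below (ctl, enclosure, host and their methods) is an illustrative assumption, not an API defined by the patent.

```python
# Sketch of the read path summarized above, assuming duck-typed controller,
# enclosure, and host objects that are not defined in the patent.
def handle_read(ctl, enclosure, host, volume, lba, length):
    """Cache hit: the storage controller transfers the data itself.
    Cache miss: the transfer is offloaded to the FBOF, which reads the drive
    and sends the data directly to the host over the front-end network."""
    cached = ctl.cache.lookup(volume, lba, length)
    if cached is not None:                                  # cache hit
        ctl.send_to_host(host, cached)
    else:                                                   # cache miss -> direct transfer
        drive, drive_addr = ctl.translate(volume, lba)      # derived from configuration info
        enclosure.offload_read(host.nqn, host.buffer, drive, drive_addr, length)
    ctl.send_completion(host)                               # completion still goes via the controller
```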
[Embodiment 1]
Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.
(1) Configuration of the information processing system of embodiment 1
Fig. 1 is a configuration diagram of an information processing system of embodiment 1. Fig. 1 shows a configuration of an information processing system in which a drive case is connected to a host and a storage controller via the same network (connection mode 1: embodiment 1, embodiment 2, and embodiment 3).
The information processing system 100 includes 1 or more hosts 110, a storage device 120, and a drive case 140, which are connected to each other through a Network 150 constituted by a LAN (Local Area Network) or the internet or the like. Drive housing 140 is FBOF. The drive case is sometimes referred to as ENC or drive cartridge due to the relationship described in the drawings. The storage device 120 and the drive case 140 constitute a storage system. The network 150 is a high-performance network such as Ethernet (registered trademark) or broadband over wireless (registered trademark), and supports NVMe over Fabrics (nvmeofs).
The host 110 is a computer device having information resources such as a CPU (Central Processing Unit) and a memory, and is configured by, for example, an open server or a cloud server. Host 110 sends write commands and/or read commands to storage device 120 via network 150 in response to user operations or requests from installed programs.
The storage device 120 is a device in which the software necessary for providing storage functions to the host 110 is installed, and is composed of redundant storage controllers 121 and 122. The storage controller 121 includes a microprocessor 123, a memory 125, a front-end interface (network I/F) 126, and a storage device 129. The storage controller 122 has the same structure as the storage controller 121.
The microprocessor 123 is hardware that controls the overall operation of the storage controller 121 and includes one or more processor cores 124. Each processor core 124 reads data from and writes data to the corresponding drive enclosure (FBOF) 140 in response to read commands and write commands given from the host 110.
The memory 125 is formed of, for example, a semiconductor memory such as SDRAM (Synchronous Dynamic Random Access Memory), and is used to store and hold necessary programs (including an OS (Operating System)) and data. The memory 125 is the main memory of the microprocessor 123 and stores the programs executed by the microprocessor 123 (such as the storage control program), the management tables referred to by the microprocessor 123, and the like. The memory 125 is also used as a disk cache (cache memory) of the storage controller 121.
Various processes for providing the storage functions to the host 110 are performed by the processor core 124 of the microprocessor 123 executing the programs stored in the memory 125 (the programs shown in figs. 3, 14, and 23). However, for ease of understanding, the following description is continued assuming that the microprocessor 123 executes the programs.
The network I/F 126 is an interface to the host 110 and performs protocol control when communicating with the host 110 via the network 150.
The storage device 129 stores the OS, the storage control program, backups of the management tables, and the like. The storage device 129 is, for example, an HDD or an SSD (Solid State Drive).
The internal configuration of the storage controller 122 is the same as that of the storage controller 121, and illustration thereof is therefore omitted. The storage controller 121 and the storage controller 122 are connected by an inter-MP I/F 134 such as a non-transparent bridge, and communicate user data and control information including storage configuration information. The operation of the storage controller 122 is also the same as that of the storage controller 121, and for simplicity of description only the storage controller 121 is described hereinafter unless otherwise specified.
(2) Structure of the drive enclosure
Fig. 2 is a configuration diagram of the drive enclosure. The drive enclosure 140 is a device in which the software necessary for controlling the drives and for providing, to the outside, the function of reading from and writing to the drives as storage devices is installed. The drive enclosure is composed of redundant drive enclosures 200 and 201 and one or more drives 218. Redundancy of the drive enclosure is preferable for improving its availability and reliability, but is not essential; the drive enclosure may be composed of a single enclosure without redundancy.
The drive enclosure 200 includes a microprocessor 202, a memory 204, a network I/F 205, a PCIe switch 214, and a storage device 208. The drive 218 is a dual-port NVMe drive having PCIe connection ports 219 and 222. The PCIe connection ports 219 and 222 are connected, by a PCIe link 220 and a PCIe link 223 respectively, to a PCIe connection port 221 of the PCIe SW (switch) 214 of the drive enclosure 200 and to a PCIe connection port 221 of the PCIe SW 214 of the drive enclosure 201. The drive 218 constitutes the storage area of the storage system and is a storage device that stores data from the host. The drive 218 does not necessarily have to be an NVMe drive; it may be, for example, a SAS drive or a SATA drive. Furthermore, the drive 218 does not have to be dual-ported and may have a single port.
The microprocessor 202 is hardware that controls the overall operation of the drive enclosure 200 and has one or more processor cores 203. Each processor core 203 reads data from and writes data to the corresponding drive 218 in accordance with read commands and write commands supplied from the storage device 120, and performs data transfer with the host 110 in accordance with data transfer commands supplied from the storage device 120.
The memory 204 is formed of a semiconductor memory such as SDRAM (Synchronous Dynamic Random Access Memory), is used to store and hold necessary programs (including an OS (Operating System)) and data, and is also used as a cache memory.
The memory 204 is the main memory of the microprocessor 202 and stores the programs executed by the microprocessor 202 (such as the drive enclosure control program), the management tables referred to by the microprocessor 202, and the like. Various processes for providing the storage device 120 and the host 110 with the functions of a drive enclosure serving as an FBOF are performed by the processor core 203 of the microprocessor 202 executing the programs stored in the memory 204. However, for ease of understanding, the following description is continued assuming that the microprocessor 202 executes the programs.
The network I/F 205 and the PCIe port 215 of the PCIe SW 214 are connected, by a PCIe link 206 and a PCIe link 216 respectively, to a PCIe port 207 and a PCIe port 217 of the microprocessor 202.
The storage device 208 stores the OS, the drive enclosure control program, backups of the management tables, and the like. The storage device 208 is, for example, an HDD or an SSD.
The drive enclosure 201 has the same internal structure as the drive enclosure 200, and illustration thereof is therefore omitted. The drive enclosure 200 and the drive enclosure 201 are connected by an inter-MP I/F 213 such as a non-transparent bridge, and communicate user data and control information including drive enclosure configuration information. The operation of the drive enclosure 201 is also the same as that of the drive enclosure 200, and for simplicity of description only the drive enclosure 200 is described hereinafter unless otherwise specified.
(3) Program structure of the host, the storage controller, and the drive enclosure
Fig. 3 is a configuration diagram of the programs of the host, the storage controller, and the drive enclosure that are directly involved in embodiment 1, and shows a mode in which the storage controller operates as the NVMe over Fabrics target with respect to the host (target configuration mode 1: embodiments 1 and 2).
The programs of the host 110 include an application 300, an initiator driver 301, and an OS (operating system), not shown.
The application 300 is, for example, a program such as a numerical calculation program, a database, or a Web service. The initiator driver 301 recognizes a storage area supporting NVMe-oF (NVMe over Fabrics) provided by a target driver, and provides the application with an application I/F for read, write, and other commands. In embodiment 1, the initiator driver 301 of the host 110 recognizes the NVMe-oF-capable storage areas provided by the target driver 302 of the storage controller 121 and the target driver 308 of the drive enclosure 200.
The programs of the storage controller 121 include a target driver 302, an initiator driver 303, a host command processing unit 304, a data transfer control unit (between host and storage controller) 305, a cache control unit 306, a data transfer offload unit 307, an offload command communication unit (initiator) 315, a destage processing unit 314, an address translation unit 318, and an OS (not shown).
The target driver 302 provides the NVMe-oF-capable storage area to the initiator driver 301, receives host commands, and transmits command completion responses.
The initiator driver 303 recognizes the NVMe-oF-capable storage area provided by the target driver 308, transmits commands to the drive enclosure 200, and receives completion responses to the commands. Commands issued by the storage controller 121 to the drive enclosure 200 are referred to as controller commands. The drive enclosure 200 corresponds to the drive enclosure 140 of Fig. 2.
The host command processing unit 304 receives commands issued by the host via the target driver 302, and performs analysis of the commands, processing of read commands, write commands, and management commands, generation of completion responses for the commands, transmission of the completion responses via the target driver 302, and the like.
The data transfer control unit (between host and storage controller) 305 performs data transfer processing between the NVMe-oF-capable storage controller and the host in accordance with instructions from the host command processing unit 304.
The cache control unit 306 performs determination of cache hits and misses by searching the cache data, transitions between dirty data (the state before being written to the physical drive) and clean data (the state after being written to the physical drive), and control of securing and releasing cache areas. The cache hit/miss determination decides whether the data responding to an IO command from the host exists in the cache memory of the storage controller; for example, when the IO command from the host is a read command, it is determined whether the data responding to the read command exists in the cache memory. The individual processes of this cache control are well-known techniques, and detailed description thereof is omitted.
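A minimal sketch of the cache hit/miss determination follows, assuming a cache directory keyed by NVM subsystem NQN, namespace ID, and cache-block number; the key layout and the 64 KB block size are assumptions for the example, not values fixed by the patent.

```python
# Illustrative cache hit/miss check over a dictionary-based cache directory.
CACHE_BLOCK = 64 * 1024                 # example cache-block size (cf. Fig. 5)

cache_directory = {}                    # (nqn, nsid, block_number) -> cached data

def is_cache_hit(nqn, nsid, start_byte, length):
    """True only if every cache block covering [start_byte, start_byte+length) is cached."""
    first = start_byte // CACHE_BLOCK
    last = (start_byte + length - 1) // CACHE_BLOCK
    return all((nqn, nsid, blk) in cache_directory for blk in range(first, last + 1))
```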
The data transfer offload unit 307 generates an offload command (data transfer parameters) for data transfer, and with the offload command instructs the drive enclosure (200 or 201) to perform the data transfer to the host. The offload command communication unit (initiator) 315 transmits offload commands to the drive enclosure and receives their responses.
The destage processing unit 314 performs destage processing, which writes data in the cache to the drives via the initiator driver 303. The address translation unit 318 has a mapping table between the data range 505 of the namespace 504 managed by the storage controller and the drive enclosure 200, the drive 508, and the storage area 509 within the drive 508 that are the storage destination of the data, and translates an address of the data range 505 into the address of the corresponding drive enclosure, drive, and storage area.
The programs of the drive enclosure 200 include a target driver 308, a controller command processing unit 309, a data transfer control unit (between host and drive enclosure) 310, a data transfer control unit (between storage controller and drive enclosure) 316, an offload command communication unit (target) 313, an offload command processing unit 311, a drive control unit 312, a buffer control unit 317, and an OS (not shown).
The target driver 308 provides NVMe-oF-capable storage areas to the initiator driver 301 and the initiator driver 303; it receives host commands and transmits command completion responses to the host, and receives controller commands and transmits command completion responses to the storage controller.
The controller command processing unit 309 receives commands issued by the storage controller via the target driver 308, and performs command analysis, read/write processing, generation of command completion responses, transmission of the command completion responses to the initiator driver 303 via the target driver 308, and the like.
The data transfer control unit (between host and drive enclosure) 310 performs data transfer processing between the NVMe-oF-capable host and the drive enclosure in accordance with instructions from the controller command processing unit 309 and the offload command processing unit 311. The data transfer control unit (between storage controller and drive enclosure) 316 performs data transfer processing between the NVMe-oF-capable storage controller and the drive enclosure in accordance with instructions from the controller command processing unit 309.
The offload command communication unit (target) 313 receives offload commands from the storage controller and transmits their responses. The offload command processing unit 311 receives an offload command for data transfer from the storage controller 121, and performs analysis and read processing of the offload command, generation of a completion response to the offload command, transmission of the completion response, and the like.
The drive control unit 312 manages the drives 218 and performs read/write processing on the drives 218 in accordance with instructions from the controller command processing unit 309 and the offload command processing unit 311. The buffer control unit 317 secures and releases buffers, which are temporary storage areas for data transfer.
(4) Identifiers of host and NVM subsystems in NVMe over Fabrics
Fig. 4 is a diagram showing the identifiers of the host and the NVM subsystems in NVMe over Fabrics relating to target configuration mode 1.
The identifier is an NQN (NVMe Qualified Name) of the NVMe over Fabrics specification and is unique within the network (fabric). An NVM subsystem is a logical drive that has storage areas (called namespaces in the NVMe specification) and a processing function for management commands and for IO commands such as read/write commands. For ease of understanding, the NQNs in Fig. 4 are expressed as simplified character strings in a format not defined in the specification.
The host 110 has at least one identifier 401 (host NQN). There may be a plurality of hosts 110, but illustration thereof is omitted. The drive enclosure 200 has at least one identifier 402 (NVM subsystem NQN); for example, each drive 218 of the drive enclosure 140 has one identifier 402. Here, the drive 218 is an NVMe drive, the drive 218 itself is also an NVM subsystem, and one or more namespaces exist within the NVM subsystem. For example, a namespace of the corresponding drive 218 is allocated within the NVM subsystem of the identifier 402 described above and provides a storage area to the host 110 or the storage device 120. The same applies to the drive enclosure 201, and the description thereof is omitted. There may be two or more of the drive enclosures 200 and 201, but illustration thereof is omitted.
The storage controller 121 has at least one identifier 403 (NVM subsystem NQN). In the NVM subsystem corresponding to the identifier 403, a logical storage area to which a portion of a storage pool is allocated is assigned as a namespace. The storage pool is a storage area constructed from the storage areas of the plurality of drives 218 and protected by data protection such as RAID. The same applies to the NVM subsystem of the storage controller 122, and the description thereof is omitted.
At startup of the storage device 120 and the drive enclosure 140, the drive enclosure 200 (and the drive enclosure 201) creates the NVM subsystems with the identifiers 402 described above. Further, the storage controller 121 (and the storage controller 122) transmits a connect command to the drive enclosure 200 (and the drive enclosure 201), thereby enabling command transmission and data transfer to the NVM subsystems of the drive enclosure 200 (and the drive enclosure 201), and creates the NVM subsystem having the identifier 403 described above.
The host 110 transmits connect commands to the storage controller 121 (and the storage controller 122) and to the drive enclosure 200 (and the drive enclosure 201), thereby enabling command transmission and data transfer to the NVM subsystems of the storage controller 121 (and the storage controller 122) and of the drive enclosure 200 (and the drive enclosure 201).
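For illustration only, the identifiers and connect order described above might look as follows; the NQN strings are hypothetical and merely follow the NVMe-oF naming form, since Fig. 4 uses simplified strings of its own.

```python
# Hypothetical NQNs and connect order for the topology above (not from the patent).
HOST_NQN       = "nqn.2019-08.com.example:host0"                               # identifier 401
DRIVE_NQNS     = [f"nqn.2019-08.com.example:enc0.drive{i}" for i in range(4)]  # identifiers 402
CONTROLLER_NQN = "nqn.2019-08.com.example:ctl0.subsystem0"                     # identifier 403

# Connect order implied by the text: the storage controller connects to the
# enclosure's NVM subsystems at startup, then the host connects to both.
CONNECT_ORDER = (
    [("storage controller", nqn) for nqn in DRIVE_NQNS]
    + [("host", CONTROLLER_NQN)]
    + [("host", nqn) for nqn in DRIVE_NQNS]
)
```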
(5) Address mapping of user data
Fig. 5 is a conceptual diagram for explaining the address mapping of user data.
The host 110 includes a continuous virtual memory 500 provided by the OS to the application program and a physical memory 502 as a storage destination of actual data.
When an application program of the host 110 issues a read command to the storage controller 121, for example, a virtual memory area 501 is secured in the virtual memory 500 as the storage destination of the read data. The virtual memory area 501 corresponds to a physical memory area 503 in the physical memory in units of memory management, that is, in units of pages. The read command issued by the application 300 to the storage device 120 has fields specifying the namespace 504 to be read (corresponding to a logical volume in the storage device), the address within the namespace 504 corresponding to the data range 505 in the namespace 504, the data transfer length, and the physical memory area 503 in the host 110 used for the data transfer.
The data of the data range 505 ("a" to "d") is stored in cache blocks 507, which are the cache management units of the cache 506 in the storage controller 121, or in the storage area 509 of the drive 508 connected to the drive enclosure 200. The cache 506 is used for temporary storage of data; for example, 64 KB of data can be stored in one cache block. In embodiment 1 the unit of cache management is described as a cache block, but management may also be performed in units of cache slots, each of which groups one or more cache blocks.
As an example, Fig. 5 shows a state in which data has been written to portion "a" of the data range 505: the new data is stored in the cache block 507, and the data in portion "a" of the storage area 509 in the drive 508 is old data. When the data of the cache block 507 is written to portion "a" of the storage area 509 and updated to the new data by the destage processing of the storage controller 121, the cache block 507 is released and becomes reusable.
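As a small illustration of the destage step just described (names are assumed, and dirty-bit management and RAID parity updates are omitted):

```python
# Sketch of destaging one cache block and releasing it for reuse.
from dataclasses import dataclass

@dataclass
class CacheBlock:
    drive: object        # drive 508 holding the old data
    drive_address: int   # offset of storage area 509 within that drive
    data: bytes          # new data written by the host

def destage(cache_directory, key):
    """Write the new data of one cache block to its drive address, then release
    the cache block (507) so that it becomes reusable."""
    block = cache_directory[key]
    block.drive.write(block.drive_address, block.data)   # storage area 509 now holds new data
    del cache_directory[key]                             # cache block 507 becomes reusable
```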
The mapping between the data range 505 in the namespace 504 and the corresponding cache block 507, and the mapping to the addresses of the drive enclosure 200, the drive 508, and the storage area 509, are managed by the storage controller 121.
The mapping between the data range 505 in the namespace 504 and the cache block 507 is the same as in a conventional cache memory, and the description thereof is omitted.
The mapping between the data range 505 in the namespace 504 and the addresses of the drive enclosure 200, the drive 508, and the storage area 509 is illustrated in Fig. 36A.
(36) Address translation table and data protection drive group table
Fig. 36A shows the address translation table and Fig. 36B shows the data protection drive group table; both are managed by the storage controller 121.
Fig. 36A shows the address translation table 3600, which is the mapping information between the data range 505 in the namespace 504 and the data storage destination address. The address translation table 3600 is used in the address translation processing that translates an address of a logical volume into the address of the data storage destination. The address translation table 3600 includes entries for a logical address 3601, a drive area number 3602, and a drive address 3603.
In an actual storage system there are a plurality of layers, such as logical volumes, storage pools, caches, storage areas protected by RAID or mirroring, and drives, and address translation is performed across these layers. In this embodiment, to simplify the description, layers other than those necessary for explaining the embodiment are omitted, and only the correspondence between the logical volume and the drive addresses is described as an example. A logical volume corresponds to a pair of an NVM subsystem and a namespace. In this example, it is assumed that one address translation table 3600 exists for each logical volume. The logical address 3601 is a logical address within the logical volume. The drive area number 3602 is an identification number of the drive 508.
The drive area number 3602 is described in detail with reference to Fig. 9C. The drive address 3603 is the address of the data storage destination within the drive 508. In the following description, a drive address is sometimes referred to as a physical address. The form of the table element of the drive address 3603 depends on the data storage scheme. In this embodiment the data protection scheme is RAID, and the address translation table 3600 associates the logical address 3601 with a drive area number 3602 and a drive address 3603. When the data protection scheme is mirroring, the address translation table 3600 associates the logical address 3601 with the drive area numbers 3602 and drive addresses 3603 of both the mirror source and the mirror destination.
The management unit of addresses in the address translation table, that is, the unit of correspondence between logical addresses and drive addresses, is, for example, a RAID stripe. The block size of the logical volume is, for example, 512 B, and the RAID stripe size is, for example, 512 KB (1024 blocks). In the present embodiment, the address translation processing is described using the address translation table so that the correspondence between logical addresses and data storage destinations is easy to understand. With data protection schemes such as RAID and mirroring, the addresses can also be derived by calculation, and the address translation processing is not limited to the table-based method. For example, in RAID there is periodicity in the correspondence between logical addresses and drive addresses in units of parity cycles, and the drive area number 3602 and the drive address 3603 can be calculated from the logical address 3601 using the drive configuration and periodicity of the RAID group. The drive configuration of the RAID group is described with reference to Fig. 36B.
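As a concrete illustration of the table-based translation just described, here is a minimal sketch using the example values above (512 B logical blocks, 512 KB RAID stripes); the table layout is an assumption made for the example, not the patent's format.

```python
# Table-based logical-to-physical translation at RAID-stripe granularity.
BLOCK_SIZE  = 512
RAID_STRIPE = 512 * 1024

def translate(address_translation_table, logical_address):
    """address_translation_table[stripe_index] -> (drive_area_number, drive address
    of the stripe's start). Returns the drive and byte address holding the block."""
    byte_offset = logical_address * BLOCK_SIZE
    stripe_index, offset_in_stripe = divmod(byte_offset, RAID_STRIPE)
    drive_area_number, stripe_start = address_translation_table[stripe_index]
    return drive_area_number, stripe_start + offset_in_stripe
```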
Fig. 36B shows the data protection drive group table 3610, which is management information of the drive groups used for data protection. The data protection drive group table 3610 includes items for a drive group number 3611, a data protection scheme 3612, and a drive configuration 3612.
The drive group number 3611 is the identification number of a drive group. The data protection scheme 3612 indicates the data protection scheme of the drive group, such as RAID5 (3D+1P), RAID6 (6D+2P), or mirroring. "D" denotes a data drive and "P" denotes a parity drive; for example, "3D+1P" indicates a group composed of three data drives and one parity drive, four drives in total. The drive configuration 3612 indicates the drive area numbers of the drives that make up the drive group. The data protection drive group table 3610 is managed and stored by the storage device 120 as part of the configuration information of the storage system.
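The calculated alternative mentioned above, which uses the RAID group's parity-cycle periodicity instead of a table, could look like the following sketch; a RAID5 rotation pattern is assumed purely for illustration and is not specified by the patent.

```python
# Derive (drive area number, drive address) from a logical stripe index using
# the drive configuration of a RAID5 group and an assumed parity rotation.
def raid5_locate(drive_group, stripe_index, stripe_size=512 * 1024):
    """drive_group: list of drive area numbers from the drive configuration,
    e.g. 4 entries for 3D+1P."""
    data_drives = len(drive_group) - 1                             # drives holding data in each row
    row = stripe_index // data_drives                              # parity-cycle (row) number
    parity_slot = (len(drive_group) - 1 - row) % len(drive_group)  # parity rotates every row
    slot = stripe_index % data_drives
    if slot >= parity_slot:                                        # skip the parity drive in this row
        slot += 1
    return drive_group[slot], row * stripe_size                    # (drive area number, drive address)
```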
(6) Processing order of host commands in a storage controller
Fig. 6 is a flowchart showing the processing sequence of host commands in the storage controller of embodiment 1 (target configuration mode 1).
When the target driver 302 of the storage controller 121 receives a command from the host 110, the host command processing unit 304 starts the processing from step 600.
First, the host command processing unit 304 acquires the identifier 923 of the NVM subsystem NQN (403 in Fig. 4) using the information in the host information table 920 of the storage controller (see Fig. 9B), analyzes the received NVMe command (see Fig. 9A), and reads the command type 912, the NID (namespace ID) 913, which is the identifier of the namespace, the start address 914, and the data transfer length field 915 (step 601).
Processing then branches according to the command type (step 613). If the command type 912 is an IO command (read command or write command), the processing proceeds to step 602. If the command type is a management command (a command for creating or deleting a namespace, an information acquisition command for the NVM subsystem, a setting command for the NVM subsystem, or the like), the processing proceeds to step 614. The flow for the case where the command type in step 613 is an IO command is described below.
When the processing branches from step 613 to step 602, the cache control unit 306 determines a cache hit or miss based on the identifier 403 of the storage controller obtained from the target driver 302 and on the NID, start address, and data transfer length of the received NVMe command (step 602).
Next, after the cache hit/miss determination, the data transfer method is determined based on the command type and the data transfer length (step 603). The data transfer method is determined by deciding, according to the table shown in Fig. 8, whether to perform normal data transfer or to offload the data transfer to the drive enclosure.
The processing then branches according to the data transfer method (step 604). If the data transfer method is normal data transfer, the processing proceeds to step 605; if the data transfer method is offload, the processing proceeds to step 606. For normal data transfer, normal command processing is performed (step 605); the normal command processing is described with reference to Fig. 11. Finally, the processing ends (step 610).
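A hedged sketch of the branch in steps 603-604 follows; the real conditions come from the table of Fig. 8, which is not reproduced in this text, so the threshold below is only a placeholder.

```python
# Placeholder decision between normal transfer and offload (direct transfer).
OFFLOAD_MIN_LENGTH = 32 * 1024          # placeholder threshold, not a value from Fig. 8

def choose_transfer_method(command_type, cache_hit, transfer_length):
    if command_type != "read":
        return "normal"                 # this flow offloads only read transfers
    if cache_hit:
        return "normal"                 # data is already on the controller cache
    if transfer_length >= OFFLOAD_MIN_LENGTH:
        return "offload"                # let the drive enclosure transfer directly
    return "normal"
```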
The flow of the process of unloading data transfer to the drive case will be described below, returning to the description of the flowchart after step 606.
When the processing branches from step 604 to step 606, the data transfer offload unit 307 refers to the address translation table 3600 and generates the data transfer parameters (an offload command) necessary for the data transfer, based on the information of the start address and the data transfer length (step 606). That is, the storage controller refers to the address translation table 3600 and generates an offload command that includes the physical address, in a drive, of the storage destination of the data corresponding to the command received from the host.
The offload command includes the host NQN identifying the host, the address of the data transfer destination in the host memory, the address of the data storage destination in the storage device, and the data transfer length. The offload command contains the necessary control data; the method of generating it is described with reference to Figs. 9A to 9D.
Next, in the data transfer offload unit 307, the address translation unit 318 refers to the address translation table 3600 and identifies, from the identifier 923 (403 in Fig. 4) obtained in step 601, the NID, and the start address, the drive enclosure 200 that is the data storage destination, and the offload command is transmitted to the drive enclosure 200 using the offload command communication unit (initiator) 315 (step 607).
Next, completion of the offload command from the drive enclosure 200 is awaited (step 608). The data transfer offload unit 307 then receives the completion response of the offload command from the drive enclosure 200 using the offload command communication unit (initiator) 315 and analyzes it (step 611). In the NVMe protocol, commands are processed through queues, so the device that processes a command must return a completion response to the source that issued the command. That is, when the command from the host is a read command, the completion response must be returned to the host by the storage controller to which the command was entrusted. If the completion response indicates an error, processing for the abnormal case is performed, but its description is omitted here; the description continues on the assumption that the completion response is successful.
Next, the host command processing section 304 generates a completion response for the read command of the host (step 612). The completion response for the read command is then sent to the host 110 using the target driver 302 (step 609), and the processing ends (step 610). When the data to be transferred spans the drives 218 of a plurality of drive enclosures 200, the processing of steps 606 and 607 is performed for each of those drive enclosures 200, and in step 608 completion of the offload command is awaited from all the drive enclosures 200 to which an offload command was sent.
Next, returning to the description of steps 614 and thereafter in the flowchart, the flow when the command type is the management command in step 613 will be described.
When the process branches from step 613 to step 614, the host command processing unit 304 performs a process of a management command in accordance with the content specified by the management command (step 614). Next, a completion response for the command is generated that includes the results of the processing of the management command (step 615). Next, a completion response to the command is sent to the host 110 using the target driver 302 (step 616).
In this manner, when an IO command, for example a read command, is received, the storage controller transfers the read data to the host in the case of a cache hit, and in the case of a cache miss generates an offload command with reference to the address translation table and controls the drive enclosure (FBOF) so that the read data is transferred directly to the host. Although the read data is transferred directly from the drive enclosure to the host, the command completion response still has to be issued by the host command processing section 304 of the storage controller that received the command from the host.
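The read-miss path described above (steps 606 to 609) can be summarized by the following C sketch. It is a minimal illustration under assumed names; the address translation and the NVMe-oF transmission are reduced to stub functions and are not the embodiment's actual interfaces.

#include <stdint.h>
#include <stdio.h>

/* Data transfer parameters carried by the offload command (cf. Fig. 9D). */
struct offload_cmd {
    uint64_t host_addr;   /* memory address in the host (RDMA destination) */
    uint32_t r_key;       /* RDMA key of the host memory region            */
    uint32_t nsid;        /* namespace of the drive after translation      */
    uint64_t drive_lba;   /* physical address obtained from table 3600     */
    uint32_t length;      /* data transfer length                          */
};

/* Stub for the lookup in the address translation table 3600. */
static void translate_address(uint64_t host_lba, uint32_t *nsid, uint64_t *drive_lba)
{
    *nsid = 1;              /* identity mapping, for illustration only */
    *drive_lba = host_lba;
}

/* Stub for sending the offload command over NVMe-oF to the drive enclosure. */
static void send_to_enclosure(const struct offload_cmd *cmd)
{
    printf("offload: %u bytes at drive LBA %llu -> host 0x%llx\n",
           (unsigned)cmd->length,
           (unsigned long long)cmd->drive_lba,
           (unsigned long long)cmd->host_addr);
}

/* Read-miss path of Fig. 6 (steps 606-609), greatly simplified. */
static void offload_read(uint64_t host_lba, uint32_t length,
                         uint64_t host_addr, uint32_t r_key)
{
    struct offload_cmd cmd = { .host_addr = host_addr, .r_key = r_key,
                               .length = length };
    translate_address(host_lba, &cmd.nsid, &cmd.drive_lba);   /* step 606 */
    send_to_enclosure(&cmd);                                   /* step 607 */
    /* Steps 608 and 611: wait for and check the enclosure's completion;
       steps 612 and 609: then return the command completion to the host. */
}

int main(void)
{
    offload_read(0x1000, 1u << 20, 0xdeadbeef000ULL, 42);
    return 0;
}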
(11) Processing order of host commands in the storage controller, continued (normal command processing)
Fig. 11 is a flowchart showing the procedure of normal command processing, which continues the processing procedure of the host command in the storage controller according to embodiment 1. That is, it describes the processing of step 605 (normal command processing) in the flowchart showing the processing procedure of the host command in the storage controller according to the target configuration 1.
First, the host command processing unit 304 branches the processing in the command type (step 1101). In the case where the command type is a read command, step 1102 is entered. If the command type is a write command, the process proceeds to step 1113.
In the case where the process branches from step 1101 to step 1102, the process branches with a cache hit/miss (step 1102). In the case of a cache hit, step 1103 is entered. In the case of a miss, step 1106 is entered. Here, the determination of the cache hit or miss is performed to determine whether or not data in response to an IO command from the host exists in the cache memory 204 of the memory controller. For example, in the case where the IO command from the host is a read command, it is determined whether data responding to the read command exists in the cache memory 204.
Next, a description will be given of a flow in a case where the command type is a read command in step 1101 of the flow chart and a cache hit in step 1102. When the process branches to step 1103, the data transfer control section (between the host and the memory controller) 305 transfers the data of the address range specified by the read command from the cache 506 to the physical memory area 503 in the host 110 specified by the read command (step 1103).
Next, the host command processing unit 304 generates a command completion response (step 1104). Next, the target driver 302 is used to send a completion response to the command to the host (step 1105). Finally, the process ends (step 1118).
Next, the flow in the case where the command type is a read command in step 1101 and a cache miss in step 1102 will be described. When the processing branches from step 1101 to step 1102 and further from step 1102 to step 1106, the cache control unit 306 secures a cache area for storing the read data (step 1106). Next, in the host command processing unit 304, the address translation unit 318 identifies, from the identifier 403 and NID obtained in step 601 and the start address, the drive enclosure 200 and the drive 508 that are the data storage destination, and a read command of a controller command is issued to the drive enclosure 200 using the initiator driver 303 (step 1107).
The read destination of the read command, obtained by address translation in the address translation unit 318, is the drive enclosure 200, the drive 508, and the storage area 509 in the drive 508 corresponding to the data range 505. As the transfer destination of the read data, the address of the cache area secured in step 1106 is specified in the command. In the NVMe specification, when NVMe transport uses RDMA, the address of a memory area of the command issuing source is specified as information necessary for the data transfer. In the NVMe-oF specification, an admin queue and IO queues are created between the host and the NVM subsystem by Connect commands, and commands and completion responses are sent and received via these queues. Hereinafter, for simplicity, sending and receiving commands and completion responses to and from the NVM subsystem corresponding to the drive 508 is described as sending and receiving them to and from the drive 508.
Next, completion of the read command from the drive enclosure 200 is awaited (step 1108). The host command processing unit 304 then receives the completion response of the read command from the drive enclosure 200 using the initiator driver 303 and analyzes it (step 1109). If the completion response indicates an error, processing for the abnormal case is performed, but its description is omitted here; the description continues on the assumption that the completion response is successful.
Next, the data transfer control unit (between the host and the storage controller) 305 transfers the read data stored in the cache 506 to the physical memory area 503 in the host 110 designated by the read command (step 1110).
After the data transfer is completed, the host command processing unit 304 generates a completion response to the command of the read command from the host (step 1111). Next, a completion response to the command is sent to the host 110 using the target driver 302 (step 1112). Finally, the process is completed (step 1118).
Next, the flow in the case where the command type is a write command in step 1101 will be described. When the processing branches from step 1101 to step 1113, the cache control unit 306 secures a cache area for storing the write data (step 1113).
Next, the data transfer control unit (between the host and the storage controller) 305 transfers the data of the physical memory area 503 in the host 110 designated by the write command to the secured cache area (step 1114). The write data transferred to the cache area is then also transferred to the other storage controller, so that the write data is stored in the cache areas of both storage controllers (step 1115). This is referred to as cache double-write.
Next, the data transfer control unit (between the host and the storage controller) 305 generates a completion response corresponding to the write command of the host 110 (step 1116). Next, the completion response for the command is sent to the host 110 using the target driver 302 (step 1117). Finally, the processing ends (step 1118).
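The cache double-write of steps 1114 and 1115 can be pictured with the toy C fragment below, in which the caches of the two storage controllers are modeled as two buffers in one process; the actual inter-controller transfer is outside the scope of this illustration.

#include <stdio.h>
#include <string.h>

#define CACHE_SLOT_SIZE 4096

/* Caches of the two storage controllers, modeled as plain buffers. */
static unsigned char cache_ctl0[CACHE_SLOT_SIZE];
static unsigned char cache_ctl1[CACHE_SLOT_SIZE];

/* Steps 1114-1115: copy the host's write data into the local cache area and
   duplicate it to the other controller before completing the write command. */
static void cache_double_write(const unsigned char *host_data, size_t len)
{
    memcpy(cache_ctl0, host_data, len);   /* transfer from the host (step 1114)        */
    memcpy(cache_ctl1, cache_ctl0, len);  /* duplicate to the other controller (1115)  */
    /* Only after both copies exist is the completion response sent (steps 1116-1117). */
}

int main(void)
{
    unsigned char data[16] = "write payload";
    cache_double_write(data, sizeof(data));
    printf("%s / %s\n", (char *)cache_ctl0, (char *)cache_ctl1);
    return 0;
}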
(12) Destage processing order in a storage controller
Fig. 12 is a flowchart showing the procedure of destaging processing in the storage controller, that is, the procedure of destaging processing in the storage controller according to the target configuration 1.
When the destaging processing unit 314 determines that the destaging condition is satisfied (for example, the dirty buffer amount is equal to or greater than the threshold), the destaging processing unit 314 starts the processing from step 1200 and thereafter.
The destaging processing unit 314 repeats the subsequent processing until the destaging target data stored in the cache has been written to the drives (step 1201). Since the method of selecting the destaging target data is not essential to the present embodiment, its description is omitted. The destaging processing section 314 generates a write command of a controller command for writing the destaging target data (step 1202).
The write destination of the write command, converted by the address translation unit 318 into the address corresponding to the data range 505, is the drive enclosure 200, the drive 508, and the storage area 509 in the drive 508. Next, the write command is sent to the drive enclosure 200 via the initiator driver 303 (step 1203). Next, completion of the command from the drive enclosure 200 is awaited (step 1204). Next, the completion response of the command is received from the drive enclosure 200 via the initiator driver 303 and analyzed (step 1205). If the completion response indicates an error, processing for the abnormal case is performed, and its description is omitted; the description continues on the assumption that the completion response is successful.
Subsequently, when the repetition of step 1201 continues, the process proceeds to step 1202. When the iteration of step 1201 ends, the destaged buffer area is released (step 1206). The process is finally ended (step 1207).
(20) Processing order of controller commands in a drive cartridge
FIG. 20 is a flow chart showing the processing sequence of controller commands in the drivecage. When the target driver 308 of the drivecage 200 receives the controller command from the storage controller 121, the controller command processing section 309 starts the processing from step 2000 onward.
First, the controller command processing unit 309 analyzes the command received from the memory controller 121, and reads the command type, the NID (namespace ID) which is the identifier of the namespace, the start address, and the field of the data transfer length (step 2001).
Next, processing branches with the command category (step 2002). In the case where the command type is a read command, the process proceeds to step 2003. In the case where the command type is a write command, step 2009 is entered. In the case where the command category is a management command, step 2015 is entered.
The flow in the case where the command type is a read command in step 2002 is described below. When the processing branches from step 2002 to step 2003, the controller command processing section 309 secures a buffer area for storing the read data (step 2003). Next, the data to be read is read from the drive into the secured buffer area (step 2004). The drive 508 storing the data to be read is identified by the identifier 402 of the transmission destination of the controller command. For the namespace ID, start address, and data transfer length of the read command issued to the drive, the values of the fields read in step 2001 are specified. That is, the drive enclosure reads the data from the drive, which is a storage device, in accordance with the command from the storage controller. The method by which the drive enclosure reads data from its own drive is a general one, and its details are omitted.
Next, the read data stored in the buffer area is transferred to the storage controller (step 2005). In this embodiment, it is assumed that NVMe transport uses RDMA (Remote Direct Memory Access) in accordance with the NVMe-oF specification; that is, the data transfer is performed by an RDMA Write to the memory area of the command issuing source specified by the command. Next, a completion response of the command corresponding to the read command from the storage controller 121 is generated (step 2007). Next, the completion response for the command is sent to the storage controller 121 via the target driver 308 (step 2008). Finally, the processing ends (step 2018).
Next, returning to the flowchart at step 2009, the flow of processing in the case where the command type is a write command will be described. When the processing branches from step 2002 to step 2009, a buffer area for storing the write data is secured (step 2009). Next, the write data is transferred from the storage controller 121 (step 2010). In accordance with the NVMe-oF specification when NVMe transport uses RDMA, the data transfer is performed by an RDMA Read from the memory area of the command issuing source specified by the command.
Next, the write data stored in the buffer area is written to the drive (step 2011). The drive 508 to be written is identified by the identifier 402 of the transmission destination of the controller command. For the namespace ID, start address, and data transfer length of the write command issued to the drive, the values of the fields read in step 2001 are specified. The method of writing data to the drive enclosure's own drive is a general one, and its details are omitted.
Next, a completion response of the command corresponding to the write command from the memory controller 121 is generated (step 2012). Next, a completion response to the command is sent to the memory controller 121 via the target driver 308 (step 2014). Finally, the process ends (step 2018).
Next, returning to the description of step 2015 and thereafter in the flowchart, the process flow when the command type is a management command will be described. When the process branches from step 2002 to step 2015, a process of managing commands is performed (step 2015). Next, a command completion response corresponding to the management command from the storage controller 121 is generated (step 2016). Next, a completion response to the command is sent to the storage controller 121 via the target driver 308 (step 2017). Finally, the process ends (step 2018).
(7) Processing order of offload commands for data transfers in a drive enclosure
Fig. 7 is a flowchart showing the processing sequence of an unload command for data transfer in the drivecage of embodiment 1. That is, the flowchart shows the processing procedure of the unload command for data transfer in the drive cartridge according to the target configuration 1.
The unload command processing section 311 of the drivecage 200 starts the processing from step 700 onward when it receives an unload command from the storage controller 121 via the unload command communication section (target) 313.
First, the offload command processing unit 311 reads each field of the offload command (step 701); the fields are described with reference to Fig. 9D. Next, a buffer for storing the read data is secured in the memory 204 (step 708). Next, the corresponding drive 218 is identified from the NVM subsystem NQN and NID of the fields read in step 701 and from the mapping information between NVM subsystem NQNs and the drives 218 within the drive enclosure, and a read command is issued to that drive 218. For the start address and data transfer length of the read command, the start address and data transfer length of the fields read in step 701 are specified, and the address of the buffer secured in step 708 is specified as the storage destination of the data (step 702). In this way, the data is read from the drive, which is a storage device, in accordance with the offload command.
Next, completion of the read command from the drive 218 is awaited (step 703). Next, the completion response of the read command from the drive 218 is received and its content is analyzed (step 707). If the completion response indicates an error, processing for the abnormal case is performed, but its description is omitted here; the description continues on the assumption that the completion response is successful.
Next, the data transfer control unit (between the host and the drive enclosure) 310 transfers the read data in the buffer to the host 110 (step 704). The data transfer control unit (between the host and the drive enclosure) 310 performs RDMA-based data transfer between the drive enclosure 200 and the host 110 via the network I/F 205.
The data transfer control unit (between the host and the drive enclosure) 310 generates an RDMA Write command for the data transfer of the read data and enqueues it on the queue for RDMA communication. For the RDMA Write command, the data transfer length, the address of the buffer that is the data transfer source, and, as information identifying the physical memory area 503 of the host 110 that is the data transfer destination, the memory address and R_key of the fields read in step 701 are specified. The queue for RDMA communication has been created in advance between the network I/F of the host and the network I/F 205 by the above-described Connect command.
In the NVMe protocol, commands are processed through queues, so the device that processes a command must return a completion response to the source that issued the command. That is, when the command from the host is a read command, the completion response must be returned to the host by the storage controller to which the command was entrusted. However, the data responding to the command does not necessarily have to be transferred from the device to which the command was entrusted via the queue; the data is therefore transferred directly from the drive enclosure to the host, which eliminates the bottleneck of the storage controller.
After the data transfer control unit (between the host and the drive enclosure) 310 completes the data transfer, the offload command processing section 311 releases the buffer (step 709). Subsequently, a completion response of the offload command is transmitted to the storage controller 121 via the offload command communication unit (target) 313 (step 705), and the processing ends (step 706).
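The following C sketch condenses the offload processing of Fig. 7 (steps 701 to 709) under assumed function names; the drive read and the RDMA Write are stubs standing in for the drive control and network I/F paths and are not the embodiment's actual interfaces.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Stub for reading from a local drive of the enclosure. */
static int read_from_drive(uint64_t lba, uint32_t len, void *buf)
{
    (void)buf;
    printf("read %u bytes at LBA %llu from a local drive\n",
           (unsigned)len, (unsigned long long)lba);
    return 0;                       /* 0 = success */
}

/* Stub for the RDMA Write through the network I/F 205 to the host. */
static int rdma_write_to_host(uint64_t host_addr, uint32_t r_key,
                              const void *buf, uint32_t len)
{
    (void)buf;
    printf("RDMA Write %u bytes to host address 0x%llx (R_key %u)\n",
           (unsigned)len, (unsigned long long)host_addr, (unsigned)r_key);
    return 0;
}

/* Steps 701-709 of Fig. 7, greatly simplified: stage the data in a buffer,
   push it straight to the host, then report completion to the controller. */
static int handle_offload_read(uint64_t lba, uint32_t len,
                               uint64_t host_addr, uint32_t r_key)
{
    void *buf = malloc(len);                     /* step 708: secure buffer   */
    if (buf == NULL)
        return -1;
    if (read_from_drive(lba, len, buf) != 0) {   /* steps 702, 703, 707       */
        free(buf);
        return -1;
    }
    int rc = rdma_write_to_host(host_addr, r_key, buf, len);  /* step 704     */
    free(buf);                                   /* step 709: release buffer  */
    /* Step 705: the completion response is then returned to the storage
       controller that issued the offload command, not to the host.           */
    return rc;
}

int main(void)
{
    return handle_offload_read(0x2000, 8192, 0xabc000ULL, 7);
}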
(8) Decision of data transmission method
Fig. 8 is a diagram showing the data transfer conditions and data transfer types used for determining the transfer method. The data transfer type is given by the IO mode 800, which is classified according to whether the data transfer length is below or above the threshold and whether the command type is read or write. For each classification, the transfer method is defined for the case of a cache hit 801 and for the case of a cache miss 802.
In embodiment 1, the condition under which data can be transferred directly from the drive enclosure 140 to the host 110 is that the command type is read and the cache misses. When the data transfer length is large, sequential access is likely and the performance benefit of direct data transfer is large. When the data transfer length is small, random access is likely and the performance benefit of cache hits is large, so the data is copied to the cache 506 by normal command processing.
In addition, the threshold value of the data transfer length does not need to be fixed, and may be designed to be changeable according to the workload of the storage device.
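The determination of Fig. 8 can be expressed as the small C function below; the threshold value and the handling of a transfer length exactly equal to the threshold are assumptions of this illustration.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum transfer_method { TRANSFER_NORMAL, TRANSFER_OFFLOAD };

/* Decision of Fig. 8: direct (offloaded) transfer only for a read that misses
   the cache and whose transfer length exceeds the threshold; everything else
   takes the normal path through the storage controller's cache. */
static enum transfer_method decide_transfer(bool is_read, bool cache_hit,
                                            uint32_t xfer_len,
                                            uint32_t threshold)
{
    if (is_read && !cache_hit && xfer_len > threshold)
        return TRANSFER_OFFLOAD;
    return TRANSFER_NORMAL;
}

int main(void)
{
    uint32_t threshold = 64u * 1024;   /* example value; tunable per workload */
    printf("%d\n", decide_transfer(true, false, 1u << 20, threshold));  /* 1: offload */
    printf("%d\n", decide_transfer(true, false, 4096, threshold));      /* 0: normal  */
    printf("%d\n", decide_transfer(true, true, 1u << 20, threshold));   /* 0: normal  */
    return 0;
}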
(9) Host command format, host information table of storage controller, drive information table, and unload command format
Fig. 9A is a diagram showing a format of a host command, fig. 9B is a diagram showing a host information table of a storage controller, fig. 9C is a diagram showing a drive information table, and fig. 9D is a diagram showing a format of an unload command.
The fields of the host command shown in Fig. 9A include a command identifier 911, a command category 912, an NID 913, a start address 914, a data transfer length 915, a memory address 916, and an R_key 917.
The command identifier 911 is an identifier for identifying each command. For example, in a configuration in which a plurality of commands are executed in multiple, a completion response for an issued command is associated with the command. Command identification based on the command identifier is a widely known method for executing commands, and detailed description thereof is omitted.
The command type 912 is a code (symbol) indicating a read command, a write command, and a management command.
NID913 is the namespace ID within the NVM subsystem. In embodiment 1, it is the NVM subsystem of the memory controller 121. In addition, NQN of the NVM subsystem is registered in NVM subsystem NQN923 of the host information table of fig. 9B.
The start address 914 and the data transfer length 915 are addresses and data transfer lengths within a namespace of data of the data transfer object.
The memory address 916 is the address of the memory area in the host that is the data transfer destination designated by the host 110. R_key 917 is an identifier of that memory area in the host. A field of low importance for the description of embodiment 1, the metadata pointer, is omitted from the drawing of the host command. Metadata is additional data assigned to a drive or logical volume in units of logical blocks (for example, 512 B). Since embodiment 1 is applicable regardless of the presence or absence of metadata, its description is omitted.
For simplicity, Fig. 9A shows only one pair of the memory address 916 and R_key 917, but a list consisting of a plurality of pairs may be used. Similarly, in the following description, including the description of the offload command, the memory address and R_key are described as one pair, but a list consisting of a plurality of pairs may be used.
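One possible C rendering of the host command fields of Fig. 9A is shown below; the field widths and encodings are assumptions, since the figure only names the fields.

#include <stdint.h>

enum cmd_type { CMD_READ, CMD_WRITE, CMD_ADMIN };   /* command category 912 */

/* Host command format of Fig. 9A (one memory address / R_key pair only). */
struct host_command {
    uint16_t      command_id;    /* command identifier 911   */
    enum cmd_type type;          /* command category 912     */
    uint32_t      nsid;          /* namespace ID 913         */
    uint64_t      start_addr;    /* start address 914        */
    uint32_t      xfer_len;      /* data transfer length 915 */
    uint64_t      mem_addr;      /* host memory address 916  */
    uint32_t      r_key;         /* RDMA R_key 917           */
};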
Fig. 9B is a host information table of the storage controller. Host information table 920 contains entries for queue number 921, host NQN922, NVM subsystem NQN 923.
Queue number 921 is the number of the IO queue between the host and the NVM subsystem. The storage controller 121, when receiving a connection command from the host 110 to generate an IO queue, numbers the queue number in order to internally manage the IO queue. The queue number is a unique value inside the memory controller 121. Host NQN922 and NVM subsystem NQN923 are NQN of host 110 and NQN of the NVM subsystem of storage controller 121, respectively, connected by the IO queues described above.
The drive information table 930 shown in Fig. 9C contains entries of a drive area number 931, a drive enclosure (ENC) number 932, an NVM subsystem NQN 933, and an NID 934.
The drive area number 931 is a number assigned to a storage area of a drive 218 used in the storage controller 121. The drive 218 corresponds to the drive 508 in Fig. 5. The storage controller 121 assigns drive area numbers in order to manage the areas of the drives 218 in units of namespaces. The drive area number is a value that is unique inside the storage controller 121.
The drive enclosure (ENC) number 932 is the number of the drive enclosure 200 that contains the drive 218. The storage controller 121 assigns enclosure numbers 932 in order to manage the drive enclosures 200. The drive enclosure number 932 is a value that is unique inside the storage controller 121. The NVM subsystem NQN 933 and the NID 934 are the identifier 402 corresponding to the drive 218 and the namespace ID within the drive 218, respectively.
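The host information table 920 and the drive information table 930 could be represented as in the following C sketch; the entry layouts and the NQN buffer size are assumptions made for illustration.

#include <stdint.h>

#define NQN_MAX 224   /* NVMe qualified names are at most 223 bytes plus a terminator */

/* Host information table 920 (Fig. 9B): one entry per IO queue. */
struct host_info_entry {
    uint32_t queue_number;          /* 921 */
    char     host_nqn[NQN_MAX];     /* 922 */
    char     subsys_nqn[NQN_MAX];   /* 923: NVM subsystem NQN */
};

/* Drive information table 930 (Fig. 9C): one entry per drive area. */
struct drive_info_entry {
    uint32_t drive_area_number;     /* 931 */
    uint32_t enclosure_number;      /* 932: drive enclosure (ENC) number   */
    char     subsys_nqn[NQN_MAX];   /* 933: NVM subsystem NQN of the drive */
    uint32_t nsid;                  /* 934 */
};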
Fig. 9D shows the format of the offload command. The fields of the offload command 900 include a command identifier 908, a host NQN 901, a memory address 902, an R_key 903, a data transfer direction 909, an NVM subsystem NQN 904, an NID 905, a start address 906, and a data transfer length 907.
The command identifier 908 is an identifier for identifying each command. The host NQN 901 is the host NQN of the host 110 that is the data transfer destination of the drive enclosure 140. The memory address 902 is the address of the memory area in the host that is the data transfer destination designated by the host 110. R_key 903 is an identifier of that memory area in the host. The data transfer direction 909 indicates either data transfer from the drive enclosure 200 to the host 110 or data transfer from the host 110 to the drive enclosure 200. The NVM subsystem NQN 904 and the NID 905 are the NVM subsystem NQN and the namespace ID within the NVM subsystem of the drive enclosure 200, respectively. The start address 906 and the data transfer length 907 are the address and data transfer length, within the namespace, of the data to be transferred. The NID 905, the start address 906, and the data transfer length 907 are obtained from the logical address of the host command with reference to the address translation table 3600.
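Correspondingly, a possible C rendering of the offload command format 900 of Fig. 9D is shown below; the field widths are again assumptions.

#include <stdint.h>

#define NQN_MAX 224

enum xfer_dir { BOX_TO_HOST, HOST_TO_BOX };   /* data transfer direction 909 */

/* Offload command format 900 of Fig. 9D (one memory address / R_key pair). */
struct offload_command {
    uint16_t      command_id;            /* 908 */
    char          host_nqn[NQN_MAX];     /* 901 */
    uint64_t      mem_addr;              /* 902: host memory address        */
    uint32_t      r_key;                 /* 903 */
    enum xfer_dir direction;             /* 909 */
    char          subsys_nqn[NQN_MAX];   /* 904: NVM subsystem NQN (drive side) */
    uint32_t      nsid;                  /* 905 */
    uint64_t      start_addr;            /* 906: address after translation  */
    uint32_t      xfer_len;              /* 907 */
};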
The values of the fields 901 to 909 of the unload command are set by the host command processing unit 304 as follows.
The host command processing unit 304 matches the IO queue of the target driver 302 of the storage controller 121 that received the command from the host 110 against the entries of the host information table 920 of the storage controller, sets the host NQN 922 corresponding to that IO queue as the host NQN 901, and identifies the NVM subsystem NQN 923 as the identifier 403. This processing is performed in step 601 of Fig. 6.
The host command processing section 304 sets the memory address 916 and R_key 917 specified by the host 110 in the host command as the memory address 902 and R_key 903. The host command processing section 304 then uses the address translation unit 318 to identify the drive 508 of the data storage destination and the address of the data storage destination from the identifier 403 (corresponding to the NVM subsystem NQN 923) obtained in step 601 and the NID 913, the start address 914, and the data transfer length 915 of the host command.
Specifically, this is performed as follows. First, the address translation unit 318 converts "(A) the identifier 403 (NVM subsystem NQN)", "(B) the NID 913 of the host command" (corresponding to a logical volume in the storage device), and "(C) the start address 914" (corresponding to a logical address in the namespace) into "(D) the drive area number 3602" and "(E) the drive address 3603" using the address translation table 3600.
Next, the address translation unit 318 converts "(D) the drive area number 3602" into "(F) the ENC number 932", "(G) the NVM subsystem NQN 933", and "(H) the NID 934" using the drive information table 930.
The drive enclosure 200 that is the transmission destination of the offload command is identified by "(F) the ENC number". The NVM subsystem NQN 904, the NID 905, and the start address 906 of the offload command correspond to "(G) the NVM subsystem NQN", "(H) the NID", and "(E) the drive address", respectively.
The command identifier 908 is a value that is unique among the offload commands being executed. In embodiment 1, offloading is performed only for read commands, so the data transfer direction 909 is only from the drive enclosure 200 to the host 110.
The information of each field of the unload command in fig. 9D is not limited to the above-described order. For example, the setting can be performed by aggregating information settable by a host command.
As described above, according to embodiment 1, when the storage controller that receives a read command from the host determines that there is a cache miss and that the data transfer length of the read command is longer than the threshold, the read data is transferred directly from the drive enclosure, which is an FBOF, to the host. Therefore, even when a plurality of drive enclosures are connected to the storage controller, the bottleneck of the storage controller can be eliminated and high-speed data transfer can be realized.
[ example 2 ]
Embodiment 1 describes a configuration in which the storage controller has a cache, while embodiment 2 describes a configuration in which the storage controller has no cache. Since the configuration of the information processing system and much of the processing of the storage controller and the drive enclosure are the same for a cache-less storage controller, only the differences from embodiment 1 are described below. Except for the differences described in embodiment 2, the description is the same as that of embodiment 1 and is therefore omitted.
The cache-less storage controller differs in configuration in that the cache control unit 306 of Fig. 3 and the cache 506 of Fig. 5 are absent. Accordingly, write data is written (destaged) directly to the storage area 509 in the drive 508 connected to the drive enclosure 200, so that the data of the storage area 509 in the drive 508 is immediately updated. However, during destaging, new data and old data coexist in the storage area 509 in the drive 508, so part of the steps of the control of Fig. 6 is changed so that the storage device responds to the host with consistent data. In order to determine whether the storage area 509 in the drive 508 is being destaged, the storage controller 121 may manage the destaging state of each storage area 509 with a bitmap.
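The per-area destaging-state bitmap mentioned above could be kept as simply as in the following C sketch; the number of areas and the helper names are assumptions of this illustration.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_AREAS 1024                      /* assumed number of storage areas 509 */

static uint8_t destage_map[NUM_AREAS / 8];  /* 1 bit per storage area */

static void set_destaging(unsigned area)
{
    destage_map[area / 8] |= (uint8_t)(1u << (area % 8));
}

static void clear_destaging(unsigned area)
{
    destage_map[area / 8] &= (uint8_t)~(1u << (area % 8));
}

static bool is_destaging(unsigned area)
{
    return (destage_map[area / 8] >> (area % 8)) & 1u;
}

int main(void)
{
    set_destaging(5);                  /* destaging of area 5 starts       */
    printf("%d\n", is_destaging(5));   /* 1: a host command must wait      */
    clear_destaging(5);                /* destaging of area 5 completes    */
    printf("%d\n", is_destaging(5));   /* 0: the drive holds the new data  */
    return 0;
}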
Since there is no cache, the determination of the data transfer method corresponds in Fig. 8 to the cache-miss case 802 only, and in the case of a read the data transfer is always a direct transfer. That is, in Fig. 8, the determination of the data transfer method corresponds to the case where the cache-hit column 801 does not apply and the threshold is 0 bytes.
(10) Processing order of host commands in a cache-less storage controller
FIG. 10 is a flow chart showing the processing sequence of host commands in a non-cached storage controller.
Except for step 1002, steps 1000 to 1010 are the same as steps 600 to 610. In step 1002, the host command processing unit 304 determines, from the identifier 403 obtained from the target driver 302 and the information of the NID, start address, and data transfer length obtained in step 1001, whether the data of the area is being destaged, and if so waits for the destaging to complete. After the destaging is completed, the latest data is reflected in the drive.
(13) Host command processing order (normal command processing) in the cache-less storage controller (embodiment 2)
Fig. 13 is a flowchart showing the continuation, namely normal command processing, of the processing procedure of the host command in the cache-less storage controller (embodiment 2).
First, the host command processing unit 304 branches the process in the command type (step 1301). If the command type is a read command, the process proceeds to step 1302. If the command type is a write command, the process proceeds to step 1309. The following describes a flow in the case where the command type is a read command in step 1301.
When the process branches from step 1301 to step 1302, the host command processing unit 304 secures a buffer area for storing read data (step 1302).
Next, the address conversion unit 318 recognizes the identifier 403, the NID, the drive box 200 of the data storage destination corresponding to the start address, and the drive 508, and the host command processing unit 304 issues a read command to the drive box 200 using the initiator driver 303 (step 1303). The namespace ID and the start address of the read command issued to the drive are obtained by address translation by the address translation section 318, and the value of the field of step 1001 is specified for the data transfer length.
Next, completion of the read command from the drive enclosure 200 is awaited (step 1304). Next, the completion response of the read command is received and analyzed (step 1305). If the completion response indicates an error, processing for the abnormal case is performed, and its description is omitted; the description continues on the assumption that the completion response is successful.
Next, the data transfer control unit (between the host and the storage controller) 305 transfers the data in the address range specified by the read command from the secured buffer area to the physical memory area 503 in the host 110 specified by the read command (step 1306). After the data transfer is completed, the host command processing section 304 generates a completion response for the read command of the host 110 (step 1307). Next, the completion response for the command is sent to the host 110 via the target driver 302 (step 1308). Finally, the processing ends (step 1322).
Next, returning to the description of step 1309 and subsequent steps in the flowchart, the flow of processing in the case where the command type is a write command will be described. The main difference between the processing and fig. 11 is the timing of sending a completion response of the write command to the host. That is, in the case of a cache, a completion response of the write command is transmitted to the host after the cache double write of the write data, and the write to the drive is performed when the destage condition is satisfied, whereas in the case of no cache, a completion response of the write command is transmitted to the host after the write of the write data to the drive is completed.
When the processing branches from step 1301 to step 1309, a buffer area for storing the write data and the RAID stripe is secured (step 1309). Next, the data transfer control unit (between the host and the storage controller) 305 transfers the data in the physical memory area 503 in the host 110 designated by the write command to the secured buffer area (step 1310). In accordance with the NVMe-oF specification when NVMe transport uses RDMA, the data transfer is performed by an RDMA Read.
Next, a read command of a controller command for reading the RAID stripe corresponding to the write destination of the write command from the drives is generated (step 1311). The write destination of the write command, converted by the address translation unit 318, is the drive enclosure 200, the drive 508, and the storage area 509 in the drive 508 corresponding to the data range 505. Next, the host command processing unit 304 transmits the read command to the drive enclosure using the initiator driver 303 (step 1312). A RAID stripe sometimes spans a plurality of drives in a plurality of drive enclosures 200 that make up the RAID; in this case, a read command is issued to each drive of each drive enclosure as described above.
Next, completion of the read command is awaited (step 1313). Next, the completion response of the read command is received and analyzed (step 1314). If the completion response indicates an error, processing for the abnormal case is performed, but its description is omitted here; the description continues on the assumption that the completion response is successful.
Next, the parity is computed from the read RAID stripe (step 1315). Next, a write command for writing the updated data and parity of the RAID stripe to the drives is generated (step 1316). Next, the write command of the controller command is sent to the drive enclosure 200 (step 1317). As described above, when the RAID stripe spans a plurality of drives in a plurality of drive enclosures 200, a write command is issued to each drive of each drive enclosure. Next, completion of the write command is awaited (step 1318). Then, the completion response of the write command is received and analyzed (step 1319). If the completion response indicates an error, processing for the abnormal case is performed, but its description is omitted here; the description continues on the assumption that the completion response is successful.
Next, a completion response for the command corresponding to the write command from the host 110 is generated (step 1320). Next, a completion response for the command is sent to the host 110 (step 1321). Finally, the process ends (step 1322).
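For RAID5, the parity computed in step 1315 is the XOR of the data strips of the stripe. The following C sketch shows a full-stripe recomputation under assumed strip sizes; the embodiment may equally use the old-data/old-parity update, which is not shown here.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define STRIPE_UNIT 8          /* bytes per strip, tiny for illustration */
#define DATA_DRIVES 3          /* RAID5 3D+1P                            */

/* Recompute the parity strip as the XOR of all data strips (steps 1315-1316). */
static void raid5_parity(uint8_t data[DATA_DRIVES][STRIPE_UNIT],
                         uint8_t parity[STRIPE_UNIT])
{
    for (size_t i = 0; i < STRIPE_UNIT; i++) {
        uint8_t p = 0;
        for (size_t d = 0; d < DATA_DRIVES; d++)
            p ^= data[d][i];
        parity[i] = p;
    }
}

int main(void)
{
    uint8_t stripe[DATA_DRIVES][STRIPE_UNIT] = {
        { 1, 2, 3, 4, 5, 6, 7, 8 },
        { 8, 7, 6, 5, 4, 3, 2, 1 },
        { 0xff, 0, 0xff, 0, 0xff, 0, 0xff, 0 },
    };
    uint8_t parity[STRIPE_UNIT];

    /* After the strip holding the new write data is updated, the new parity
       is written back to the drives together with it. */
    raid5_parity(stripe, parity);
    for (size_t i = 0; i < STRIPE_UNIT; i++)
        printf("%02x ", parity[i]);
    printf("\n");
    return 0;
}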
In embodiment 2, since there is no cache for the read command from the host in step S1004 of Fig. 10, the data transfer method of Fig. 8 reduces to the cache-miss-only case 802 with a threshold of 0 bytes, and in the case of a read the data transfer is always a direct transfer.
According to embodiment 2, since the data is always transferred directly in the case of reading, even in the case where a plurality of drive cases are connected to the storage controller, the bottleneck of the storage controller can be eliminated, and high-speed data transfer can be realized.
[ example 3 ]
(14) Program configurations of the host, the storage controller, and the drive enclosure in a mode in which the drive enclosure, instead of the storage controller, operates as the target of NVMe over Fabrics for the host (target configuration mode 2: embodiment 3)
Fig. 14 is a diagram showing the program configurations of the host, the storage controller, and the drive enclosure in the mode in which the drive enclosure, instead of the storage controller, operates as the target of NVMe over Fabrics for the host (target configuration mode 2: embodiment 3).
The program of the storage controller 121 includes a cartridge command communication unit (target) 1401, a cartridge command processing unit 1402, a data transfer control unit (between storage controllers and cartridges) 1403, a cache control unit 1404, a data transfer unloading unit 1405, an unloading command communication unit (initiator) 1406, a destaging processing unit 1407, a controller command transmission unit (initiator) 1408, an address conversion unit 1419, and an OS (not shown).
The cartridge command communication section (target) 1401 provides a storage area supporting NVMeoF to the cartridge command communication section (initiator) 1411.
The cartridge command processing unit 1402 receives a command issued by the drive cartridge 200 using the cartridge command communication unit (target) 1401, and performs command analysis, read/write processing, command completion response generation, command completion response transmission via the cartridge command communication unit (target) 1401, and the like.
The data transfer control section (between the memory controller and the cartridge) 1403 performs data transfer processing between the memory controller and the drive cartridge in accordance with the instruction of the cartridge command processing section 1402. The cache control unit 1404 performs: a determination of cache hit and miss based on retrieval of cache data; transitions between dirty data (state before writing to the physical drive) and clean data (state after writing to the physical drive); and control of reservation and release of the buffer area. Each process of the cache control is a well-known technique, and a detailed description thereof will be omitted.
The data transfer offload unit 1405 generates an offload command for data transfer, and instructs the drive deck 200 to perform data transfer to the host. The unload command is generated based on the stored configuration information such as the address translation table from the IO command received from the host, and is used for the IO command processing on the drive housing side, and therefore includes data transfer parameters such as the host identifier, the memory address, the identifier of the drive, the NS of the drive, the start address, and the data length as shown in fig. 9D.
The unload command communication unit (initiator) 1406 transmits an unload command to and receives a response from the drive cartridge. The destaging processing unit 1407 performs destaging processing for writing the data in the buffer into the drive. The controller command transmitting unit (initiator) 1408 transmits a storage command to the drive cartridge and receives a completion response. The address conversion unit 1419 has a mapping table between the data range 505 and the storage area 509 in the drive cartridge 200, the drive 508, and the drive 508 as the storage destination of the data, and converts the address of the data range 505 into the address of the storage area 509 in the corresponding drive cartridge 200, the drive 508, and the drive 508.
The program of the drivecage 200 includes a target driver 1409, a host command processing unit 1410, a cage command communication unit (initiator) 1411, a data transfer control unit (between a storage controller and a cage) 1413, a data transfer control unit (between a host and a cage) 1414, an unload command communication unit (target) 1415, an unload command processing unit 1416, a controller command communication unit (target) 1417, a drive control unit 1418, a buffer control unit 1412, a controller command processing unit 1420, and an OS not shown.
The target driver 1409 provides a storage area for nvmeofs support to the initiator driver 301 of the host 110. The host command processing unit 1410 receives a command issued by the host or the storage controller using the target driver 1409, and performs command analysis, read/write processing, command completion response generation, command completion response transmission via the target driver 1409, and the like. The cartridge command communication section 1411 recognizes the NVMeoF-enabled storage area provided by the cartridge command communication section (target) 1401. The data transfer control section 1413 (between the controller and the cartridge) performs data transfer processing between the storage controller and the drive cartridge. The data transfer control unit (between host and cartridge) 1414 performs data transfer processing between the NVMeoF-supporting host and the drive cartridge in accordance with the instructions of the host command processing unit 1410 and the unmount command processing unit 1416.
The offload command communication section (target) 1415 receives an offload command for data transfer from the storage controller 121. The unload command processing unit 1416 analyzes the unload command, performs read processing, generates a completion response to the unload command, and transmits the completion response to the unload command. The controller command communication unit (target) 1417 receives a storage command and transmits a completion response to the storage controller 121.
The drive control unit 1418 manages the drive 218, and performs read/write processing on the drive 218 in accordance with instructions from the host command processing unit 1410 and the unload command processing unit 1416. The buffer control unit 1412 secures and releases a buffer, which is a temporary memory area for data transmission. The controller command processing unit 1420 receives a command issued by the storage controller using the target driver 1409, and performs command analysis, read/write processing, command completion response generation, command completion response transmission via the target driver 1409, and the like.
(15) Identifiers of host and NVM subsystems in NVMe over Fabrics related to target structure mode 2
Fig. 15 is a diagram showing identifiers of a host and an NVM subsystem in NVMe over Fabrics related to the target configuration 2.
The host 110 has at least one identifier 401 (host NQN). There may be a plurality of hosts 110; illustration is omitted. The drive enclosure 200 has at least one identifier 1503 (NVM subsystem NQN). In the NVM subsystem corresponding to the identifier 1503, a logical storage area allocated from part of a storage pool is assigned as a namespace. The storage pool is a storage area constructed from the storage areas of the plurality of drives 218 and protected by data protection such as RAID. The same applies to the drive enclosure 201, and its description is omitted. The number of drive enclosures 200 and 201 may be two or more; illustration is omitted. In this target configuration the drive enclosure receives commands from the host, so, unlike Fig. 4, the NVM subsystem of the storage controller is no longer needed.
The generation of the NVM subsystem of the drive cartridge is done in a master-slave manner. Storage device 120 is the master and drivecage 200 (and drivecage 201) is the slave. This is to manage and store information defining the NVM subsystem of the drive cartridge as configuration information of the storage device by the storage device 120 having a data protection function. This makes it possible to provide a data protection function of the storage controller and a function of a program product (a function of the storage device) such as Snapshot or Thin Provisioning that operates in the storage controller. The information defining the NVM subsystem refers to the NVM subsystem NQN (here, identifier 1503), the information of the NVM transport (information defining the connection between the host and the NVM subsystem; here, IP address of the drive box, TCP/UDP port, etc.), serial number or model number, etc.
The main flow until the storage device 120 recognizes the drive of the drive cartridge and provides a storage area to the host is as follows. First, the storage device 120 acquires the installation information of the drive 508 from the drive cartridge, and generates the drive information table 930 of fig. 9C. Next, the storage apparatus 120 combines the storage areas of the drives 508 in the drive information table 930, and constructs a storage area protected by RAID, mirroring, or the like according to the data protection method. The combination of the storage areas and the setting of the data protection method may be automatic or manual. The term "automatic" as used herein means automatic setting by the storage device 120, and "manual" means setting by the storage device 120 in accordance with a user instruction. The storage area combinations used for data protection are managed and stored in the data protection drive group table 3610. The data protection method in the storage system is a well-known technique, and the description thereof is omitted. Next, the storage device 120 constructs a storage pool by aggregating storage areas protected by RAID, mirroring, or the like. Next, the storage device 120 extracts a part of the storage area in the storage pool and constructs a logical volume. Next, storage 120 creates an NVM subsystem, allocating the logical volume as a namespace. In the storage device 120, the correspondence between the logical address of the logical volume and the physical address of the drive is managed as an address translation table 3600.
In the generation of the NVM subsystem, the storage device 120 specifies the above-mentioned information defining the NVM subsystem as a parameter, and instructs the cartridge 200 (and the cartridge 201) to generate the NVM subsystem so that the cartridge can provide a logical storage area of the storage device 120 to the host. Drivecage 200 (and drivecage 201) generates the NVM subsystem as instructed. The NVM subsystem is generated at startup, when a drive cartridge is added, or when the configuration is changed, for example.
Thus, the drive enclosure can provide its own storage area to the host, and the storage controller can protect the storage area of each drive enclosure by, for example, RAID technology. That is, generation of an NVM subsystem is instructed to each drive enclosure based on the configuration information of the storage controller, and the drive enclosure in which the NVM subsystem is generated provides the generated NVM subsystem to the host as a storage area based on the instruction from the storage controller.
Host 110 is able to command and data transfer to the NVM subsystem of drive cartridge 200 (and drive cartridge 201) by sending a connect command to drive cartridge 200 (and drive cartridge 201).
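The information defining the NVM subsystem that the storage device 120 passes to the drive enclosure could be grouped roughly as in the following C struct; the field names and sizes are assumptions used only as a reading aid.

#include <stdint.h>

#define NQN_MAX 224

/* Parameters the storage device 120 specifies when instructing a drive
   enclosure to create an NVM subsystem: the NVM subsystem NQN, the NVM
   transport information, and the serial and model numbers. */
struct nvm_subsystem_definition {
    char     subsys_nqn[NQN_MAX];   /* NVM subsystem NQN (identifier 1503)     */
    char     ip_address[46];        /* transport address of the drive enclosure */
    uint16_t port;                  /* TCP/UDP port of the transport            */
    char     serial_number[20];
    char     model_number[40];
};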
(16) Processing order of host command and unload command in drive cartridge relating to target configuration mode 2
Fig. 16 is a flowchart showing the processing procedure of the host command and the unload command in the drive cartridge according to the target configuration mode 2.
When the target driver 1409 of the drivecage 200 receives a command from the host 110, the host command processing unit 1410 starts the processing from step 1600 onward.
First, the host command processing unit 1410 analyzes the received NVMe command (the format of the command refers to the format 910 of fig. 9A of the host command), and reads fields of the command type 912, the NID (namespace ID)913 that is an identifier of the namespace, the start address 914, and the data transfer length 915 (step 1601).
Next, processing branches with the type of command (step 1602). In the case where the command type is a read command, the process proceeds to step 1603. In the case where the command type is a write command, step 1623 is entered. In the case where the command type is a management command, the process proceeds to step 1617. The following describes a flow in the case where the command type is a read command in step 1601.
When the processing branches to step 1603, the host command processing unit 1410 secures a buffer area for storing the read data (step 1603). Next, a read command (read request) of a cartridge command is generated (step 1604). This read command reads the data in the address range specified by the read command from the host 110 and stores it in the secured buffer area. Commands issued by the drive enclosure to the storage controller are referred to as cartridge commands. The format and generation method of the cartridge command are explained with reference to Fig. 22.
Next, the generated cartridge command is transmitted to the storage controller 121 using the cartridge command communication unit (initiator) 1411 (step 1605). Next, a command completion response from the storage controller 121 is awaited (step 1606). Next, the completion response of the read command is received from the storage controller 121 via the cartridge command communication unit (initiator) 1411 and analyzed (step 1607). In step S1607, the storage controller, referring to the address translation table, returns a completion response containing an offload instruction when the target data of the host command is stored in a drive 218 connected to the same drive enclosure whose target driver 1409 received the host command, and returns a completion response containing a normal read response when the target data is stored elsewhere, for example in a drive of another drive enclosure. If the completion response indicates an error, processing for the abnormal case is performed, but its description is omitted here; the description continues on the assumption that the completion response is successful.
Next, processing branches with the category of completion response (1608). In the case where the completion response is a read response, step 1609 is entered. In the case where the completion response is a read response with an unload indication, step 1613 is entered. The following describes a flow when the response type of the command in step 1608 is a read response.
When the processing branches from step 1608 to step 1609, the read data is transmitted to the host 110: the data transfer control unit (between the host and the drive enclosure) 1414 transfers the read data stored in the buffer to the physical memory area 503 in the host 110 designated by the read command (step 1609). Here, as in embodiment 1, the description assumes an RDMA Write to the memory area of the command issuing source specified by the command. However, in the present embodiment, in which the drive enclosure rather than the storage controller operates as the target of NVMe over Fabrics for the host, not only RDMA but also TCP, Fibre Channel, or the like can be used as the NVMe transport. The data transfer is therefore not limited to RDMA Write and may be any data transfer defined by the NVMe transport.
Next, the host command processing unit 1410 generates a completion response to the command corresponding to the read command from the host 110 (step 1610). Next, the target driver 1409 is used to send a completion response for the command to the host 110 (step 1611). Next, the secured buffer area is freed (step 1612). The process is finally completed (step 1635).
Next, returning to the flowchart at step 1608, the flow of processing in the case where the command response type is a read response with an offload indication will be described. When the processing branches from step 1608 to step 1613, the host command processing unit 1410 reads the data to be read from the drive into the secured buffer area in accordance with the offload indication (step 1613). The drive 508 storing the data to be read is identified by the identifier 402 specified in the offload indication. For the namespace ID, start address, and data transfer length of the read command issued to the drive, the values specified in the offload indication are used. The drive enclosure thus reads the data from the drive, which is a storage device, in accordance with the offload indication. The method by which the drive enclosure reads data from its own drive is a general one, and its details are omitted.
Next, the data transfer control unit (between the host and the drive cartridge) 1414 transfers the read data stored in the buffer to the physical memory area 503 in the host 110 designated by the read command (step 1614). Data in response to a command need not necessarily be transferred from the device that is the target of the command's delegation linked by the queue, and thus data is transferred from the drive cartridge directly to the host, eliminating the bottleneck of the storage controller.
Next, a completion response for the offload indication is generated (step 1615). Next, the completion response of the offload indication is sent to the storage controller 121 using the offload command communication section (target) 1415 (step 1616). Thereafter, steps 1610, 1611, 1612, and 1635 are as described above. This is because the storage controller that issued the offload indication needs to be notified of the completion of the processing.
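The branch at step 1608 described above can be summarized by the following C sketch: the drive enclosure either forwards data already staged by the storage controller or, on an offload indication, reads its own drive and transfers the data itself. The function names are placeholders, not the embodiment's interfaces.

#include <stdint.h>
#include <stdio.h>

enum completion_kind { READ_RESPONSE, READ_RESPONSE_WITH_OFFLOAD };

/* Placeholder I/O helpers. */
static void read_local_drive(uint64_t lba, uint32_t len, void *buf)
{
    (void)buf;
    printf("read %u bytes at LBA %llu from a drive in this enclosure\n",
           (unsigned)len, (unsigned long long)lba);
}

static void transfer_to_host(const void *buf, uint32_t len)
{
    (void)buf;
    printf("transfer %u bytes from the enclosure buffer to the host\n",
           (unsigned)len);
}

/* Steps 1608-1616, simplified: the buffer already holds the data when the
   controller answered with a plain read response; otherwise the enclosure
   fetches it from its own drive as instructed by the offload indication. */
static void complete_host_read(enum completion_kind kind, uint64_t lba,
                               uint32_t len, void *buf)
{
    if (kind == READ_RESPONSE_WITH_OFFLOAD) {
        read_local_drive(lba, len, buf);     /* step 1613 */
        /* A completion of the offload indication is also returned to the
           storage controller (steps 1615-1616).                           */
    }
    transfer_to_host(buf, len);              /* step 1609 or 1614 */
    /* The completion response for the host command is then generated and
       sent to the host (steps 1610-1611).                                 */
}

int main(void)
{
    uint8_t buf[4096];
    complete_host_read(READ_RESPONSE_WITH_OFFLOAD, 0x100, sizeof(buf), buf);
    complete_host_read(READ_RESPONSE, 0, sizeof(buf), buf);
    return 0;
}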
Next, returning to the description of steps 1623 and thereafter of the flowchart, the flow of processing in the case where the command type is a write command will be described. When the process branches from step 1602 to step 1623, the host command processing unit 1410 secures a buffer area for storing the write data (step 1623). Next, the data transfer control unit (between the host and the drive cartridge) 1414 transfers the data in the physical memory area 503 of the host 110 designated by the write command to the secured buffer area (step 1624). Next, the host command processing unit 1410 generates a write command of a box command for writing the write data of the buffer area into the address range specified by the write command of the host 110 (step 1625).
Next, a cartridge command is transmitted to the storage controller 121 using the cartridge command communication unit (initiator) 1411 (step 1626). Next, a XFER RDY from the storage controller 121 is awaited (step 1627). XFER RDY is a message meaning that write preparation has been completed. Next, XFER RDY is received from the storage controller 121 via the cartridge command communication unit (initiator) 1411 (step 1628).
Next, the data transfer control unit (between the storage controller and the drive cartridge) 1413 transfers the write data stored in the buffer area to the storage controller (step 1629). Next, command completion from the storage controller 121 is awaited (step 1630). Next, the command completion response of the write command is received from the storage controller 121 via the cartridge command communication unit (initiator) 1411 and analyzed (step 1631). Next, a completion response for the command corresponding to the write command from the host 110 is generated (step 1632). Next, the completion response of the command is sent to the host 110 using the target driver 1409 (step 1633). Next, the secured buffer area is released (step 1634). Finally, the process is completed (step 1635).
Next, returning to the description of step 1617 and thereafter in the flowchart, the flow of processing in the case where the command type is a management command will be described. When the process branches from step 1602 to step 1617, the contents of the management command of the host 110 are copied to generate a management command as a cartridge command (step 1617). Next, the cartridge command is transmitted to the storage controller 121 using the cartridge command communication unit (initiator) 1411 (step 1618). Next, command completion from the storage controller 121 is awaited (step 1619). Next, the command completion response of the management command is received from the storage controller 121 via the cartridge command communication unit (initiator) 1411 and analyzed (step 1620). Next, a completion response for the command corresponding to the management command from the host 110 is generated (step 1621). Next, the completion response of the command is sent to the host 110 using the target driver 1409 (step 1622).
(17) Processing procedure of cartridge commands in the storage controller relating to target configuration mode 2
Fig. 17 is a flowchart showing the processing procedure of a cartridge command in the storage controller according to target configuration mode 2. When the cartridge command communication section (target) 1401 of the storage controller 121 receives a cartridge command from the drive cartridge 200, the cartridge command processing section 1402 starts the processing from step 1700 onward.
First, the cartridge command processing section 1402 analyzes the received cartridge command and reads the fields of the command type, the NID (namespace ID) that identifies the namespace, the start address, and the data transfer length (step 1701). Next, processing branches according to the type of the command (step 1702). If the command type is a read command, the process proceeds to step 1703. If the command type is a write command, the process proceeds to step 1717. If the command type is a management command, the process proceeds to step 1714. The following describes the flow in the case where the command type in step 1702 is a read command.
When the process branches to step 1703, a cache hit/miss determination is performed based on the identifier 403 obtained from the cartridge command communication unit (target) 1401 and the NID, start address, and data transfer length obtained in step 1701 (step 1703). Next, processing branches depending on whether there is a cache hit or a miss (step 1705). In the case of a cache hit, the process proceeds to step 1706; in the case of a miss, the process proceeds to step 1709. Here, the cache hit/miss determination checks whether the data responding to the IO command from the host exists in the cache memory 204 of the storage controller. For example, when the IO command from the host is a read command, it is determined whether the data responding to the read command is present in the cache memory 204.
In the case of a cache hit, the data in the cache is transferred to the drive cartridge 200 using the data transfer processing part (between the storage controller and the drive cartridge) 1403 (step 1706). Next, a completion response to the read command of the cartridge command is generated (step 1707). Next, the completion response of the command is transmitted to the drive cartridge 200 using the cartridge command communication unit (target) 1401 (step 1708). Finally, the processing is completed (step 1723).
The flow of processing in the case of a miss will be described below, returning to the description of step 1709 and thereafter in the flowchart. When the process branches from step 1705 to step 1709, the data transfer offload unit 1405 generates an offload command necessary for data transfer by referring to the address translation table or the like (step 1709). The control data required for the unload command and its method of generation are described in the description of fig. 9A-D.
Next, a completion response to the read command of the cartridge command is generated (step 1710). Next, the drive box 200 of the data storage destination is identified based on the identifier 403 obtained from the box command communication unit (target) 1401, the NID obtained in step 1701, the start address, and the information of the data transfer length, and an unload command and a completion response of the read command are transmitted to the drive box 200 using the unload command communication unit (initiator) 1406 (step 1711).
Next, completion of the unload command from the drive cartridge 200 is awaited (step 1712). Next, the unload command communication unit (initiator) 1406 receives the completion response of the unload command from the drive cartridge 200 and analyzes it (step 1713). Finally, the processing is completed (step 1723).
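For illustration only, the following Python sketch condenses steps 1703 to 1713: on a cache hit the storage controller transfers the data itself, and on a miss it builds an unload (offload) command from the address translation table and sends it to the drive cartridge that stores the data. The dictionary keys, the cache keyed by the exact request range, and the callbacks are assumptions for the sketch, not the actual program modules 1401 to 1407.

```python
def process_box_read(cmd, cache, address_table, send_data, send_unload):
    key = (cmd["nid"], cmd["start"], cmd["length"])
    data = cache.get(key)                         # steps 1703/1705: hit or miss
    if data is not None:
        send_data(cmd["box_id"], data)            # step 1706: controller sends the data
        return {"status": "done", "via": "cache"}
    # miss: generate an unload command from the address translation table (step 1709)
    drive_box, drive, phys = address_table[key]
    unload_cmd = {"drive": drive, "start": phys, "length": cmd["length"],
                  "host_memory": cmd["host_memory"], "r_key": cmd["r_key"]}
    send_unload(drive_box, unload_cmd)            # step 1711
    return {"status": "offloaded", "to": drive_box}
```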
Next, returning to the description of step 1702 and subsequent steps in the flowchart, the flow of processing in the case where the command type is a write command will be described. When the process branches from step 1702 to step 1717, the cartridge command processing section 1402 secures a buffer area for storing the write data (step 1717). Next, XFER RDY is transmitted to the drive cartridge 200 via the cartridge command communication unit (target) 1401 (step 1718). Next, the data transfer control part (between the storage controller and the drive cartridge) 1403 receives the transfer data from the drive cartridge (step 1720).
The write data is then transferred to the other storage controller for duplication (double writing) of the cache (step 1720). Next, a command completion response corresponding to the write command of the drive cartridge is generated (step 1721). Next, the completion response of the command is transmitted to the drive cartridge 200 using the cartridge command communication section (target) 1401 (step 1722). Finally, the processing is completed (step 1723).
Next, returning to the description of step 1714 and thereafter in the flowcharts, the flow of processing in the case where the command type is a management command will be described. When the process branches to step 1714, the box command processing section 1402 performs the process of the management command in accordance with the content specified by the management command (step 1714). Next, a command completion response including the result of the processing of the management command is generated (step 1715). Next, a command completion response is transmitted to the drivecage 200 using the cage command communication unit (target) 1401 (step 1716). Finally, processing is complete (step 1723).
(18) Destage processing procedure in a storage controller according to target configuration 2
Fig. 18 is a flowchart showing the procedure of destaging processing in the storage controller according to target configuration mode 2. Since there are many points in common with fig. 12, only the differences will be described for ease of understanding. The difference is that the data transfer of the write data, which in fig. 12 is performed by an RDMA read issued by the drive cartridge 200, is replaced by the storage controller 121 sending the transfer data.
The modified part is from step 1801 to step 1803. That is, after step 1203, the destaging processing unit 1407 waits for XFER RDY from the drive cartridge 200 (step 1801). Next, XFER RDY is received from the drive cartridge via the unload command communication unit (initiator) 1406 (step 1802). Next, the data transfer processing part (between the storage controller and the drive cartridge) 1403 sends the transfer data to the drive cartridge (step 1803). Step 1204 and the subsequent steps are the same as in fig. 12.
(21) Order of processing controller commands in a drive cartridge according to target configuration 2
Fig. 21 is a flowchart showing a processing procedure of a controller command in the drive cartridge according to the target configuration mode 2.
Since there are many points in common with fig. 20, only the differences will be described for ease of understanding. The difference lies in the data transfer to and from the storage controller. The RDMA-write-based data transfer of step 2005 is changed to a data transfer performed by the data transfer control unit (between the storage controller and the drive cartridge) 1413 (step 2101). Further, the RDMA-read-based data transfer of step 2010 is changed to sending XFER RDY from the drive cartridge 200 to the storage controller 121 (step 2102) and receiving the transfer data from the storage controller by the data transfer control unit (between the storage controller and the drive cartridge) 1413 (step 2103). The other steps are the same as those in fig. 20.
(22) Host information table for drive cartridge and format of cartridge command
FIG. 22 is a diagram showing a host information table of a drive cartridge. Fig. 23 is a diagram showing a format of a box command.
The host information table of FIG. 22 includes entries for queue number 2201, host NQN2202, and NVM subsystem NQN 2203. In embodiment 3, the drive cartridge 200 operates as a target of NVMe over Fabrics with respect to the host 110. Therefore, the drive cartridge 200 stores the information of the host in the host information table so that the information of the host can be referred to in the processing of the host command.
Queue number 2201 is the number of the IO queue between the host and the NVM subsystem. The drive cartridge 200 assigns the queue number when an IO queue is created upon receiving a connect command from the host 110, in order to manage the IO queue internally. The queue number is a value unique within the drive cartridge 200. Host NQN 2202 and NVM subsystem NQN 2203 are the NQN of the host 110 and the NQN of the NVM subsystem of the drive cartridge 200, respectively, that are connected by the IO queue described above. NVM subsystem NQN 2203 corresponds to the identifier 1503.
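For illustration only, a possible in-memory form of this host information table is sketched below in Python; the entry fields mirror the queue number 2201, host NQN 2202, and NVM subsystem NQN 2203, while the structure and helper functions themselves are assumptions for the sketch.

```python
from dataclasses import dataclass

@dataclass
class HostInfoEntry:
    queue_number: int     # IO queue number, unique within the drive cartridge (2201)
    host_nqn: str         # NQN of the connected host (2202)
    subsystem_nqn: str    # NQN of the NVM subsystem (2203, corresponds to identifier 1503)

host_info_table = {}      # queue_number -> HostInfoEntry

def register_io_queue(queue_number, host_nqn, subsystem_nqn):
    """Called when a connect command from the host creates an IO queue."""
    host_info_table[queue_number] = HostInfoEntry(queue_number, host_nqn, subsystem_nqn)

def lookup_by_queue(queue_number):
    """Used during host command processing to recover the host and subsystem NQNs."""
    return host_info_table[queue_number]
```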
The fields of the box command shown in fig. 23 include command identifier 2211, host NQN2212, drive box number 2213, drive box memory address 2214, NVM subsystem NQN2215, command category 2216, NID (namespace ID)2217, start address 2218, data transfer length 2219, memory address 2220, and R _ key 2221.
The command identifier 2211 is an identifier for identifying each command. Host NQN 2212 is the NQN of the host 110 that issued the command (corresponding to host NQN 2202). The drive cartridge number 2213 is a number identifying the transmission source of the cartridge command, that is, the number of the drive cartridge 200 itself. The drive cartridge number is a number programmed by the storage controller 121 for managing the drive cartridge 200; it is programmed, for example, when the storage device is started or when a drive cartridge is added.
Memory address 2214 is the address of the data buffer used by the drive cartridge 200 for data transfer with the storage controller 121. RDMA communication can be used for the data communication between the drive cartridge 200 and the storage controller 121, and ordinary FC (Fibre Channel) can also be used within the storage device. When RDMA communication is used, an R_key is required in addition to the memory address 2214; since the present embodiment is not limited to RDMA communication, its description is omitted here.
The NVM subsystem NQN2215 is the NVM subsystem's NQN (corresponding to identifier 1503) of the access object of the host command.
The command type 2216, NID 2217, start address 2218, data transfer length 2219, memory address 2220, and R_key 2221 correspond to the command type 912, NID 913, start address 914, data transfer length 915, memory address 916, and R_key 917 of the host command.
The values of the fields 2211 to 2221 of the above-described box command are set by the host command processing unit 1410 as follows.
The host command processing unit 1410 sets the command identifier 2211 to a value unique to the box command being executed.
The host command processing unit 1410 compares the IO queue of the target driver 1409 of the drive enclosure 200, which has received the command from the host 110, with the entry of the host information table 2200 of the drive enclosure, and sets the host NQN2202 and the NVM subsystem NQN2203 corresponding to the IO queue as the host NQN2212 and the NVM subsystem NQN2215 (corresponding to the identifier 1503).
The host command processing unit 1410 sets its own cartridge number to the cartridge number 2213, and sets the address of the data buffer used by the drive cartridge 200 for data transfer with the memory controller 121 to the cartridge memory address 2214.
The host command processing unit 1410 sets the values of the command type 912, NID 913, start address 914, data transfer length 915, memory address 916, and R_key 917 of the host command received from the host into the command type 2216, NID 2217, start address 2218, data transfer length 2219, memory address 2220, and R_key 2221, respectively.
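For illustration only, the following Python sketch shows how the fields 2211 to 2221 of the cartridge command could be populated from a received host command (format 910) and the corresponding host information table entry; the dictionary layout and helper function are assumptions for the sketch.

```python
def build_box_command(host_cmd, host_entry, own_box_number, buffer_address, command_id):
    """host_cmd carries the fields 912-917 of the host command; host_entry is the
    HostInfoEntry looked up from the IO queue that delivered the command."""
    return {
        "command_identifier": command_id,              # 2211, unique per cartridge command
        "host_nqn": host_entry.host_nqn,               # 2212 <- 2202
        "drive_box_number": own_box_number,            # 2213, programmed by the controller
        "drive_box_memory_address": buffer_address,    # 2214, buffer for controller transfers
        "subsystem_nqn": host_entry.subsystem_nqn,     # 2215 <- 2203 (identifier 1503)
        "command_type": host_cmd["command_type"],      # 2216 <- 912
        "nid": host_cmd["nid"],                        # 2217 <- 913
        "start_address": host_cmd["start_address"],    # 2218 <- 914
        "data_transfer_length": host_cmd["length"],    # 2219 <- 915
        "memory_address": host_cmd["memory_address"],  # 2220 <- 916
        "r_key": host_cmd["r_key"],                    # 2221 <- 917
    }
```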
In embodiment 3, unlike embodiments 1 and 2, the drive cartridge connected to the host via the network directly receives IO commands from the host. When the IO command is a read command, the drive cartridge transmits the read data directly to the host and also reports completion. That is, the drive cartridge provides the generated NVM subsystem to the host as a storage area.
According to embodiment 3, it is possible to reduce the processing load of the storage controller by the offload function of the drive cartridge while maintaining the data protection technique of the storage controller, and to directly transfer read data to the host for a read command.
[ example 4 ]
(19) Structure of information processing System of embodiment 4
Fig. 19 is a diagram showing the connection configuration of the information processing system according to target configuration mode 2 in the mode in which the storage controller is connected to the host via another network (connection mode 2: embodiment 4).
Since there are many points in common with fig. 1, only the differences will be described for ease of understanding. The difference from fig. 1 is that the drive enclosure 140 is connected to two different networks, the network 150 and the network 1901. The network 150 connects the host 110 and the drive enclosure 140, and the network 1901 connects the storage device 120 and the drive enclosure 140. Here, the drive enclosure 140 is connected to the network 150 and the network 1901 via the network I/F 205. Further, the storage device 120 is connected to the network 1901 via the network I/F 126.
The control method of the storage device 120 and the drive case 140 in embodiment 4 is the same as that in embodiment 3, and therefore, the description is omitted.
The network 1901 may be a PCIe network. In this case, the drive enclosure 140 does not have the network I/F205, but instead is connected to the network 1901 via the PCIe port 206. Further, storage device 120 does not have network I/F126, but instead is connected to network 1901 via PCIe port 126. The control method of the storage device 120 and the drive enclosure 140 is the same as that of embodiment 3 except that it becomes a data transfer method (for example, DMA) over a PCIe network, and therefore, the description is omitted.
According to embodiment 4, similarly to embodiment 3, it is possible to provide a network configuration better suited to a mode in which the drive enclosure 140 receives IO commands and the like from the host.
[ example 5 ]
The outline of example 5 will be explained. Embodiment 5 corresponds to an embodiment for speeding up the write IO of embodiment 3. The structure of the information processing system of embodiment 5 is shown in fig. 1. In embodiment 5, similarly to embodiment 3, the drive case operates as a target of NVMe over Fabrics with respect to the host instead of the storage controller (target configuration mode 2).
Write IO is sped up by transferring the write data from the host to the drive enclosure without passing through the storage controller, and having the drive enclosure write the write data to the drives. The write destination of the write data is determined by the storage controller, and the drive enclosure obtains the write destination of the write data by inquiring of the storage controller (cooperation method 1 of write IO processing). The identifiers of the host and the NVM subsystem in NVMe over Fabrics are as shown in fig. 15 and are the same as those in embodiment 3, and therefore the description is omitted.
(23) Program configurations of the host, the storage controller, and the drive cartridge in a system in which the drive enclosure, instead of the storage controller, operates as the target of NVMe over Fabrics with respect to the host (the same as target configuration mode 2: embodiment 3), and in which, to speed up write IO, the storage controller determines the write destination of the write data and the drive cartridge inquires of the storage controller about the write destination (cooperation method 1 of write IO processing)
Fig. 24 is a diagram showing the program configurations of the host, the storage controller, and the drive cartridge in a system in which the drive enclosure, instead of the storage controller, operates as the target of NVMe over Fabrics with respect to the host, and in which, to speed up write IO, the storage controller determines the write destination of the write data based on the address translation table and the drive cartridge inquires of the storage controller about the write destination (cooperation method 1 of write IO processing).
The program of the storage controller 121 includes a box command communication unit (target) 2301, a box command processing unit 2302, a data transfer offload unit 2303, an offload command communication unit (initiator) 2304, a duplication removal instruction unit 2305, a controller command transmission unit (initiator) 2306, a write destination address determination unit 2307, an address conversion unit 2308, a logical-physical address management unit 2309, a configuration information management unit 2310, a duplication information management unit 2311, and an OS (not shown).
The cartridge command communication section (target) 2301 provides the cartridge command communication section (initiator) 2314 with a storage area supporting NVMeoF.
The cartridge command processing section 2302 receives a command issued by the drive cartridge 200 using the cartridge command communication section (target) 2301, and performs command analysis, read/write processing, command completion response generation, command completion response transmission via the cartridge command communication section (target) 2301, and the like.
The data transfer offload unit 2303 generates an offload command for data transfer, and instructs the drive cartridge 200 to transfer data between the host and the drive cartridge.
The unload command communication unit (initiator) 2304 transmits unload commands to the drive cartridge and receives their responses. The duplication removal instruction unit 2305 uses a storage command to instruct the drive cartridge 200 to release duplexed areas. The controller command transmitting unit (initiator) 2306 transmits storage commands to the drive cartridge and receives their completion responses. The write destination address determining unit 2307 determines the write destination address of write data to be written to a drive in the drive cartridge. The address conversion unit 2308 has an address translation table (mapping table) that associates the data range 505 with the drive cartridge 200, the drive 508, and the storage area 509 in the drive 508 that store the data, and converts an address in the data range 505 into the address of the corresponding drive cartridge 200, drive 508, and storage area 509 in the drive 508.
The logical-physical address management unit 2309 controls transition of each of the access exclusive state and the exclusive release state of the storage area 509 corresponding to the data range 505, and the double write state and the double write release state of the storage area 509.
The configuration information management unit 2310 has functions for initializing, updating, and storing the configuration information of the storage system. The configuration information includes the hardware configuration, configuration settings, and node information of the drive enclosures, and the hardware configuration and configuration settings of the storage controller. The duplication information management unit 2311 has functions for initializing, updating, and storing the arrangement of the parity generated area 2801, the duplexed area 2802, and the primary area 2803 and secondary area 2804 within the duplexed area 2802. Each area will be described with reference to figs. 28 and 29.
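For illustration only, the following Python sketch shows the kind of mapping the address conversion unit 2308 maintains: a logical range of the data range 505 resolved to a drive cartridge, a drive, and a physical offset. The fixed translation granularity and the class itself are assumptions made only to keep the sketch short.

```python
CHUNK = 1 << 20   # translation granularity in bytes (assumption for the sketch)

class AddressTranslationTable:
    def __init__(self):
        # (volume_id, chunk_index) -> (box_id, drive_id, physical_offset)
        self._map = {}

    def update(self, volume_id, logical_offset, box_id, drive_id, physical_offset):
        # record where a chunk of the data range is stored
        self._map[(volume_id, logical_offset // CHUNK)] = (box_id, drive_id, physical_offset)

    def resolve(self, volume_id, logical_offset):
        # translate a logical address into (drive box, drive, physical address)
        box_id, drive_id, phys = self._map[(volume_id, logical_offset // CHUNK)]
        return box_id, drive_id, phys + logical_offset % CHUNK
```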
The program of the drive cartridge 200 includes a target driver 2312, a host command processing section 2313, a cartridge command communication section (initiator) 2314, a data transfer control section (between host and cartridge) 2316, an unload command communication section (target) 2317, an unload command processing section 2318, a controller command communication section (target) 2319, a drive control section 2320, a buffer control section 2315, a controller command processing section 2321, a drive duplication writing section 2322, a duplication cancellation processing section 2323, and an unillustrated OS.
The target driver 2312 provides the initiator driver 301 of the host 110 with a storage area supporting NVMeoF. The host command processing unit 2313 receives a command issued by the host using the target driver 2312, and performs command analysis, read/write processing, command completion response generation, command completion response transmission via the target driver 2312, and the like. The cartridge command communication section (initiator) 2314 issues a cartridge command to the cartridge command communication section (target) of the storage controller 121. The data transfer control unit (between the host and the cartridge) 2316 performs data transfer processing between the host supporting NVMeoF and the drive cartridge in accordance with the instructions from the host command processing unit 2313 and the unmount command processing unit 2318.
The offload command communication section (target) 2317 receives offload commands for data transfer from the storage controller 121. The unload command processing unit 2318 analyzes the unload command, performs read processing and write processing, and generates and transmits an unload command completion response. The controller command communication section (target) 2319 receives storage commands from the storage controller 121 and transmits completion responses to it. The controller command processing unit 2321 receives commands issued by the storage controller via the controller command communication section (target) 2319, and performs command analysis, execution of the duplication cancellation processing, generation of command completion responses, transmission of the command completion responses via the controller command communication section (target) 2319, and the like.
The drive control unit 2320 manages the drive 218, and performs read/write processing on the drive 218 in accordance with instructions from the host command processing unit 2313, the unload command processing unit 2318, the drive duplication write unit 2322, and the duplication removal processing unit 2323. The buffer control unit 2315 secures and releases a buffer, which is a temporary memory area for data transfer.
The drive dual writing unit 2322 performs a process of writing write data to 2 drives. By writing 2 drives, loss of user data due to drive failure is prevented. The duplication cancellation processing unit 2323 performs a process of switching from data protection by duplicate writing to data protection by RAID.
(25) Target configuration system 2 and write IO processing cooperation system 1
Fig. 25 is a flowchart showing the processing procedure of the host command in the drive cartridge according to the target configuration system 2 and the write IO process cooperation system 1. Since a part of the processing is the same as that of fig. 16, the step number of fig. 16 is described for the step of the same processing.
When the target driver 2312 of the drive cartridge 200 receives a command from the host 110, the host command processing section 2313 starts the processing from step 2500 onward.
First, the host command processing unit 2313 analyzes the received NVMe command (the format of the command refers to the format 910 of the host command in fig. 9A), and reads fields of a command type 912, an NID (namespace ID)913 as an identifier of a namespace, a start address 914, and a data transfer length 915 (step 2501).
Next, processing branches with the type of command (step 2502). In the case where the command type is a read command, the process proceeds to step 1603. In the case where the command type is a write command, the process proceeds to step 2503. In the case where the command type is a management command, the process proceeds to step 1617. The following describes a flow when the command type is a read command in step 2502.
When branching to step 1603, the same processing as in the case where the command category is a read command and the response category is unload in fig. 16 is performed. Since the processing is the same, the following description will be omitted.
Next, returning to the description of step 2502 and subsequent steps in the flowchart, the flow of processing in the case where the command type is a write command will be described. When the process branches from step 2502 to step 2503, the host command processing section 2313 secures a buffer area for storing the write data using the buffer control section 2315 (step 2503). Next, the host command processing section 2313 generates a cartridge command that notifies the storage controller 121 of the received write command and inquires about the write destination address corresponding to the address range specified by the write command (step 2504).
Next, the host command processing section 2313 transmits a cartridge command to the storage controller 121 via the cartridge command communication section (initiator) 2314 (step 2505).
Next, the host command processing section 2313 waits for the response of the write destination address from the storage controller 121 (step 2506). Here, the write destination address is an address obtained by the storage controller 121 by referring to the address translation table. Next, the host command processing section 2313 receives the notification of the write destination address from the storage controller 121 via the cartridge command communication section (initiator) 2314, analyzes the notification, and acquires the write destination address (step 2507).
Next, the data transfer control unit (between the host and the drive cartridge) 2316 transfers the data of the physical memory area 503 in the host 110 designated by the write command to the secured buffer area (step 2508).
Next, the drive double write unit 2322 double-writes the write data of the buffer area to the write destination address received in step 2507 (step 2509). Double writing means writing to two drives, and is explained in detail with reference to figs. 28 and 29.
Next, the drive double write unit 2322 waits for the completion of the double write, that is, the completion of the write from the drive corresponding to the double write destination (step 2510). Next, the drive double write unit 2322 receives a double write completion response (step 2511).
Next, the host command processing section 2313 notifies the storage controller 121 of the write completion via the cartridge command communication section (initiator) 2314 (step 2512). Next, the host command processing section 2313 waits for a completion response of the controller command (corresponding to the write command) from the memory controller 121 (step 2513).
Next, the host command processing section 2313 receives a command completion response of the write command from the storage controller 121 via the cartridge command communication section (initiator) 2314, and analyzes the command completion response of the write command (step 2514).
Next, a completion response for the command corresponding to the write command from the host 110 is generated (step 2515). Next, the completion response of the command is sent to the host 110 using the target driver 2312 (step 2516). Next, the secured buffer area is released (step 2517). Finally, the process is completed (step 2518).
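For illustration only, the following Python sketch condenses steps 2503 to 2517: the drive cartridge asks the storage controller for the write destination, fetches the write data from the host, double-writes it itself, and only completes the command toward the host after the storage controller has acknowledged. The callbacks are placeholders for the communication units and are not the actual implementation.

```python
def handle_host_write(write_cmd, ask_controller, fetch_from_host,
                      double_write, notify_controller, reply_to_host):
    destination = ask_controller(write_cmd)                  # steps 2504-2507
    data = fetch_from_host(write_cmd["memory_address"],      # step 2508
                           write_cmd["length"])
    double_write(destination["primary"],                     # steps 2509-2511
                 destination["secondary"], data)
    completion = notify_controller("write done", write_cmd)  # steps 2512-2514
    reply_to_host(completion)                                # steps 2515-2516
```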
Next, returning to the description of step 2502 and subsequent steps in the flowchart, the flow of processing in the case where the command type is a management command will be described. When the process branches from step 2502 to step 1617, the same is true as in the case where the command type is a management command in fig. 16. Since the processing is the same, the following description will be omitted.
(27) Target configuration system 2, and processing order of box commands in storage controller according to write IO processing cooperation system 1
Fig. 27 is a flowchart showing a processing procedure of a box command in the memory controller according to the target configuration system 2 and the write IO process cooperation system 1. Since some of the processes are the same as those in fig. 17, step numbers in fig. 17 are described for steps of the same processes.
When the cartridge command communication section (target) 2301 of the storage controller 121 receives a cartridge command from the drive cartridge 200, the cartridge command processing section 2302 starts the processing after step 2700.
First, the cartridge command processing section 2302 analyzes the received cartridge command and reads the fields of the command type, the NID (namespace ID) that identifies the namespace, the start address, and the data transfer length (step 2701). Processing then branches according to the type of the command (step 2702). If the command type is a read command, the process proceeds to step 1709. If the command type is a write command, the process proceeds to step 2703. If the command type is a management command, the process proceeds to step 1714.
When branching to step 1709, the same processing as in the case of unloading in fig. 17 is performed. Since the processing is the same, the following description will be omitted.
The flow in the case where the command type is a write command in step 2702 will be described below. When branching to step 2703, the cartridge command processing section 2302 performs access exclusion on the write range of the logical volume based on the identifier 403 obtained from the cartridge command communication section (target) 2301 and the NID, start address, and data transfer length obtained in step 2701 (step 2703). The access exclusion is performed in order to ensure data consistency even when a plurality of write commands accessing the same logical address are received.
Next, the address of the write destination of the write data, that is, the drives and physical addresses of the double write destination, is determined (step 2704). In this processing flow, because the write data is written by the drive cartridge without passing through the storage controller, the address translation table managed by the storage controller needs to be updated after the write.
Next, the unload command communication unit (initiator) 2304 transmits the write destination address to the drive cartridge 200 (step 2705). Next, completion of the write from the drive cartridge 200 is awaited (step 2706).
Next, the unload command communication unit (initiator) 2304 receives the write completion from the drive cartridge 200 (step 2707). Next, the address conversion unit 2308 updates the correspondence in the address translation table (step 2708). That is, the identifiers and physical addresses of the drives of the double write destination are mapped to the logical addresses of the write range of the logical volume specified by the write command.
Next, the write destination address determining unit 2307 updates the additional write pointer (step 2709). The additional write pointer indicates how far the append (additional) write processing has progressed; it is, for example, a physical address of a drive or an index corresponding to such a physical address. Next, a completion response for the cartridge command is generated (step 2710).
Then, the completion response of the command is transmitted to the drive cartridge 200 via the cartridge command communication section (target) 2301 (step 2711). This notifies the drive cartridge that the update of the address translation table has been completed. Subsequently, the access exclusion is released (step 2712). The drive cartridge then notifies the host of the completion of the write command, and the process is completed (step 2713).
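For illustration only, the following Python sketch summarizes steps 2703 to 2712 on the storage controller side, assuming a single coarse lock in place of per-range access exclusion and an append-style allocator; the drive placement rule is arbitrary and serves only the example.

```python
import threading

class ControllerWriteLogic:
    def __init__(self, table, num_drives=4):
        self.table = table               # e.g. the AddressTranslationTable sketched earlier
        self.num_drives = num_drives
        self.append_pointer = 0          # additional (append) write pointer
        self.lock = threading.Lock()     # stands in for access exclusion

    def handle_box_write(self, cmd, send_destination, wait_write_done):
        with self.lock:                                            # steps 2703 / 2712
            drive = self.append_pointer % self.num_drives          # placement is
            primary = (drive, self.append_pointer)                 # illustrative only
            secondary = ((drive + 1) % self.num_drives, self.append_pointer)
            send_destination(cmd["box_id"], primary, secondary)    # steps 2704-2705
            wait_write_done()                                      # steps 2706-2707
            self.table.update(cmd["nid"], cmd["start_address"],    # step 2708
                              cmd["box_id"], primary[0], primary[1])
            self.append_pointer += cmd["length"]                   # step 2709
        return "cartridge command completion sent"                 # steps 2710-2711
```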
Next, returning to the description at step 2702 and thereafter of the flowchart, the flow of processing in the case where the command type is a management command will be described. When the process branches from step 2702 to step 1714, the same is true as in the case where the command type is a management command in fig. 17. Since the processing is the same, the following description will be omitted.
(26) Target configuration system 2, and write IO process cooperation system 1
Fig. 26 is a flowchart showing the processing procedure of a controller command in the drive cartridge according to the target configuration system 2 and the cooperation system 1 of write IO processing.
When the duplication removal instruction unit 2305 of the storage controller 121 determines that the destaging condition is satisfied (for example, the amount of completed writes in the duplexed area is equal to or greater than a threshold), the duplication removal instruction unit 2305 starts the processing from step 2600 onward.
First, the duplication removal instruction unit 2305 determines the duplication release targets (step 2601). As a method of determining the targets, for example, RAID stripes to be released are selected preferentially from the data with the earliest write times until the amount of completed writes in the duplexed area becomes equal to or less than a threshold. The subsequent processing is then repeated until the duplication of all the selected targets has been released (step 2602).
Next, one RAID stripe of the primary area 2803 is selected from the duplication release targets (step 2603). Next, the controller command communication unit (initiator) 2306 instructs the drive cartridge 200 to release the duplication of the selected RAID stripe, that is, to generate and write its parity (step 2604).
Next, a response from the drive cartridge 200 is awaited (step 2605). Next, the controller command processing unit 2321 of the drive cartridge 200 receives the duplication release instruction from the storage controller 121 via the controller command communication unit (target) 2319 and analyzes the instruction (step 2606). Next, the duplication cancellation processing unit 2323 reads the data of the RAID stripe specified by the duplication release instruction (step 2607).
Next, parity of the read data is generated (step 2608). Next, the generated parity is written (step 2609). Next, the controller command processing unit 2321 generates a completion response of the duplicate removal instruction, and transmits the completion response to the storage controller 121 via the controller command communication unit (target) 2319 (step 2610).
Next, the duplication removal instruction unit 2305 receives the completion response from the drive cartridge 200 via the controller command communication section (initiator) 2306 (step 2611). Next, the duplication information management unit 2311 releases the secondary area of the selected RAID stripe and updates the duplication information, and the logical-physical address management unit 2309 updates the state of the corresponding storage area 509 to the double-write-released state (step 2612). If the repetition of step 2602 continues, the process returns to step 2603; when the repetition of step 2602 ends, the process is finally completed (step 2613).
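For illustration only, the parity generation at the heart of steps 2607 to 2609 can be sketched as a byte-wise XOR over the data stripes of the selected RAID stripe (RAID5 is assumed here, as in the example of fig. 28); the function below is an illustrative sketch, not the actual duplication cancellation processing unit 2323.

```python
def generate_parity(stripe_data):
    """stripe_data: list of equally sized bytes objects (e.g. stripes "m", "n", "o")."""
    parity = bytearray(len(stripe_data[0]))
    for chunk in stripe_data:                 # step 2608: XOR all data stripes together
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)                      # step 2609 writes this to the parity stripe

# after the parity is written, the secondary copies of the stripe can be released
# and the stripe joins the parity generated area 2801 (step 2612)
```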
(28) Dualized area and parity-generated area in a drive cartridge
Fig. 28 is a diagram showing a duplexed area and a parity generated area in the drive cartridge.
The drives 2805 are drives 218 belonging to a RAID group within the drive enclosure. The figure shows, as an example, a RAID group with RAID level RAID5 composed of 4 drives. The number of drives is not limited to 4, and the RAID level is not limited to RAID5. For example, the drives of a RAID group can be configured as N+1 of RAID5 (N data drives and 1 parity drive), N+2 of RAID6 (N data drives and 2 parity drives), and the like.
The parity generated area 2801 is made up of parity cycles for which parity has already been generated. In the figure, "m", "n", "o", and "P" of the stripes 2806 form one parity cycle. A stripe 2806 is a RAID stripe. The stripe "P" is the parity that makes the data of the stripes "m", "n", and "o" redundant.
The duplexed area 2802 is a destination to which data is written, and is composed of a primary area 2803 and a secondary area 2804.
The primary area 2803 within the duplexed area includes stripes 2806 ("a" to "f") in which write data is stored, stripes 2806 (shown blank) to which write data has not been written and for which no parity has been generated, and stripes 2806 (shown in gray, without characters) for which no parity has been generated.
The secondary area 2804 within the duplexed area is a copy of the primary area 2803 and has the same structure. In the figure, the stripes 2806 ("a" to "f") and the stripes 2806 ("a'" to "f'") are in a copy relationship. Stripes in a copy relationship are arranged in areas of different drives as a countermeasure against loss of user data due to a drive failure. For example, "a" of the stripes 2806 is placed on drive 0, and "a'" of the stripes 2806 is placed on drive 1, shifted by one drive. Thus, even if one drive fails, at least one of the duplicated pieces of user data remains, so that loss of the user data can be prevented.
The primary area 2803 differs from the secondary area 2804 in that a parity cycle of the primary area 2803 becomes part of the parity generated area 2801 after parity generation, whereas a parity cycle of the secondary area 2804 is reused after parity generation as part of the duplexed area 2802, that is, as a storage destination for write data.
The parity generated area 2801 and the duplexed area 2802 are logical management areas. Therefore, even when the area to which a stripe 2806 belongs changes, only the management information (metadata) of the stripe 2806 is changed, and no data movement that would impose an IO load on the drives is performed.
In addition, a free space exists in the storage area of the drive belonging to the RAID group, and the illustration is omitted.
(29) Correspondence relationship of the duplexed areas in the drive cartridge
Fig. 29 is a diagram showing the correspondence relationship of the duplexed areas in the drive cartridge.
Write data is written into the duplexed area by append-type writes (also referred to as log-structured writes). An append-type write is a scheme that writes received user data sequentially and has excellent write performance. The write destinations are the stripes 2806 of the primary area 2803 and the secondary area 2804 that are unwritten and do not yet store write data. In the figure, the stripe of the primary area 2803 is denoted by stripe 2806 "g", and the stripe of the secondary area by stripe 2806 "g'". Descriptions other than stripe 2806 "g" and stripe 2806 "g'" are omitted from the figure. As described above, stripe 2806 "g" and stripe 2806 "g'" are placed on different drives because they are in a copy relationship. The drive double write unit 2322 writes the user data 2901 in order in accordance with write requests from the host ("g1", "g2", and "g3" in the figure). When the stripes 2806 "g" and "g'" become full of user data and no more user data can be written, writing moves on to the next stripe 2806 and continues.
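For illustration only, the placement rule of figs. 28 and 29 can be sketched as follows: user data is appended in order and written twice, with the secondary copy shifted by one drive so that a single drive failure cannot lose both copies. The 4-drive layout, stripe size, and the offsets (within one stripe only) are simplifications assumed purely for the example.

```python
NUM_DRIVES = 4
STRIPE_BLOCKS = 4     # blocks per stripe, illustrative only

def placements(block_index):
    """Return (drive, offset within the stripe) for the primary and secondary copy."""
    stripe, offset = divmod(block_index, STRIPE_BLOCKS)
    primary_drive = stripe % NUM_DRIVES
    secondary_drive = (primary_drive + 1) % NUM_DRIVES   # shifted by one drive
    return (primary_drive, offset), (secondary_drive, offset)

# successive writes g1, g2, g3 land in the same stripe on two different drives
for block in range(3):
    print(block, placements(block))
```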
According to embodiment 5, the processing load of the storage controller can be reduced by the offload function of the drive cartridge while the data protection techniques of the storage controller are maintained; for write commands, the double write processing is performed within the drive enclosure, which further reduces the processing load of the storage controller.
[ example 6 ]
Embodiment 6 is an embodiment of a manner (connection manner 2) in which the storage controller in embodiment 5 is connected to the host computer in another network. The connection mode 2 is as described in example 4.
The control method of the storage device 120 and the driver case 140 in embodiment 6 is the same as that in embodiment 5, and therefore, the description is omitted.
[ example 7 ]
Embodiment 7 is the same as embodiment 5 in that the drive enclosure, instead of the storage controller, operates as the target of NVMe over Fabrics with respect to the host, and write data is transferred directly from the host to the drive cartridge. On the other hand, embodiment 7 differs from embodiment 5 in where the write destination of the write data is determined. Specifically, in embodiment 5 the storage controller 121 determines the write destination of the write data and updates the mapping of logical to physical addresses, whereas in embodiment 7 the drive cartridge 200 determines the write destination of the write data and the storage controller 121 updates the mapping of logical to physical addresses based on the mapping notified by the drive cartridge 200. Compared with embodiment 5, embodiment 7 has the advantage that the drive cartridge does not need to inquire of the storage controller 121 about the write destination of the write data, so the response time of write command processing can be shortened. On the other hand, in embodiment 7, in order to realize highly reliable storage processing using the drive cartridge 200, which has low reliability and may lose control information due to a failure such as a power failure, a mechanism for notifying the storage controller of the additional write pointer is required. The identifiers of the host and the NVM subsystem in NVMe over Fabrics are as shown in fig. 15 and are the same as in embodiment 5, and therefore the description is omitted.
(30) Program configurations of the host, the storage controller, and the drive cartridge in a system in which the drive enclosure, instead of the storage controller, operates as the target of NVMe over Fabrics with respect to the host (the same as target configuration mode 2: embodiment 3), and in which the drive cartridge determines the write destination of the write data to speed up write IO (cooperation method 2 of write IO processing)
Fig. 30 is a diagram showing the program configurations of the host, the storage controller, and the drive cartridge in a system in which the drive enclosure, instead of the storage controller, operates as the target of NVMe over Fabrics, and in which the drive cartridge determines the write destination of the write data to speed up write IO (cooperation method 2 of write IO processing).
The program of the storage controller 121 has the same components as those of fig. 24, and therefore the differences will mainly be described. Components 3001 to 3006 correspond to 2301 to 2306, and 3008 to 3011 correspond to 2308 to 2311. The difference between fig. 30 and fig. 24 in the program of the storage controller 121 is that the write destination address determination unit of fig. 24 is removed and an additional write pointer management unit 3007 is added. The additional write pointer is a pointer indicating how far the append write processing has progressed, and is control information necessary for guaranteeing the integrity of user data, restoring data, and resuming the storage processing when a failure occurs in the drive cartridge. The additional write pointer management unit 3007 has a function of storing a copy of the additional write pointer in the highly reliable storage controller rather than in the drive cartridge 200, which has low reliability.
The program of the drive cartridge 200 has the same components as those of fig. 24, and therefore the differences will mainly be described. Components 3012 to 3023 correspond to 2312 to 2323. The difference between fig. 30 and fig. 24 in the program of the drive cartridge 200 is that an additional write pointer update unit 3024, a logical-physical correspondence parameter generation unit 3025, and a copy 3026 of the duplication information are added. The additional write pointer update unit 3024 has a function of updating the additional write pointer. The logical-physical correspondence parameter generation unit 3025 has a function of generating logical-physical correspondence parameters (information corresponding to the address translation table) for notifying the storage controller 121 of the correspondence between the logical addresses of the write range of the logical volume specified by the write command and the identifiers and physical addresses of the drives of the double write destination. The copy 3026 of the duplication information is a copy of the duplication information managed by the duplication information management unit 3011 of the storage controller 121. Holding a copy of the duplication information in the drive cartridge 200 reduces the frequency with which the storage controller 121 must be queried for the duplication information during write command processing, and thus improves processing efficiency.
(31) Target configuration system 2, and processing order of host commands in drive cartridge relating to write IO processing cooperation system 2
Fig. 31 is a flowchart showing the processing procedure of the host command in the drive cartridge according to the target configuration system 2 and the cooperation system 2 of the write IO process. Since some of the processes are the same as those in fig. 16, step numbers in fig. 16 are written for steps of the same processes.
When the target driver 3012 of the drive cartridge 200 receives a command from the host 110, the host command processing section 3013 starts the processing from step 3100 onward.
First, the host command processing unit 3013 analyzes the received NVMe command (the format of the command refers to the format 910 of the host command in fig. 9A), and reads the fields of the command type 912, the NID (namespace ID)913 that is an identifier of the namespace, the start address 914, and the data transfer length 915 (step 3101).
Next, processing branches with the type of command (step 3102). In the case where the command type is a read command, the process proceeds to step 1603. If the command type is a write command, the process proceeds to step 3103. In the case where the command type is a management command, the process proceeds to step 1617. The following describes a flow in the case where the command type is a read command in step 3102.
When branching to step 1603, the same processing as in the case where the command category is a read command and the response category is unload in fig. 16 is performed. Since the processing is the same, the following description will be omitted.
Next, returning to the description of step 3102 and subsequent steps in the flowchart, the flow of processing in the case where the command type is a write command will be described. When the process branches from step 3102 to step 3103, the host command processing section 3013 secures a buffer area for storing the write data using the buffer control section 3015 (step 3103).
Next, the data transfer control unit (between the host and the drive cartridge) 3016 transfers the data in the physical memory region 503 in the host 110 designated by the write command to the secured buffer region (step 3104).
Next, the host command processing unit 3013 acquires the append (additional write) destination address (step 3105). The append destination address is the address indicated by the additional write pointer. Next, the drive double write unit 3022 double-writes the write data in the buffer area to the write destination address determined in step 3105 (step 3106).
Next, the drive double write unit 3022 waits for the completion of the double write, that is, the completion of the writes from the drives corresponding to the double write destinations (step 3107). Next, the drive double write unit 3022 receives the completion response of the double write (step 3108).
Next, the additional write pointer updating unit 3024 updates the additional write pointer to the start address of the next write destination (step 3109). The additional write pointer is determined according to the double write method shown in fig. 28 and 29. Next, the logical-physical correspondence parameter generation unit 3025 generates logical-physical correspondence parameters, and the host command processing unit 3013 generates a box command including information of the received write command and the logical-physical correspondence parameters (step 3110).
Next, the host command processing section 3013 transmits a cartridge command to the storage controller 121 via the cartridge command communication section (initiator) 3014 (step 3111). Next, the host command processing unit 3013 waits for a completion response of the controller command (corresponding to the write command) from the memory controller 121 (step 3112). Next, the host command processing unit 3013 receives a completion response from the storage controller 121 via the cartridge command communication unit (initiator) 3014, and analyzes the completion response (step 3113).
Next, a completion response to the command corresponding to the write command from the host 110 is generated (step 3114). Next, the target driver 3012 is used to send a completion response for the command to the host 110 (step 3115). Next, the secured buffer area is opened (step 3116). The process is finally completed (step 3117).
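For illustration only, the following Python sketch condenses steps 3103 to 3116: the drive cartridge picks the append destination from its local additional write pointer, double-writes the data, and then reports the resulting logical-physical correspondence (together with the new pointer) to the storage controller. All helpers, field names, and the placement rule are placeholders assumed for the sketch.

```python
class AppendWriter:
    def __init__(self, num_drives=4):
        self.pointer = 0                  # additional (append) write pointer
        self.num_drives = num_drives

    def handle_host_write(self, cmd, fetch_from_host, double_write,
                          notify_controller, reply_to_host):
        data = fetch_from_host(cmd["memory_address"], cmd["length"])   # step 3104
        drive = self.pointer % self.num_drives                         # step 3105: append
        destination = {"primary": (drive, self.pointer),               # destination from the
                       "secondary": ((drive + 1) % self.num_drives,    # pointer (placement is
                                     self.pointer)}                    # illustrative only)
        double_write(destination, data)                                # steps 3106-3108
        self.pointer += cmd["length"]                                  # step 3109
        params = {"logical": (cmd["nid"], cmd["start_address"]),       # step 3110
                  "physical": destination,
                  "append_pointer": self.pointer}
        completion = notify_controller(cmd, params)                    # steps 3111-3113
        reply_to_host(completion)                                      # steps 3114-3115
```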
Next, returning to the description of step 3102 and subsequent steps in the flowchart, the flow of processing in the case where the command type is a management command will be described. When the process branches from step 3102 to step 1617, the processing is the same as in the case where the command type is a management command in fig. 16, and therefore the description is omitted.
(32) Target configuration system 2, and processing order of box commands in storage controller relating to write IO processing cooperation system 2
Fig. 32 is a flowchart showing a processing procedure of a box command in the memory controller according to the target configuration system 2 and the cooperation system 2 of the write IO process. Since some of the processes are the same as those in fig. 17, step numbers in fig. 17 are assigned to steps of the same processes.
When the cartridge command communication section (target) 3001 of the storage controller 121 receives a cartridge command from the drive cartridge 200, the cartridge command processing section 3002 starts the processing from step 3200 onward.
First, the box command processing section 3002 analyzes the received box command, and reads in the command type, NID (namespace ID) which is an identifier of the namespace, the start address, and the field of the data transfer length (step 3201).
Processing then branches according to the type of the command (step 3202). If the command type is a read command, the process proceeds to step 1709. If the command type is a write command, the process proceeds to step 3203. If the command type is a management command, the process proceeds to step 1714.
When branching to step 1709, the same processing as in the case of unloading in fig. 17 is performed. Since the processing is the same, the following description will be omitted.
The flow in the case where the command type is a write command in step 3202 will be described below. When branching to step 3203, the cartridge command processing unit 3002 performs access exclusion on the write range of the logical volume based on the identifier 403 obtained from the cartridge command communication unit (target) 3001 and the NID, start address, and data transfer length obtained in step 3201 (step 3203). The access exclusion is performed in order to ensure data consistency even when a plurality of read and write commands accessing the same logical address are received.
Next, the logical-physical correspondence parameters specified by the command, that is, the parameters of the drives and physical addresses of the double write destination, are analyzed (step 3204). Next, based on the result of analyzing the parameters, the address conversion unit 3008 updates the correspondence (the mapping of logical to physical addresses) in the address translation table (step 3205). Next, the additional write pointer management unit 3007 updates the additional write pointer corresponding to the drive cartridge that is the command source, in accordance with the content of the additional write pointer specified by the command (step 3206). Next, the cartridge command processing section 3002 generates a completion response for the cartridge command (step 3207).
Next, the cartridge command processing section 3002 transmits the completion response to the drive cartridge 200 via the cartridge command communication section (target) 3001 (step 3208). Next, the cartridge command processing section 3002 releases the access exclusion (step 3209). The process is finally completed (step 3210).
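For illustration only, the corresponding storage controller side of steps 3203 to 3209 is sketched below: the controller records what the drive cartridge has already written by updating the logical-physical mapping and its copy of the additional write pointer under access exclusion. The structures and field names are assumptions for the sketch.

```python
import threading

class MappingUpdater:
    def __init__(self, table):
        self.table = table              # e.g. the AddressTranslationTable sketched earlier
        self.append_pointers = {}       # box_id -> copy of the additional write pointer
        self.lock = threading.Lock()    # stands in for access exclusion

    def handle_box_write(self, box_cmd):
        with self.lock:                                            # steps 3203 / 3209
            nid, start = box_cmd["params"]["logical"]              # step 3204
            drive, phys = box_cmd["params"]["physical"]["primary"]
            self.table.update(nid, start, box_cmd["box_id"],       # step 3205
                              drive, phys)
            self.append_pointers[box_cmd["box_id"]] = \
                box_cmd["params"]["append_pointer"]                # step 3206
        return {"status": "ok"}                                    # steps 3207-3208
```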
Next, returning to the description of step 3202 and the subsequent steps, the flow of processing in the case where the command type is a management command is as follows. When the process branches from step 3202 to step 1714, the processing is the same as in the case where the command type is a management command in fig. 17, and its description is therefore omitted.
[ Embodiment 8 ]
Embodiment 8 is an embodiment in which the storage controller of embodiment 7 is connected to the host computer via another network (connection mode 2). Connection mode 2 is as described in embodiment 4.
The control method of the storage device 120 and the drive enclosure 140 in embodiment 8 is the same as in embodiment 7, and its description is therefore omitted.
[ Embodiment 9 ]
Embodiment 9 is the same as embodiment 5 in that the storage controller determines the write destination of the write data. On the other hand, embodiment 9 differs from embodiment 5 in that the storage controller operates as the target of NVMe over Fabrics with respect to the host. The identifiers of the host and the NVM subsystem in NVMe over Fabrics are as shown in fig. 4 and are the same as in embodiments 1 and 2, so their description is omitted.
(33) Processing in a system in which the storage controller operates as the target of NVMe over Fabrics with respect to the host (target configuration system 1, as in embodiments 1 and 2), in which the storage controller determines the write destination of the write data to speed up write IO and the write destination of the write data is obtained from the storage controller (write IO processing cooperation system 1)
Fig. 33 is a diagram showing the program configuration of the host, the storage controller, and the drive enclosure in a system in which the storage controller, rather than the drive enclosure, operates as the target of NVMe over Fabrics, and in which the storage controller determines the write destination of the write data to speed up write IO and the write destination of the write data is obtained from the storage controller (write IO processing cooperation system 1).
The program of the storage controller 121 has the same components as those of fig. 24, and therefore only the differences are described. 3303 to 3311 and 2303 to 2311 are the same structural elements. In the program of the storage controller 121, fig. 33 differs from fig. 24 in that the enclosure command communication section (target) 2301 and the enclosure command processing section 2302 of fig. 24 are removed, and a target driver 3301 and a host command processing unit 3302 are added.
The program of the drive enclosure 200 has the same components as those of fig. 24, and therefore only the differences are described. 3312 to 3323 and 2312 to 2323 are the same structural elements. In the program of the drive enclosure 200, fig. 33 differs from fig. 24 in that the enclosure command communication section (initiator) 2314 is removed.
(34) Processing procedure of a host command in the storage controller according to target configuration system 1 and write IO processing cooperation system 1
Fig. 34 is a flowchart showing the processing procedure of a host command in the storage controller according to target configuration system 1 and write IO processing cooperation system 1. It corresponds to the flowchart of the processing procedure of host commands in the storage controller of embodiment 1. Since some of the steps are the same as those in fig. 6, such steps are given the step numbers used in fig. 6.
When the target driver 3301 of the storage controller 121 receives a command from the host 110, the host command processing unit 3302 starts the processing from step 3400.
First, the host command processing unit 3302 obtains the identifier 923 (403 in fig. 4), which is the NVM subsystem NQN (see fig. 9B), using the information in the host information table 920 of the storage controller, analyzes the received NVMe command (see fig. 9A), and reads the command type 912, the NID (namespace ID) 913 that identifies the namespace, the start address 914, and the data transfer length 915 fields (step 3401).
Next, processing branches according to the command type (step 3402). If the command type 912 is a read command, the process proceeds to step 606. If the command type is a management command, the process proceeds to step 614. If the command type is a write command, the process proceeds to step 3403.
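A minimal sketch of the field extraction and branching of steps 3401 and 3402 follows; the field names mirror the numbered fields mentioned above (912 to 915), but the Python layout itself is an assumption made only for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto

class CommandType(Enum):
    READ = auto()
    WRITE = auto()
    ADMIN = auto()   # management command

@dataclass
class HostCommand:
    command_type: CommandType   # field 912
    nid: int                    # namespace ID, field 913
    start_address: int          # field 914
    transfer_length: int        # field 915

def dispatch_host_command(cmd: HostCommand) -> str:
    # Step 3402: branch according to the command type.
    if cmd.command_type is CommandType.READ:
        return "step 606: offload the read data transfer to the drive enclosure"
    if cmd.command_type is CommandType.ADMIN:
        return "step 614: process the management command"
    return "step 3403: take exclusive access to the write range and continue"

print(dispatch_host_command(
    HostCommand(CommandType.WRITE, nid=1, start_address=0x2000, transfer_length=0x1000)))
```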
When branching to step 606, the same processing is performed as in the case in fig. 6 where the command type is an IO command and the data transfer method is offload, and its description is therefore omitted.
Next, returning to the description of step 3402 and the subsequent steps, the flow of processing in the case where the command type is a management command is as follows. When the process branches from step 3402 to step 614, the processing is the same as in the case where the command type is a management command in fig. 6, and its description is therefore omitted.
Next, returning to the description of step 3402 and the subsequent steps, the flow of processing in the case where the command type is a write command is described. When the process branches from step 3402 to step 3403, the host command processing unit 3302 takes exclusive access to the write range of the logical volume based on the identifier 403 obtained from the target driver 3301 and on the NID, start address, and data transfer length obtained in step 3401 (step 3403).
Next, the write-destination address determination unit 3307 determines the write-destination address of the write data by referring to the address translation table (step 3404). Next, the host command processing unit 3302 generates an offload command including the determined write-destination address in order to have the drive enclosure perform the write command processing (step 3405).
Next, the host command processing unit 3302 transmits the offload command to the drive enclosure 200 via the offload command communication unit (initiator) 3306 (step 3406) and waits for completion of the offload command (step 3407). The host command processing unit 3302 then receives the completion response of the offload command from the drive enclosure 200 via the offload command communication unit (initiator) 3306 and analyzes it (step 3408).
Next, the address translation unit 3308 updates the mapping of logical to physical addresses, that is, the correspondence relationship in the address translation table (step 3409): the identifiers and physical addresses of the drives of the double-write destinations are mapped to the logical addresses of the write range of the logical volume specified by the write command. Next, the write-destination address determination unit 3307 updates the additional write pointer (step 3410). Next, the host command processing unit 3302 releases the access exclusion (step 3411).
Next, a completion response for the write command from the host 110 is generated (step 3412) and transmitted to the host 110 using the target driver 3301 (step 3413). Finally, the processing is completed (step 3414).
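The controller-side write path of steps 3403 to 3414 can be summarized by the sketch below; every object and method is a hypothetical stand-in for the units of fig. 33 (write-destination address determination unit 3307, offload command communication unit 3306, address translation unit 3308, target driver 3301), not code from the patent.

```python
# A compressed sketch of steps 3403-3414 under the assumptions stated above.
def process_host_write(cmd, range_lock, write_dest, offload_io, table, target_driver):
    range_lock.acquire(cmd.start_address, cmd.transfer_length)        # step 3403
    try:
        dest = write_dest.decide(cmd.nid, cmd.start_address,          # step 3404
                                 cmd.transfer_length)
        offload = offload_io.build_write_offload(cmd, dest)           # step 3405
        offload_io.send(offload)                                      # step 3406
        offload_io.wait_and_analyze_completion(offload)               # steps 3407-3408
        table.update(cmd.nid, cmd.start_address, *dest)               # step 3409
        write_dest.advance_additional_write_pointer(dest)             # step 3410
    finally:
        range_lock.release(cmd.start_address, cmd.transfer_length)    # step 3411
    response = {"status": "success"}                                  # step 3412
    target_driver.send_completion(cmd, response)                      # step 3413
```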
(35) Processing procedure of an offload command for data transfer in the drive enclosure according to target configuration system 1 and write IO processing cooperation system 1
Fig. 35 is a flowchart showing the processing procedure of an offload command for data transfer in the drive enclosure according to target configuration system 1 and write IO processing cooperation system 1. Since some of the steps are the same as those in fig. 7, such steps are given the step numbers used in fig. 7.
The offload command processing section 3318 of the drive enclosure 200 starts the processing from step 3500 when it receives an offload command from the storage controller 121 via the offload command communication section (target) 3313.
First, the offload command processing section 3318 reads each field of the offload command (step 3501). The fields are as described with reference to fig. 9D.
Next, processing branches according to the command type (step 3502). If the data transfer direction 909 is from the storage system to the host, the command type is determined to be offload of a read command; if the data transfer direction 909 is from the host to the storage system, the command type is determined to be offload of a write command. If the command type is offload of a read command, the process proceeds to step 708. If the command type is offload of a write command, the process proceeds to step 3503.
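The branch rule of step 3502, which derives the command category from the data transfer direction field 909, can be written as the short sketch below; the enum values and function name are assumptions for illustration.

```python
from enum import Enum

class TransferDirection(Enum):
    STORAGE_TO_HOST = "storage_to_host"   # treated as offload of a read command
    HOST_TO_STORAGE = "host_to_storage"   # treated as offload of a write command

def classify_offload(direction_909: TransferDirection) -> str:
    # Step 3502: the command category follows from the data transfer direction field 909.
    if direction_909 is TransferDirection.STORAGE_TO_HOST:
        return "step 708: read offload processing"
    return "step 3503: write offload processing"

print(classify_offload(TransferDirection.HOST_TO_STORAGE))  # -> step 3503
```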
When branching to step 708, the processing is the same as the processing from step 708 onward in fig. 7, and its description is therefore omitted.
Next, returning to the description of step 3502 and the subsequent steps, the flow of processing in the case where the command type is offload of a write command is as follows. When the process branches from step 3502 to step 3503, the offload command processing section 3318 secures a buffer using the buffer control section 3315 (step 3504). Next, the data transfer control unit (between the host and the drive enclosure) 3316 transfers the data in the physical memory region 503 of the host 110 designated by the write command to the secured buffer region (step 3505). Next, the drive double write unit 3322 double-writes the write data in the buffer region to the write-destination addresses designated by the offload command (step 3506), waits for the double write completion responses, that is, write completions from the drives corresponding to the double-write destinations (step 3507), and receives the double write completion responses (step 3508). Next, the offload command processing section 3318 releases the buffer secured in step 3504 (step 3509), generates a completion response for the offload command, and transmits it (step 3510). Finally, the processing is completed (step 3511).
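A sketch of this write-offload path (steps 3503 to 3511) follows; the buffer pool, host transfer engine, and drive writer objects are hypothetical stand-ins for the buffer control section 3315, the data transfer control unit 3316, and the drive double write unit 3322 of fig. 33.

```python
# A compressed sketch of steps 3503-3511 under the assumptions stated above.
def process_write_offload(offload, buffers, host_transfer, drives):
    buf = buffers.allocate(offload.transfer_length)                   # step 3504
    try:
        host_transfer.read_into(buf, offload.host_address,            # step 3505
                                offload.transfer_length)
        # Step 3506: double write - the same data is written to two different drives.
        pending = [drives.write(drive_id, physical_address, buf)
                   for drive_id, physical_address in offload.write_destinations]
        for request in pending:                                       # steps 3507-3508
            request.wait_for_completion()
    finally:
        buffers.release(buf)                                          # step 3509
    return {"command_id": offload.command_id, "status": "success"}    # step 3510
```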
Description of the reference numerals
110: host computer, 120: storage device, 121: storage controller, 140: driver housing, 150: network, 200: drivecage, 218: a driver.

Claims (19)

1. A storage system connectable with a host via a network, comprising at least one drive enclosure provided with a storage device and a storage controller connected with the drive enclosure, characterized in that:
the drive enclosure is provided with a plurality of drives, and is
capable of generating a logical volume having a designated identifier, providing the logical volume to the host as a storage area,
receiving a first read command issued from the host to the drive enclosure providing the logical volume,
sending a second read command corresponding to the first read command to the storage controller,
the storage controller is capable of generating and sending to the drive enclosure data transfer parameters for data transfer of a logical address range of the logical volume specified by the second read command,
and when the drive enclosure receives the data transfer parameters from the storage controller, reading data from the storage device according to the data transfer parameters and transmitting the data to the host.
2. The storage system of claim 1, wherein:
the first read command includes information of a command type, a start address of the logical volume, and a data transfer length.
3. The storage system of claim 1, wherein:
the storage controller is configured to control the storage device,
generating and transmitting to the drive enclosure the data transfer parameter for data transfer of the address range of the logical volume specified by the second read command if data corresponding to the address range of the logical volume specified by the second read command is not located on a cache of the storage controller,
if data corresponding to the address range of the logical volume specified by the second read command is located on a cache of the storage controller, the data on the cache is used as data to be sent to the host.
4. The storage system of claim 3, wherein:
the storage controller, when data corresponding to the address range of the logical volume specified by the second read command is located on a cache of the storage controller, transmits the data on the cache to the drive enclosure,
the drive enclosure transmits the data received from the storage controller to the host.
5. The storage system of claim 1, wherein:
the storage controller generates the data transfer parameter from a logical address of the logical volume specified by the first read command based on an address translation table that manages a correspondence relationship between a logical address of the logical volume and a physical address of the storage device constituting the logical volume.
6. The storage system of claim 3, wherein:
the data transfer parameters include: the host identifier, the host address, the data storage destination address, the data transfer length, the data storage destination identifier, the address in the storage device, and the data transfer length.
7. The storage system of claim 1, wherein:
the storage controller manages a correspondence relationship between logical addresses of the logical volume managed by the storage controller and physical addresses of the storage device, converts logical addresses of the data ranges included in the first read command and the second read command into the corresponding physical addresses of the storage device, and includes the converted addresses in the data transfer parameters.
8. The storage system of claim 1, wherein:
the network includes a first network connecting the host with the drive enclosure and a second network connecting the drive enclosure with the storage controller.
9. The storage system of claim 1, wherein:
the drive enclosure is provided with a plurality of drives,
transmitting a write command to the storage controller when the write command is received from the host,
transmitting write data of the write command to the storage controller when a message indicating that write preparation is completed is received from the storage controller.
10. A storage system connectable with a host via a network, comprising at least one drive enclosure provided with a storage device and a storage controller connected with the drive enclosure, characterized in that:
the drive enclosure is provided with a plurality of drives, and is
capable of generating a logical volume having a designated identifier, providing the logical volume to the host as a storage area,
receiving a first write command issued from the host to the drive enclosure providing the logical volume,
sending a second write command corresponding to the first write command to the storage controller,
the storage controller is capable of determining, from the second write command, the write destination of the write data of the first write command in the storage device, and sending it to the drive enclosure,
the drive enclosure stores the write data corresponding to the first write command at the write destination of the storage device.
11. The storage system of claim 10, wherein:
the drive enclosure includes a first area for storing the write data and a second area for storing copy data of the data stored in the first area.
12. The storage system of claim 11, wherein:
the drive enclosure is provided with a plurality of drives constituting the storage device,
and the first area and the second area are formed of different drives.
13. The storage system of claim 12, wherein:
the storage controller, which is configured to control the storage device,
includes an address translation table that manages a correspondence relationship between logical addresses of the logical volume and physical addresses of the storage device included in the drive enclosure constituting the logical volume,
and updates the address translation table when a completion notification of the write processing for the first area and the second area is received from the drive enclosure.
14. The storage system of claim 13, wherein:
the storage controller reports, to the drive enclosure, information that the update of the address translation table is completed,
and the drive enclosure, upon receiving the information that the update of the address translation table is completed, transmits information that the first write command is completed to the host.
15. The storage system of claim 10, wherein:
the second write command sent by the drive enclosure to the storage controller includes: an identifier of a host that is a transmission source of the first write command, an identifier of a drive enclosure that has received the first write command, a logical volume to which the first write command is directed, and an address of the logical volume.
16. A storage system connectable with a host via a network, comprising at least one drive casing provided with a storage device and a storage controller connected with the drive casing, characterized in that:
the drive enclosure is capable of generating a logical volume having the designated identifier, providing the logical volume to the host as a storage area,
the storage controller has an address translation table that manages a correspondence relationship between the logical volume and a physical address of the storage device constituting the logical volume,
the drive enclosure is provided with a plurality of drives,
upon receiving a first write command issued from the host to the drive enclosure providing the logical volume, storing write data of the first write command in the storage device,
and notifying the storage controller of the correspondence between the physical address of the storage device storing the write data and the logical address of the first write command.
17. The storage system of claim 16, wherein:
the storage controller updates the address translation table in response to a notification from the drive enclosure.
18. The storage system of claim 17, wherein:
the notification of the correspondence sent from the drive enclosure to the storage controller includes: an identifier of a host that is a transmission source of the first write command, an identifier of a drive enclosure that has received the first write command, a logical volume to which the first write command is directed, an address of the logical volume, and an identifier of a drive that is a storage destination of copy data of the write data.
19. A data transfer method of a storage system connected with a host via a network, wherein the storage system includes at least one drive enclosure provided with a storage device and a storage controller connected with the drive enclosure, the data transfer method characterized by:
the drive enclosure, which is provided with a plurality of drives,
generating a logical volume having the designated identifier according to an instruction from the storage controller, providing the logical volume to the host as a storage area,
receiving a first read command issued from the host to the drive enclosure providing the logical volume,
sending a second read command corresponding to the first read command to the storage controller,
the storage controller generates data transfer parameters for data transfer of a logical address range of the logical volume specified by the second read command and sends them to the drive enclosure,
and when the drive enclosure receives the data transfer parameters from the storage controller, reading data from the storage device according to the data transfer parameters and transmitting the data to the host.
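For illustration only (this is not part of the claims and not the patent's code), the read path recited in claim 19 can be sketched as the following message sequence, in which every class, method, and value is a hypothetical stand-in: the host issues a first read command to the drive enclosure, the enclosure sends a second read command to the storage controller, the controller returns data transfer parameters derived from its address translation table, and the enclosure reads the storage device and transfers the data to the host.

```python
class StorageController:
    def __init__(self, address_translation_table):
        # (volume id, logical address) -> (drive id, physical address)
        self.table = address_translation_table

    def handle_second_read_command(self, volume_id, start_address, length):
        drive_id, physical_address = self.table[(volume_id, start_address)]
        # Data transfer parameters for the logical address range of the second read command.
        return {"drive_id": drive_id, "physical_address": physical_address, "length": length}

class DriveEnclosure:
    def __init__(self, controller, drives):
        self.controller = controller
        self.drives = drives  # drive id -> bytearray standing in for a physical drive

    def handle_first_read_command(self, volume_id, start_address, length):
        params = self.controller.handle_second_read_command(volume_id, start_address, length)
        drive = self.drives[params["drive_id"]]
        offset = params["physical_address"]
        return bytes(drive[offset:offset + params["length"]])  # data transferred to the host

controller = StorageController({("vol-1", 0x0): ("drive-0", 0x100)})
enclosure = DriveEnclosure(controller, {"drive-0": bytearray(b"\x00" * 0x100 + b"READ-DATA")})
print(enclosure.handle_first_read_command("vol-1", 0x0, 9))  # b'READ-DATA'
```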
CN201910777272.5A 2019-03-22 2019-08-22 Information processing system, storage system, and data transmission method Active CN111722791B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019054919 2019-03-22
JP2019-054919 2019-03-22

Publications (2)

Publication Number Publication Date
CN111722791A true CN111722791A (en) 2020-09-29
CN111722791B CN111722791B (en) 2024-02-27

Family

ID=72563905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910777272.5A Active CN111722791B (en) 2019-03-22 2019-08-22 Information processing system, storage system, and data transmission method

Country Status (3)

Country Link
US (1) US11301159B2 (en)
JP (2) JP6898393B2 (en)
CN (1) CN111722791B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11656992B2 (en) 2019-05-03 2023-05-23 Western Digital Technologies, Inc. Distributed cache with in-network prefetch
US11765250B2 (en) 2020-06-26 2023-09-19 Western Digital Technologies, Inc. Devices and methods for managing network traffic for a distributed cache
US11675706B2 (en) 2020-06-30 2023-06-13 Western Digital Technologies, Inc. Devices and methods for failure detection and recovery for a distributed cache
JP2022020344A (en) * 2020-07-20 2022-02-01 富士通株式会社 Storage control device, and storage control program
US11736417B2 (en) 2020-08-17 2023-08-22 Western Digital Technologies, Inc. Devices and methods for network message sequencing
JP7201716B2 (en) * 2021-01-22 2023-01-10 株式会社日立製作所 Information processing system and data transfer method
JP7473600B2 (en) 2022-07-15 2024-04-23 株式会社日立製作所 STORAGE SYSTEM, DATA TRANSMISSION METHOD, AND NETWORK INTERFACE

Family Cites Families (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPWO2004027625A1 (en) 2002-09-20 2006-01-19 富士通株式会社 Storage control device, storage control program, and storage control method
JP4387261B2 (en) 2004-07-15 2009-12-16 株式会社日立製作所 Computer system and storage system migration method
JP4723290B2 (en) 2005-06-06 2011-07-13 株式会社日立製作所 Disk array device and control method thereof
JP4224077B2 (en) * 2006-04-04 2009-02-12 株式会社東芝 Storage system
US7613876B2 (en) 2006-06-08 2009-11-03 Bitmicro Networks, Inc. Hybrid multi-tiered caching storage system
JP4386932B2 (en) 2007-08-17 2009-12-16 富士通株式会社 Storage management program, storage management device, and storage management method
JP4480756B2 (en) * 2007-12-05 2010-06-16 富士通株式会社 Storage management device, storage system control device, storage management program, data storage system, and data storage method
JP2010033287A (en) 2008-07-28 2010-02-12 Hitachi Ltd Storage subsystem and data-verifying method using the same
US8868869B2 (en) 2011-08-08 2014-10-21 International Business Machines Corporation Enhanced copy-on-write operation for solid state drives
US8943227B2 (en) 2011-09-21 2015-01-27 Kevin Mark Klughart Data storage architecture extension system and method
US8930745B2 (en) 2012-04-16 2015-01-06 Hitachi, Ltd. Storage subsystem and data management method of storage subsystem
US20140164728A1 (en) 2012-12-06 2014-06-12 Hitachi, Ltd. Method for allocating and reallocating logical volume
US20140189204A1 (en) 2012-12-28 2014-07-03 Hitachi, Ltd. Information processing apparatus and cache control method
US10445229B1 (en) * 2013-01-28 2019-10-15 Radian Memory Systems, Inc. Memory controller with at least one address segment defined for which data is striped across flash memory dies, with a common address offset being used to obtain physical addresses for the data in each of the dies
US9696922B2 (en) 2013-12-24 2017-07-04 Hitachi, Ltd. Storage system
US10222988B2 (en) 2014-04-22 2019-03-05 Hitachi, Ltd. Efficient management storage system via defining of several size units in advance
US9112890B1 (en) * 2014-08-20 2015-08-18 E8 Storage Systems Ltd. Distributed storage over shared multi-queued storage device
AU2014403333A1 (en) * 2014-09-15 2016-03-31 Huawei Technologies Co., Ltd. Write data request processing method and storage array
JP6511795B2 (en) 2014-12-18 2019-05-15 富士通株式会社 STORAGE MANAGEMENT DEVICE, STORAGE MANAGEMENT METHOD, STORAGE MANAGEMENT PROGRAM, AND STORAGE SYSTEM
JP6417951B2 (en) 2015-01-15 2018-11-07 富士通株式会社 Storage control device and storage control program
JP6708923B2 (en) 2016-03-14 2020-06-10 富士通株式会社 Storage system
US10467172B2 (en) * 2016-06-01 2019-11-05 Seagate Technology Llc Interconnect for shared control electronics
CN109690681B (en) * 2016-06-24 2021-08-31 华为技术有限公司 Data processing method, storage device, solid state disk and storage system
US10884630B2 (en) * 2017-04-13 2021-01-05 Hitachi, Ltd. Storage system
US10437477B2 (en) * 2017-07-20 2019-10-08 Dell Products, Lp System and method to detect storage controller workloads and to dynamically split a backplane
JP7074453B2 (en) * 2017-10-30 2022-05-24 キオクシア株式会社 Memory system and control method
CN112286838A (en) * 2019-07-25 2021-01-29 戴尔产品有限公司 Storage device configurable mapping granularity system
JP2021124796A (en) * 2020-01-31 2021-08-30 株式会社日立製作所 Decentralized computing system and resource allocation method

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1821979A (en) * 2005-02-15 2006-08-23 株式会社日立制作所 Storage system
US20070180188A1 (en) * 2006-02-02 2007-08-02 Akira Fujibayashi Virtual path storage system and control method for the same
US20130145223A1 (en) * 2011-12-06 2013-06-06 Hitachi, Ltd. Storage subsystem and method for controlling the same
US20150370700A1 (en) * 2014-06-23 2015-12-24 Google Inc. Managing storage devices
CN106575271A (en) * 2014-06-23 2017-04-19 谷歌公司 Managing storage devices
US20180018272A1 (en) * 2015-01-28 2018-01-18 Hitachi, Ltd. Storage apparatus, computer system, and method
WO2016189640A1 (en) * 2015-05-26 2016-12-01 株式会社日立製作所 Storage device, and method
WO2017072868A1 (en) * 2015-10-28 2017-05-04 株式会社日立製作所 Storage apparatus
CN108701002A (en) * 2016-02-29 2018-10-23 株式会社日立制作所 virtual storage system
WO2017168690A1 (en) * 2016-03-31 2017-10-05 株式会社日立製作所 Storage device, and method
CN109144888A (en) * 2017-06-28 2019-01-04 东芝存储器株式会社 Storage system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113849129A (en) * 2021-09-18 2021-12-28 苏州浪潮智能科技有限公司 IO (input/output) request forwarding method, device and equipment between storage controllers
CN113849129B (en) * 2021-09-18 2023-08-25 苏州浪潮智能科技有限公司 IO request forwarding method, device and equipment among storage controllers

Also Published As

Publication number Publication date
JP7135162B2 (en) 2022-09-12
JP6898393B2 (en) 2021-07-07
JP2021128802A (en) 2021-09-02
US11301159B2 (en) 2022-04-12
CN111722791B (en) 2024-02-27
JP2020161103A (en) 2020-10-01
US20200379668A1 (en) 2020-12-03

Similar Documents

Publication Publication Date Title
CN111722791B (en) Information processing system, storage system, and data transmission method
US6912669B2 (en) Method and apparatus for maintaining cache coherency in a storage system
JP6009095B2 (en) Storage system and storage control method
US5881311A (en) Data storage subsystem with block based data management
KR101055918B1 (en) Preservation of Cache Data Following Failover
US7849254B2 (en) Create virtual track buffers in NVS using customer segments to maintain newly written data across a power loss
US10705768B2 (en) Method and system for managing storage using an intermediate storage area
US20010002480A1 (en) Method and apparatus for providing centralized intelligent cache between multiple data controlling elements
US10831386B2 (en) Remote direct memory access
US10620843B2 (en) Methods for managing distributed snapshot for low latency storage and devices thereof
US20190065433A1 (en) Remote direct memory access
US10761764B1 (en) Storage system and data transfer method
JP4884721B2 (en) Storage system and storage control method that do not require storage device format
US11327653B2 (en) Drive box, storage system and data transfer method
US10877674B2 (en) Determining layout templates identifying storage drives
CN110300960B (en) Information system, management program, and program replacement method for information system
CN113360082A (en) Storage system and control method thereof
CN111857540A (en) Data access method, device and computer program product
JP2022054132A (en) Compound storage system
US11875060B2 (en) Replication techniques using a replication log
US11334508B2 (en) Storage system, data management method, and data management program
US11112973B2 (en) Computer system and data management method
US11789613B2 (en) Storage system and data processing method
JP2001067188A (en) Data duplication system
JP2000163317A (en) Information system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Akaike Hirotoshi, Hosogi Koji, Shimoboku Yasuo, Sugimoto Sadayuki, Yokoi Nobuhiro, Dami Ryosuke
Inventor before: Akaike Hirotoshi, Hosogi Koji, Shimoboku Yasuo, Sugimoto Sadayuki, Yokoi Nobuhiro
GR01 Patent grant