WO1992015058A1 - Data storage subsystem - Google Patents

Data storage subsystem

Info

Publication number
WO1992015058A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
controller
adapter
read
host
Prior art date
Application number
PCT/GB1991/000254
Other languages
English (en)
Inventor
Ian David Judd
Michael Jan Szatkowski
Original Assignee
International Business Machines Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corporation
Priority to EP91904942A (patent EP0524945A1)
Priority to PCT/GB1991/000254 (patent WO1992015058A1)
Priority to JP3504612A (patent JPH0743687B2)
Publication of WO1992015058A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/10 Program control for peripheral devices
    • G06F13/12 Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor
    • G06F13/124 Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor where hardware is a sequential transfer control unit, e.g. microprocessor, peripheral processor or state-machine
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device

Definitions

  • This invention relates to the field of data storage subsystems and more specifically to a high performance subsystem architecture.
  • Data storage subsystems used in data processing systems commonly comprise a device controller connected to one or more storage devices on which customer data is retained. These storage devices are commonly direct access storage devices, e.g. disk drives. In recent years, such storage subsystems have become more sophisticated and a number of different subsystem architectures have been proposed.
  • US Patent 4 825 406 (assigned to Digital Equipment Corporation) describes a secondary storage facility which employs serial communications between a device controller and attached disk drive.
  • the serial link consists of four lines which are unidirectional bit serial channels. One of the lines carries write and command data to the drive, another carries read and response data from the drive to the controller. The remaining two lines carry signals for use in coordinating and synchronising transmissions.
  • the device controller communicates with a host processor to which it is attached via a bus.
  • the invention provides a data storage subsystem comprising: a host adapter, a controller, a plurality of direct access storage devices, a dedicated serial link connecting the adapter to the controller, and a plurality of dedicated links each connecting the controller to a respective device, the serial link consisting of a pair of inbound and outbound connections adapted to operate in full duplex mode.
  • the dedicated links between controller and devices are also serial links comprising a pair of inbound and outbound connections adapted to operate in full duplex mode.
  • data and commands are transferred over a serial link in the form of packets, each of the inbound and outbound connections of the serial link being adapted to allow multiplexing of said packets.
  • the controller has a relatively large data buffer and the device has a relatively small data buffer.
  • designing the architecture of a data storage subsystem to allow limited buffering in the device has a benefit in terms of cost.
  • controller data buffer is shared between the devices for reception of read data being transferred from the device to the host and for write data being transferred from the host to the devices.
  • there is one area of buffer in the controller which is used for all data transfers between the host and the device.
  • the controller is arranged to initiate transfer of data between itself and the device by means of multi sector device orders.
  • the use of multisector orders has a performance benefit in that the controller does not need to instruct the device every sector. The controller is thus freer to carry out other tasks.
  • the adapter includes a plurality of direct memory access (DMA) channels for transferring data between the host and the dedicated serial link between adapter and controller, and it is a further preference that the controller includes a direct memory access (DMA) controller which controls a plurality of DMA channels for transferring data between the adapter and the controller buffer or between the device and the controller buffer.
  • Figure 1 is a block diagram of the major functional units of a data storage subsystem according to the present invention.
  • Figure 2 is a block diagram showing the main components of the adapter of Fig. 1;
  • Figure 3 shows the structure of the adapter link chip of Fig. 2;
  • Figure 4 is a block diagram showing the main components of the controller of Fig. 1;
  • Figure 5 is a block diagram showing the structure of the controller link chip of Fig. ;
  • FIG. 6 is a block diagram showing the communication between tasks defined in the controller microprocessor
  • Figure 7 is a block diagram showing the inbound and outbound serial links.
  • a data storage subsystem is described which is suitable for connection to a data processing system and provides large amounts of storage which may be accessed by the host system at high speed.
  • the main functional units of the subsystem, shown in Fig. 1, are (i) Host Adapter, (ii) Device Controller and (iii) Direct Access Storage Device (DASD).
  • the functional units are interconnected by point to point, full duplex serial links.
  • Figure 1 shows a basic configuration of the subsystem wherein one adapter 10 is connected via a dedicated serial link 15 to one controller 20 which is in turn connected by four serial links 25-28 to four DASDs 30. The bulk of the following description will relate to this basic configuration.
  • each adapter is connected to up to four controllers, each of which may be connected to up to four devices.
  • the controller can be attached to up to two adapters.
  • the host adapter is housed in the host system and is connected via a serial link to a housing (e.g. rack mounted drawer or free standing unit) which comprises one controller and four DASDs with associated power supply and cooling system (not shown).
  • the adapter is essentially a general purpose multiplexer that connects the host system to the controllers through the serial links.
  • the adapter may be designed to attach to the host system through a variety of existing interfaces e.g. IBM's Micro Channel architecture. (Micro Channel is a trademark of International Business Machines Corporation)
  • the adapter fetches SCSI (Small Computer System Interface) commands by Direct Memory Access (DMA) from system memory and forwards them to the controller over the serial links.
  • DMA Direct Memory Access
  • the adapter manages a pool of DMA channels and allocates them on request to the controllers for the transfer of read/write data.
  • the adapter fetches packets of write data from host memory by DMA and transmits them on the serial links to the controllers.
  • the adapter receives packets of read data from the serial links and stores them in system memory by DMA.
  • the adapter assembles the ending status for each command and presents it to the system. Good status can be presented for up to 4 devices at a time.
  • the adapter provides the means to abort a previous SCSI command
  • the controller implements the SCSI command set (the members of which relevant to this description are defined elsewhere) for the attached DASD. Its principal functions are as follows:
  • the controller maintains a command queue for each DASD.
  • the controller has a data buffer (shared between the DASDs) to prefetch write data from the host system, to correct read data and to prefetch read data from the DASD.
  • the controller generates SCSI status.
  • the DASD seeks to the specified cylinder and head.
  • the DASD searches for the starting Logical Block Address (LBA) supplied by the controller and then it reads or writes the specified number of blocks, seeking to the next track as necessary. If any defective blocks are found the DASD skips over them automatically.
  • LBA Logical Block Address
  • the DASD generates and checks the ECC bytes that are appended to each block.
  • the ECC hardware is contained in the DASD to allow the controller to support a range of DASD that may have different ECC algorithms.
  • if the DASD detects a data error then the controller requests the DASD to supply the error pattern and displacement. The controller will then correct the data in its buffer and restart the transfer to the adapter.
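The correction step can be illustrated with a minimal sketch. It assumes that the displacement is a byte offset within the block and that the error pattern is applied by exclusive-OR, as is usual for burst-error codes; the text specifies neither, and the function name is hypothetical.

```python
def apply_error_pattern(buffer: bytearray, block_offset: int,
                        displacement: int, pattern: bytes) -> None:
    """XOR the error pattern into the buffered block at the given displacement.

    Assumption: displacement counts bytes from the start of the block and the
    pattern marks the bits that were read incorrectly.
    """
    start = block_offset + displacement
    for i, p in enumerate(pattern):
        buffer[start + i] ^= p


# Usage: corrupt two bytes of a buffered block, then repair them in place.
good = bytearray(b"ABCDEFGH")
bad = bytearray(good)
bad[3] ^= 0x0F
bad[4] ^= 0xF0
apply_error_pattern(bad, 0, 3, bytes([0x0F, 0xF0]))
```

Because the data is repaired in the controller buffer, only the transfer to the adapter has to be restarted, not the read from the DASD.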
  • the DASD has a recording channel for read/write data.
  • the write data is encoded, serialised and fed to the head.
  • the read signal from the head is detected, deserialised and decoded.
  • the serial link provides point to point communication between two nodes of the subsystem i.e. between adapter & controller and between controller & DASD.
  • the unit of data transfer is a packet.
  • the format of a packet is shown below and comprises a control field, an address field, a variable length data field and a CRC field.
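As an illustration of the four fields, the following sketch builds and checks a packet. The field widths (one byte each for control and address, two bytes for the CRC) and the choice of CRC-CCITT are assumptions made for the example only; the text does not give them. Packet framing on the link is also outside the sketch.

```python
import binascii
import struct

# Assumed layout (not specified in the text): 1-byte control field,
# 1-byte address field, variable-length data field, 2-byte CRC computed
# over all preceding bytes. CRC-CCITT is a stand-in polynomial.

def build_packet(control: int, address: int, data: bytes) -> bytes:
    body = struct.pack("BB", control, address) + data
    crc = binascii.crc_hqx(body, 0xFFFF)
    return body + struct.pack(">H", crc)


def parse_packet(packet: bytes):
    """Return (control, address, data) or raise ValueError on a bad packet."""
    if len(packet) < 4:
        raise ValueError("packet too short")
    body = packet[:-2]
    (crc,) = struct.unpack(">H", packet[-2:])
    if binascii.crc_hqx(body, 0xFFFF) != crc:
        raise ValueError("CRC mismatch")
    return body[0], body[1], body[2:]
```

A receiver that sees a CRC mismatch would not acknowledge the packet, which is why the sender keeps the packet contents available for retransmission.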
  • Packets may be multiplexed on the serial link to perform several commands on different DASDs simultaneously.
  • Full duplex data transmission supports the transfer of read data and write data simultaneously.
  • each serial link comprises two links providing data transfer in two opposite directions. This is shown in Fig 7 in which each node has an inbound link on which it receives incoming data and messages and an outbound link over which it transmits data and messages.
  • the link has a simple protocol.
  • Each node may transmit a packet on its outbound link subject to pacing responses and acknowledgements received from the remote node on its inbound link.
  • Packets on the serial link may be classified into two types.
  • Message packets originate from a software process in one node and are addressed to a process in the destination node. (A description of the processes is given below in the sections on Controller and Adapter operations.) Message packets are typically used for commands and status (the different types of messages sent in a message packet are described in greater detail below).
  • Data packets originate from a DMA channel in one node and are addressed to a DMA channel in the destination node. Data packets normally contain read/write data.
  • Each packet contains an address field which indicates the source and/or destination of the packet.
  • Messages are packets on the serial link that are addressed to a process in the destination node.
  • the first data byte in the packet (that is the first byte in the data field of the packet) identifies the message.
  • the subsequent bytes are the parameters.
  • Most messages carry a TAG as a parameter. This allows the messages to be associated with the corresponding command sent from the adapter.
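The message layout described above (identifier byte, then parameters, with the tag among them) can be sketched as follows. The identifier values are hypothetical, and placing a one-byte tag directly after the identifier is an assumption; only the position of the identifier byte comes from the text.

```python
# Hypothetical message identifiers; the real values are not given in the text.
MESSAGE_IDS = {"SCSI_COMMAND": 0x01, "READY_FOR_READ": 0x02, "ABORT": 0x03}
MESSAGE_NAMES = {value: name for name, value in MESSAGE_IDS.items()}


def build_message(name: str, tag: int, params: bytes = b"") -> bytes:
    """Data field of a message packet: identifier byte, tag, other parameters."""
    return bytes([MESSAGE_IDS[name], tag]) + params


def parse_message(data_field: bytes):
    """Return (message name, tag, remaining parameter bytes)."""
    name = MESSAGE_NAMES[data_field[0]]
    tag = data_field[1]
    return name, tag, data_field[2:]
```

The tag is what lets the receiver match a later status or data message to the command that caused it.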
  • This message transfers a SCSI Command Descriptor Block to the controller queue.
  • DASD ADDRESS identifies the target storage device that is to execute the command
  • SCSI EXT provides an extension or modification beyond the function provided by the SCSI command set as described in ANSI specification 'Small Computer Systems Interface/2': X3T9.2/86-109. This portion of the message is set to enable Split Write to the DASD or to enable Split Read on the Adapter to Controller link;
  • DMA Address is the start address in the system memory of the data area for the SCSI command
  • COMMAND DESCRIPTOR BLOCK (CDB) is the command descriptor block for the SCSI command.
  • the CDB comprises one of the commands of the SCSI command set.
  • This message is sent by the adapter to the controller in response to a DATA_READY message.
  • the Tag identifies the command with which the particular READY_FOR_READ message is associated.
  • the Link address identifies the DMA channel allocated in the adapter for this read operation.
  • This message is generated by the Adapter when executing the ABORT SCSI operation fetched from the host.
  • TAG 1 identifies the mailbox containing the ABORT_SCSI Command operation
  • TAG 2 identifies the command to be aborted. Details of the mailboxes can be found later in the description. The message causes the controller to terminate execution of the command if it is in progress or to remove the command from its queue if execution has not begun.
  • This message is sent by the Adapter to the Controller to reset selected resources within the controller or DASD.
  • This message instructs the adapter to transfer data from the host at the DMA start address for the DMA length.
  • LINK ADDRESS identifies the DMA channel in the controller to which the data packets are to be addressed.
  • the Tag identifies the command with which the data is associated.
  • This message instructs the adapter to allocate a DMA channel to this tag if it has not already done so and prime it for a transfer into host memory beginning at the specified start address and for the specified length.
  • the Adapter responds with a READY_FOR_READ message telling the controller to which DMA channel the data packets should be addressed.
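The exchange described in the last two paragraphs can be sketched from the adapter's side. The request is taken here to be the DATA_READY message named earlier; the class and method names, the dictionary reply and the first-free allocation policy are all assumptions made for the example.

```python
class AdapterDmaPool:
    """Pool of adapter DMA channels that are allocated to tags on request."""

    def __init__(self, channels=16):
        self.free = list(range(channels))
        self.by_tag = {}  # tag -> (channel, start address, length)

    def on_data_ready(self, tag, start, length):
        """Allocate a channel to the tag if needed, prime it for a transfer
        into host memory, and return the READY_FOR_READ reply."""
        if tag in self.by_tag:
            channel = self.by_tag[tag][0]
        else:
            if not self.free:
                raise RuntimeError("no free DMA channel")
            channel = self.free.pop(0)
        self.by_tag[tag] = (channel, start, length)
        return {"message": "READY_FOR_READ", "tag": tag, "link_address": channel}

    def release(self, tag):
        """Return the channel to the pool once the command has completed."""
        channel, _, _ = self.by_tag.pop(tag)
        self.free.append(channel)
```

The controller then addresses its data packets for that tag to the returned link address, so several reads can be in flight on one link at once.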
  • This message carries the SCSI status generated on completion of the command identified by the tag.
  • the main components of the Adapter hardware are shown in Fig 2.
  • the core of the adapter is the microprocessor chip (MPC) 110 which contains a high performance controller controlling the transfer of messages and data between the host system and the attached controller.
  • MPC microprocessor chip
  • ALC adapter link chips
  • Each serial link has four 128 byte packet buffers for commands, data and status in transit to and from the controllers.
  • the interface between the MPC and the ALCs is an I/O bus 115.
  • ALC Adapter Link Chip
  • the Data Ram 121 contains the following areas:
  • this buffer is used by the high performance microprocessor to build READY_FOR_READ messages destined for the controller. By setting the appropriate hardware, this buffer can then be transmitted to the controller.
  • Mailbox Pointer register: 4 byte register that can be read and written by the system. It is initialised by the system to point to the first mailbox in the chain. The system is only allowed to write to this register when the current Tag Register is equal to the Last Tag register or immediately after the adapter has been reset.
  • Last Tag Register: 1 byte register that can be read and written by the system. It is written by the system when it adds some mailboxes to the queue. It indicates the tag which is in the last mailbox. By this means the adapter knows when it has reached the end of the list. When the register is written, the adapter is interrupted.
  • a 32 byte DMA packet buffer for fetching all mailboxes from host: SCSI_COMMAND, ABORT and RESET messages to the controller are sent directly from the DMA buffer. (The READY_FOR_READ message is built in the message buffer and sent over the outbound link to the controller.)
  • the DMA buffer can be used to read or write from host memory under DMA control.
  • the data RAM is time multiplexed between the serial links, the MicroChannel or inter-link transfers and the high performance microprocessor.
  • the packet buffers each require a packet status register (PSR). These are held in the Status RAM 122 and are 16 bits wide. Packet buffers and associated Packet Status registers are shown in Figure 7. Each register contains two fields:
  • DESTINATION - for outbound data packets this field contains a value which will be copied into the address field of the outgoing packet when the contents of the corresponding packet buffer are transmitted by the link. This value may be loaded automatically by hardware when the packet is being fetched from the packet buffer, in preparation for transmission. For inbound packets, this field contains an address extracted from the address field of the incoming packet. This value is written into the PSR by the inbound link FSM and its value is used to determine the subsequent routing of the packet.
  • PSR packet status register
  • BYTE COUNT - for outbound packets this contains a value which indicates the number of bytes which have been placed in the corresponding packet buffer.
  • this value has to be copied into a byte counter (part of the link hardware) which is decremented as each data byte is sent.
  • the value in the PSR is preserved in case the packet has to be retransmitted due to an error in transmission.
  • for inbound packets, this field contains a value which indicates the number of data bytes which were received in the incoming packet.
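A sketch of the two PSR fields follows. The text gives only the 16 bit register width; the 8/8 split between DESTINATION and BYTE COUNT is an assumption (8 bits are enough to count a 128 byte packet buffer), and the function names are hypothetical.

```python
# Assumed split of the 16-bit register: high byte DESTINATION, low byte BYTE COUNT.

def pack_psr(destination: int, byte_count: int) -> int:
    if not (0 <= destination <= 0xFF and 0 <= byte_count <= 0xFF):
        raise ValueError("field out of range")
    return (destination << 8) | byte_count


def unpack_psr(psr: int):
    return (psr >> 8) & 0xFF, psr & 0xFF


def transmit(psr: int, packet_buffer: bytes):
    """Send one outbound packet.

    The byte count is copied into a separate counter that is decremented per
    byte, so the PSR value survives for a possible retransmission.
    """
    destination, counter = unpack_psr(psr)
    sent = bytearray()
    while counter > 0:
        sent.append(packet_buffer[len(sent)])
        counter -= 1
    return destination, bytes(sent)
```

Calling `transmit` a second time with the same `psr` resends the identical packet, which is the retransmission case mentioned above.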
  • the MicroChannel interfaces with host memory and employs the Data RAM host-interface registers defined above.
  • DMA channels: there are sixteen DMA channels (not shown) numbered 0-15 in each ALC. They are employed to DMA data between host memory and the DMA packet buffer.
  • HARDWARE: Figure 4 shows the main functional components of the controller.
  • the core of the controller is the MPC chip 210 that contains the high performance controller (HPC) and a DMA controller which controls the transfer of data to and from the data buffer 220.
  • HPC high performance controller
  • a DMA bus 225 connects the DMA controller to two Controller Link Chips 230.
  • Data Buffer 220: all data between the adapter(s) and DASDs passes through the data buffer.
  • the buffer is also used for storing read ahead data in case it is subsequently requested by the system (see section on READ AHEAD later in the description).
  • Sixteen DMA channels (numbered 0-15) are provided for transfer of data between the link packet buffers and the data buffer. There are two channels per device (DA) link and 4 channels per SA link.
  • the data buffer consists of an array of DRAM modules. Data in this buffer is stored with ECC to ensure data integrity.
  • the data buffer is allocated in seven 32K byte segments per DASD. If more than one task is executing on the DASD, a different segment is allocated for each task.
  • the high performance controller controls the interfaces to the controller via a series of external registers implemented in the Controller Link Chip.
  • the I/O bus 226 is used by the microprocessor to access these registers.
  • Static RAM 240 is used for programme execution.
  • EPROM 250 stores the microcode employed in the operation of the high performance controller. The structure and operation of the microcode is described in more detail below.
  • the main functional areas contained within a CLC are shown in Figure 5. They are: 1) Two DASD serial interfaces (DA0 and DA1)
  • Packet buffers are employed to hold incoming and outgoing data.
  • an A/B buffer implementation is used. This allows the DMA logic to fill (or empty) buffer B whilst buffer A is being used by the link and vice versa.
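The A/B arrangement is a form of double buffering. A minimal sketch for the outbound direction is given below; the class and method names are hypothetical and the strict A, B, A, B alternation is an assumption.

```python
class ABBuffers:
    """Two packet buffers: the DMA side fills one while the link drains the other."""

    def __init__(self):
        self.buffers = {"A": None, "B": None}  # None means the buffer is empty
        self.fill_next = "A"
        self.send_next = "A"

    def dma_fill(self, packet: bytes) -> bool:
        """Fill the next buffer if it is empty; return False if it is still in use."""
        name = self.fill_next
        if self.buffers[name] is not None:
            return False
        self.buffers[name] = packet
        self.fill_next = "B" if name == "A" else "A"
        return True

    def link_send(self):
        """Transmit and empty the next full buffer; return None if none is ready."""
        name = self.send_next
        packet = self.buffers[name]
        if packet is None:
            return None
        self.buffers[name] = None
        self.send_next = "B" if name == "A" else "A"
        return packet
```

With two buffers the link need not idle while the DMA logic fetches the next packet, provided the DMA side keeps up.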
  • the full duplex nature of the link means that both inbound and outbound links need individual sets of packet buffers. Since the controller link chip contains three serial interfaces, this means that a total of 12 packet buffers are required to service the serial links. Additional packet buffers in which microcode can build outgoing Messages are also implemented (one per link). This allows the high performance microprocessor to construct messages without having to withdraw one of the A/B packet buffers from servicing DMA transfer hence preventing ongoing data transfers from being adversely affected.
  • Each of the three links in the CLC is provided with five packet buffers classified as outbound, inbound or message.
  • Each link is provided with two A/B outbound packet buffers which are serviced by the DMA hardware. These buffers are filled with data obtained from the data buffer and sent to the DASD (DA links) or to the adapter (SA link).
  • Each link is also provided with two A/B inbound packet buffers.
  • Incoming packets (from the adapter or the DASD) are stored in these buffers and they are serviced either by DMA hardware or Spinnaker, depending on the content of the address field in the incoming packet.
  • Each link interface is also provided with a message packet buffer which is used by the microprocessor to build outbound messages to send to the adapter or the DASD.
  • DMA Interface logic: this transfers data from the packet buffer to the controller data buffer under the supervision of the DMA controller.
  • the controller microprocessor includes a DMA controller which coordinates transfer between the controller link chips and the shared data buffer.
  • the Controller Link Chip incorporates a DMA interface to transfer data between the packet buffers in the CLC and the controller data buffer.
  • the DMA interface is supervised by the DMA controller contained within the microprocessor chip.
  • the MP chip contains logic to arbitrate between DMA requests.
  • DMA transfers are preceded by an arbitration phase during which the CLC chips are allowed to signal requests for DMA channels which require servicing.
  • the DMA controller issues a grant to one of these requests, after which point the CLC can initiate transfer.
  • Each controller link chip can use 8 DMA channels to service data transfers.
  • This arrangement allows up to 2 DMA channels to service each DASD and allows the adapter link to be serviced by up to 4 DMA channels. These channels can be used concurrently to exploit the packet multiplexing feature of the serial link.
  • the device links are given priority over the adapter link.
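The arbitration rule can be sketched for one CLC as follows. The channel numbering (0-3 for the two DA links, 4-7 for the SA link) and the lowest-number tie-break within a class are assumptions; the text states only the channel counts and that device links take priority.

```python
# Assumed numbering within one CLC: channels 0-3 serve the two DA links
# (two channels each) and channels 4-7 serve the SA link.
DA_CHANNELS = range(0, 4)
SA_CHANNELS = range(4, 8)


def grant(requests):
    """Pick one pending channel request, preferring device (DA) channels.

    `requests` is a set of channel numbers raised during the arbitration phase.
    """
    da = sorted(c for c in requests if c in DA_CHANNELS)
    sa = sorted(c for c in requests if c in SA_CHANNELS)
    if da:
        return da[0]
    if sa:
        return sa[0]
    return None
```

Favouring the device links fits the small device buffers described earlier, since the controller has to keep pace with the DASD while the adapter link can wait in the large controller buffer.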
  • the DMA bus allows a data transfer rate of up to 40 MBytes/s.
  • a 32 byte (DA) transfer from packet buffer to data buffer takes approximately 1.2 microseconds.
  • a 128 byte (SA) transfer takes 3.6 microseconds.
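The two quoted transfer times are consistent with the 40 MBytes/s peak rate plus a fixed cost per transfer. The short calculation below makes that explicit; reading the remainder as arbitration and set-up overhead is an inference, not something the text states.

```python
BUS_RATE_BYTES_PER_US = 40  # 40 MBytes/s is 40 bytes per microsecond


def raw_time_us(nbytes):
    """Time to move nbytes at the peak bus rate, in microseconds."""
    return nbytes / BUS_RATE_BYTES_PER_US


def overhead_us(nbytes, quoted_us):
    """Part of the quoted transfer time not explained by the peak rate."""
    return quoted_us - raw_time_us(nbytes)


da_overhead = overhead_us(32, 1.2)    # 32 byte DA transfer: 0.8 us of data
sa_overhead = overhead_us(128, 3.6)   # 128 byte SA transfer: 3.2 us of data
```

Both cases leave roughly 0.4 microseconds unaccounted for, so the larger SA packets use the bus more efficiently than the 32 byte DA packets.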
  • each CLC raises requests on the DMA bus as follows:
  • a DMA request will be raised on receipt of an inbound link packet, provided that the address field of the packet indicates that the data was destined for a DMA channel.
  • a DMA request will be raised for a DMA channel if either one or both of its associated link packet buffers are empty.
  • DMA Store operation: DMA store operations are used to empty inbound packet buffers.
  • DMA Fetch operation: DMA fetch operations are used to fill outbound packet buffers. Each transfer will normally be for a complete packet buffer.
  • the CDB's (plus ABORT and RESET) are stored in Command Descriptor Queue Entries under the control of the SA RECEIVE MESSAGE process when they are first received.
  • FREE queue: initially all CDQE's are free.
  • the FREE queue is a queue only in that one member points to the next, allowing all free CDQEs to be found. There is no significance in the order of entries.
  • NEW COMMAND queue: when a new command arrives, the SA task copies it into the CDQE at the head of the free queue.
  • the CDQE is removed from the free queue and added to the New Command queue. Abort messages are also put in a CDQE and added to the New Command queue.
  • DEVICE COMMAND Queue: there are four device command queues - one for each device. Each queue is serviced by its own Command process. When the QUEUE MANAGER process finds a new command in a CDQE in the New Command queue, it finds the device to which it is addressed and transfers the CDQE to the corresponding Device Command queue.
  • MESSAGE Queue: when command processing completes, its CDQE is loaded by the relevant Command process with a SCSI status message and transferred from the Device command queue to the Message queue.
  • the Message queue is thus a queue of requests to the SA Transmit Message process.
  • the same queues are used for commands received via different Adapter links.
  • the head and tail pointers for the Free, New Command and Message queues are fields in the Message Control Block (MCB).
  • the head and tail pointers for the device command queues are in the corresponding Device Control Block (DCB).
  • the head addresses the first member of the queue.
  • the tail addresses the last member.
  • Each CDQE contains a 'Next' pointer. Although a CDQE may be in one of several queues it can never be in more than one at a time.
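The queue structure can be sketched as singly linked lists that share one pool of CDQEs. The pool size, the dictionary message representation and the function names are assumptions made for the example; the head, tail and 'Next' pointers and the queue names come from the text.

```python
class CDQE:
    """Command Descriptor Queue Entry with a single 'Next' pointer."""

    def __init__(self, index):
        self.index = index
        self.message = None
        self.next = None  # one pointer, so an entry is in at most one queue


class Queue:
    def __init__(self):
        self.head = None  # first member
        self.tail = None  # last member

    def enqueue(self, entry):
        entry.next = None
        if self.tail is None:
            self.head = entry
        else:
            self.tail.next = entry
        self.tail = entry

    def dequeue(self):
        entry = self.head
        if entry is None:
            return None
        self.head = entry.next
        if self.head is None:
            self.tail = None
        entry.next = None
        return entry


free, new_command, message = Queue(), Queue(), Queue()
device = [Queue() for _ in range(4)]  # one Device Command queue per device
for i in range(8):                    # pool size of 8 is an arbitrary choice
    free.enqueue(CDQE(i))


def receive(msg):
    """SA RECEIVE MESSAGE: copy the message into the CDQE at the head of FREE."""
    entry = free.dequeue()
    entry.message = msg
    new_command.enqueue(entry)


def queue_manager():
    """QUEUE MANAGER: move each new command to the addressed device's queue."""
    while (entry := new_command.dequeue()) is not None:
        device[entry.message["dasd"]].enqueue(entry)


def complete(dasd, status):
    """Command process: load SCSI status and move the CDQE to the Message queue."""
    entry = device[dasd].dequeue()
    entry.message = {"tag": entry.message["tag"], "status": status}
    message.enqueue(entry)
```

Because the same entry object moves from queue to queue, a command is never copied after it has first been received.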
  • Tasks defined within the high speed microprocessor control the operation of the adapter. Tasks are initiated by interrupts, which may be from a hardware event or via a software interrupt from another task. Software interrupts are the means by which one task can raise an interrupt to another.
  • Status: the status task is responsible for managing status to be presented to the host system. Status is passed to this task from one of the other tasks and may be directly presented by writing it to the hardware.
  • Link: there is one link task to handle the 4 serial links to the controllers. This task is responsible for interpreting any messages received from the controller(s) and taking appropriate action.
  • Mailbox: this manages the mailbox interface from the host system. It is responsible for receiving each mailbox from the system. If the mailbox is a SEND_SCSI command, it will be passed to the link task for transmission to the appropriate controller.
  • when instructed by the host processor, the adapter fetches commands from host memory and forwards them immediately to the appropriate controller for execution.
  • the mechanism for fetching commands is dependent on the architecture of the host system and will vary accordingly. In this description, the following mechanism is used on the MicroChannel.
  • the host system initiates subsystem operations by means of Mailboxes which are built in host memory.
  • Each mailbox contains a unique tag which identifies that particular command. For example, when the host wishes to initiate an operation in the subsystem, it will build the operation in the next available mailbox and write the Last Tag Register.
  • Writing the Last Tag Register interrupts the MAILBOX task in the adapter microprocessor.
  • the MAILBOX task instructs the adapter hardware to DMA the mailbox from host memory into the 32 byte DMA buffer of a designated one of the adapter link chips (master chip) to which all messages from the host are directed.
  • the MAILBOX task decodes the mailbox to determine the type of operation defined by the mailbox contents and if it finds that it is a SEND_SCSI command, the mailbox is converted to a SCSI_COMMAND message for transmission to the appropriate controller.
  • the SCSI_COMMAND message is sent from the 32 byte DMA buffer over the link in the data field of a message packet.
  • the address field of the packet contains the address of the destination which in this case is the controller microprocessor. If the command is destined for a controller which is not serviced by the master chip, it is copied into the DMA buffer in the other ALC and sent over the serial link.
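The mailbox mechanism can be sketched as a walk along the chain that stops at the mailbox whose tag matches the Last Tag Register. The dictionary standing in for host memory, the 'next' field used to link mailboxes and the function names are hypothetical; the text says only that the mailboxes form a chain.

```python
# Host operations that are forwarded to a controller, and the message used.
FORWARDED = {"SEND_SCSI": "SCSI_COMMAND", "ABORT_SCSI": "ABORT", "RESET": "RESET"}


def fetch_new_mailboxes(host_memory, mailbox_pointer, last_tag):
    """Follow the chain from mailbox_pointer up to the mailbox tagged last_tag.

    Returns the fetched mailboxes and the address at which the chain continues.
    A dictionary lookup stands in for the DMA fetch into the 32 byte buffer.
    """
    fetched = []
    address = mailbox_pointer
    while True:
        mailbox = host_memory[address]
        fetched.append(mailbox)
        if mailbox["tag"] == last_tag:
            return fetched, mailbox["next"]
        address = mailbox["next"]


def to_controller_message(mailbox):
    """Return the controller message name, or None if the adapter handles it."""
    return FORWARDED.get(mailbox["operation"])


# Usage with three chained mailboxes, of which the host has published two.
host_memory = {
    0x100: {"tag": 1, "operation": "SEND_SCSI", "next": 0x120},
    0x120: {"tag": 2, "operation": "RESET", "next": 0x140},
    0x140: {"tag": 3, "operation": "SEND_SCSI", "next": 0x160},
}
fetched, next_pointer = fetch_new_mailboxes(host_memory, 0x100, last_tag=2)
```

Here the third mailbox is left untouched until the host writes a new value to the Last Tag Register.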
  • the host defines a number of different operations as well as the SEND_SCSI command, many of which are acted on by the adapter and do not require transmission to the controller.
  • two operations namely ABORT_SCSI COMMAND and RESET are passed on to the appropriate controller in the form of ABORT and RESET messages. Details of the format of these messages were given in the list of Adapter-Controller messages earlier in the description.
  • ABORT_SCSI COMMAND and RESET operations are handled by the adapter in essentially the same way as the SEND_SCSI COMMAND operation.
  • the MAILBOX task decodes the mailbox and sends the ABORT or RESET message from the DMA buffer to the appropriate controller. Again, depending on which controller is addressed, it may be necessary to copy the message into the DMA buffer of the second ALC.
  • the adapter starts a timer for each command that it issues. This serves to detect lost commands or a hung controller without burdening the host system with a large number of timers.
  • the Adapter Idle Task periodically updates the timer and checks that the operation has not timed out.
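A per-command timer of this kind can be sketched as a table of countdown values that the idle task decrements. The tick-based timing, the class name and the method names are assumptions; the text does not say how the timers are represented.

```python
class CommandTimers:
    """One countdown per outstanding command, keyed by tag."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.remaining = {}  # tag -> ticks left

    def start(self, tag):
        """Called when the adapter issues a command."""
        self.remaining[tag] = self.timeout_ticks

    def complete(self, tag):
        """Called when ending status arrives for the command."""
        self.remaining.pop(tag, None)

    def idle_tick(self):
        """Called periodically by the idle task; returns tags that timed out."""
        expired = []
        for tag in list(self.remaining):
            self.remaining[tag] -= 1
            if self.remaining[tag] <= 0:
                expired.append(tag)
                del self.remaining[tag]
        return expired
```

Keeping the table in the adapter means the host only hears about a command again when it completes or when the adapter reports it as timed out.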
  • Operation of the controller is achieved by means of tasks defined in microcode within the Spinnaker microprocessor. There are 8 tasks defined in this processor. These include:
  • An SA task which manages the interface with the adapter/host.
  • a Command Control Task which is the task with overall control. New SCSI commands are passed to it from SA task. It queues them, decodes them and sends instructions to SA task and appropriate device task. SA and DA tasks perform the data transfer.
  • the controller uses the above tasks but the SA task and Command Control task are extended via the concept of subtasks.
  • the controller has a number of processes which are implemented as an independent task or as a subtask within a task. Subtasks are run under the control of a subtask scheduler.
  • Fig 6 is a block diagram showing the communication between the different processes.
  • Control blocks are used in this communication: one process enters information into a control block and posts another process, which accesses the information in the control block. Control Blocks are passed between processes.
  • This process deals with all messages sent from the adapter to the controller i.e. SCSI_COMMAND, ABORT, RESET & READY_FOR_READ. The format of these messages can be found elsewhere in this description.
  • a message packet from the adapter is received in the inbound packet buffer of the CLC. The contents of the address field of the incoming packet identify it as a message and it is serviced by the high performance controller. If the message is a new COMMAND, ABORT or RESET the SA Receive message process copies it into the Command Descriptor Queue Entry (CDQE) at the head of the free queue. The CDQE is then enqueued to the Queue Manager process. If the message is READY_FOR_READ it is passed to the appropriate SA XFER process (i.e. the process associated with the device which contains the data to be read).
  • CDQE Command Descriptor Queue Entry
  • This process is a subtask of the Command Control Task and services interrupts from the SA RECEIVE MESSAGE process.
  • the message is a SCSI_COMMAND message. It may also be an ABORT or RESET.
  • the SA RECEIVE MESSAGE process has copied the message into the CDQE, transferred the CDQE from the free queue to the 'New Command Queue' and then posted this process, which then transfers the commands from the 'New Command Queue' to the device specific queue and posts the appropriate command process.
  • the QUEUE MANAGER process carries out some limited processing of the message in order to determine which COMMAND process should be posted.
  • the Command process (one of four subtasks of the command control task) processes the SCSI commands on the Device Command queues. There are four instances of the command process running in parallel, one for each of the four supported devices. Normally, each process handles commands addressed to its device.
  • Each Command process takes a command off its queue and :
  • This procedure is repeated for each command in the Device Queue until the queue is exhausted, then it suspends and is resumed when a new command is added to the queue by the QUEUE MANAGER process.
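The hand-off from the Queue Manager to the per-device Command processes might look like the sketch below. The function names, the `device` field of a CDQE and the `execute` callback are hypothetical; only the four device queues and the drain-then-suspend behaviour come from the text.

```python
from collections import deque

NUM_DEVICES = 4  # the text states four supported devices


def queue_manager(new_command_queue, device_queues):
    """Move CDQEs to device-specific queues; return the set of devices to post."""
    posted = set()
    while new_command_queue:
        cdqe = new_command_queue.popleft()
        device = cdqe["device"]          # the limited decoding needed to pick a queue
        device_queues[device].append(cdqe)
        posted.add(device)
    return posted


def command_process(device_queue, execute):
    """Drain one device queue; returning models the process suspending."""
    handled = 0
    while device_queue:
        execute(device_queue.popleft())
        handled += 1
    return handled
```

In this model a Command process that finds its queue empty simply returns, which stands in for suspending until the Queue Manager posts it again.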
  • the DA process handles the following requests from the COMMAND process:
  • the appropriate routine is called to process Read command requested by the COMMAND process.
  • the routine issues a SEEK order to the appropriate DASD, initialises the DMA i.e. allocates a DMA channel, calculates the buffer size available and if there is space available sets up and issues a READ order to the DASD which initiates transfer of data from the DASD to the data buffer in the controller.
  • the DMA address is passed to the DASD in the READ order and is used in the address field of the incoming data packets to identify the destination of the data.
  • a COMMAND process communicates with a DEVICE process by posting one of these three events
  • NEWREQ signals to a device process that a new request is to be started
  • On receiving a request from the COMMAND process, the DEVICE process will initiate the appropriate action to the DASD by sending the appropriate DASD order over the serial link.
  • DASD orders are low level read/write orders that are generated by the controller. The following orders are provided to allow the controller to read and write data. Each of the orders defined below are sent to the DASD over the serial link in the data field of a packet. All 'order' packets are addressed to the microprocessor in the DASD for execution or for distribution to other components of the DASD.
  • This order instructs the DASD to terminate read-ahead (if active) and seek to a specified cylinder and head. Also, for a write command, the separate seek order allows the controller to initiate the seek as soon as it decodes the command and without waiting to receive the write data from the adapter. If the STOP order is issued before the DASD has completed a read or extended read order (status packet not returned), the DASD terminates the read operation immediately, starts a seek to the cylinder and head specified in the STOP_AND_SEEK order and returns a status packet for the terminated READ order. No status packet is sent for the STOP order.
  • This order is also sent to the DASD on receipt of an ABORT_SCSI command message from the adapter. In this case no seek operation is initiated.
  • LBA Logical Block Address
  • the Address field contains the byte which is to be placed in the address field of any data packets which are returned as a result of this order.
  • the DASD sends the requested data to the controller and checks the ECC bytes at the end of each block. If the DASD encounters any blocks that are marked as defective then it skips over them automatically. Finally the DASD returns status to indicate whether any errors were detected.
  • This order has the same format as the READ order and invokes a seek operation to the selected cylinder and head address.
  • the sector corresponding to the LBA contained in the order is located and as many records as are specified in the count field are read from the disk.
  • the address field contains the byte that is placed in the Address field of any data packets which are returned as a result of this order.
  • the CONDITIONAL_READ order is issued by the controller only when the amount of read data requested by the host is larger than a selected amount. If the amount of data requested is small then the use of CONDITIONAL_READ is not warranted.
  • This order instructs the DASD to search for a particular LBA and write a specified number of blocks.
  • the parameters are the same as for a READ order (see above) except that the controller also supplies the data to be written. In addition there is no address field.
  • the Conditional Write order has the same format as the WRITE order.
  • This order extends the operation of a preceding Read, Conditional_Read, Write or Conditional_Write order.
  • the 'count' parameter specifies the number of individual sectors required to be read or written after the current order completes.
  • the LBA field defines the address of the first block to be read or written. This will be one more than the LBA of the last block to be read or written by the previous order if contiguous reading or writing is required. If the LBA field is not the first block after the last LBA of the previous order, the blocks in between these LBA definitions are skipped over and not read or written. To be effective, the DASD must receive an EXTEND order before the previous order has completed. The controller uses EXTEND to perform a back to back write and to continue read ahead. These operations are described in more detail below.
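The LBA and count rules for EXTEND can be illustrated with a short sketch. The function name and its return shape are assumptions made for this example, and rejecting an EXTEND whose LBA lies before the end of the previous order is also an assumption, since the text does not cover that case.

```python
def apply_extend(prev_lba, prev_count, ext_lba, ext_count, arrived_in_time=True):
    """Return (skipped, transferred) block lists for an EXTEND order.

    Returns None when the EXTEND arrived after the previous order completed,
    in which case it has no effect.
    """
    if not arrived_in_time:
        return None
    next_lba = prev_lba + prev_count          # first block after the previous order
    if ext_lba < next_lba:
        # Assumption: an EXTEND cannot point backwards into the previous order.
        raise ValueError("EXTEND LBA precedes the end of the previous order")
    skipped = list(range(next_lba, ext_lba))  # blocks passed over, not read or written
    transferred = list(range(ext_lba, ext_lba + ext_count))
    return skipped, transferred
```

For a contiguous extension the skipped list is empty; a gap between the two orders shows up as the blocks that pass under the head without being transferred.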
  • Per device process which transfers data between host and a read or write buffer in the controller.
  • the COMMAND process can issue the following commands to the SA XFER process:
  • the parameters required to carry out the data transfer are passed from the COMMAND process to the SA XFER process in a control block.
  • This process transmits messages (READY_FOR_WRITE, DATA_READY & STATUS) to the adapter on behalf of other processes.
  • READY_FOR_WRITE & DATA_READY are passed to this process from the SA XFER process and STATUS messages are passed from the COMMAND process.
  • the Adapter sends a SCSI_COMMAND message including a READ operation in the CDB to the controller.
  • the message includes the address of the DASD from which the read data is to be transferred and the address in host memory to which the data is to be sent.
  • the Controller processes the command as described above and passes control to the device task, which sends a STOP_AND_SEEK order to the DASD. This terminates any currently active Read Ahead operation. (If no read ahead is currently active, this order is not sent.)
  • the DASD returns STATUS to the controller indicating the status of the terminated Read Ahead operation and then begins the seek to the specified head and cylinder.
  • the controller device task allocates a 32K segment of the data buffer for the read data that is to be transferred.
  • the device task also allocates a DMA channel over which the data is to be transferred to the controller data buffer.
  • the device task then sends a CONDITIONAL_READ order to the DASD including the address of the allocated DMA channel, the data start address and the number of blocks to be transferred.
  • a 'normal' READ order rather than a CONDITIONAL_READ order will be sent.
  • the address field of the data packet(s) contains the address of the DMA channel.
  • the read data is transferred to the controller via the serial link and into the allocated space in the controller data buffer.
  • the Controller sends a DATA_READY message to the adapter which causes the host to initialise the host DMA channel via which the read data is to be transferred between controller and host memory.
  • the adapter sends a READY_FOR_READ message to the controller in response to the DATA_READY message.
  • the READY_FOR_READ message, which identifies the DMA channel initialised in the host, is received by the controller SA RECEIVE MESSAGE process and passed to the SA XFER process, which initialises the SA DMA, i.e. allocates a DMA channel over which data will be transferred from the data buffer.
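The read walkthrough above can be summarised as an ordered message trace. This is only a sketch: the tuple layout, the "DATA" label for data packets and the two boolean parameters are illustrative choices, and the final host transfer step is inferred from the DMA set-up described in the last step.

```python
def read_sequence(read_ahead_active, use_conditional_read):
    """Return the (sender, receiver, message) steps of a SCSI read."""
    steps = [("adapter", "controller", "SCSI_COMMAND")]
    if read_ahead_active:
        # STOP_AND_SEEK is only sent when a read ahead has to be terminated.
        steps.append(("controller", "DASD", "STOP_AND_SEEK"))
        steps.append(("DASD", "controller", "STATUS"))
    order = "CONDITIONAL_READ" if use_conditional_read else "READ"
    steps.append(("controller", "DASD", order))
    steps.append(("DASD", "controller", "DATA"))        # packets addressed to the DMA channel
    steps.append(("controller", "adapter", "DATA_READY"))
    steps.append(("adapter", "controller", "READY_FOR_READ"))
    steps.append(("controller", "adapter", "DATA"))     # buffer contents sent to host memory
    return steps
```

Printing the result of `read_sequence(True, True)` gives a compact checklist that can be compared against the steps listed above.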
  • the adapter sends a SCSI_COMMAND message defining a write operation to the controller.
  • the controller processes the command (as described earlier) and control is passed to the DEVICE process from the COMMAND process.
  • the DEVICE process sends a STOP_AND_SEEK order to the DASD (read ahead operation is currently active).
  • the DASD stops read ahead and sends status to the controller indicating the status of the read ahead operation just terminated.
  • the DASD begins seeking to the cylinder and head specified in the STOP_AND_SEEK order.
  • the controller SA XFER process allocates space in the data buffer and initialises the SA DMA channel to which the packets of write data from the host are to be addressed.
  • the SA XFER process posts the SA TRANSMIT process which transmits a READY_FOR_WRITE message to the adapter. This message identifies the DMA channel initialised in the last step.
  • the adapter LINK task allocates a DMA channel which is to be employed to transfer data to host memory and then begins transmission of packets of write data to the controller.
  • the controller DEVICE task initialises a DMA channel to be used for transfer of data between the data buffer and the DASD and, in this example, sends a CONDITIONAL_WRITE order to the DASD identifying the LBA and the amount of write data which the DASD is to expect.
  • the DASD begins the LBA search.
  • Read Ahead is a function provided by the controller to improve the performance of a set of READ commands which together constitute a long sequential read. This is achieved by continuing reading from the DASD into the controller's buffer in anticipation of the next read as follows:
  • a SCSI READ command causes the controller to instruct the DASD to transfer the required number of sectors into the 32K segment of the controller's data buffer that has been allocated to this read.
  • the controller will normally request 32K of data (i.e. the amount of data to fill the allocated buffer space) even when the command from host requests a lesser amount.
  • the READ order to the DASD specified a larger amount of data than that requested by the host, the extra data will be stored in the controller buffer.
  • the data transfer is extended by an EXTEND order sent by the controller to the DASD.
  • the DASD transfers the sectors specified in the EXTEND order to the new end of the buffer.
  • the controller sends an EXTEND order to the device requesting 4K of new data to fill the buffer.
  • On receipt of the next READ command from the adapter, the controller examines the buffer to see whether it already has the required data (or is about to have). If so it transfers that to the host. If not, it instructs the DASD for the new read, terminating any active read ahead. The Read Ahead continues until the DASD is reinstructed for a new transfer or when the Read Ahead buffer fills.
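The buffer check made on each new READ can be sketched as a three-way classification. The class, its field names and the "hit", "pending" and "miss" labels are hypothetical; the 32K segment and the 512 byte sector size are the figures given in the text.

```python
from dataclasses import dataclass

SEGMENT_BLOCKS = 32 * 1024 // 512  # a 32K segment holds 64 sectors of 512 bytes


@dataclass
class ReadAheadBuffer:
    start_lba: int        # LBA of the first block held in the segment
    blocks_present: int   # blocks already received from the DASD
    blocks_ordered: int   # blocks requested so far, including those in flight

    def classify(self, lba, count):
        """Classify a new READ as 'hit', 'pending' or 'miss'."""
        end = lba + count
        if lba < self.start_lba:
            return "miss"
        if end <= self.start_lba + self.blocks_present:
            return "hit"        # data is already in the buffer
        if end <= self.start_lba + self.blocks_ordered:
            return "pending"    # data is about to arrive from the read ahead
        return "miss"           # the DASD must be reinstructed
```

A "miss" corresponds to the case where the controller terminates any active read ahead and issues a new read to the DASD.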
  • Back to back writes are consecutive write commands writing consecutive blocks. The first block of a subsequent write immediately follows the last block of the previous write. Special support for back to back writes in the controller and the DASD allows the writes to be effected without the DASD taking the revolution between the commands which would otherwise be required.
  • When the command specific routine for the WRITE command reaches a point where it could extend its currently active device transfer to include a following consecutive write, it 'looks over its shoulder' and checks whether the following CDQE contains such a write. If it does, it issues an Extend Request to the device task, which sends an EXTEND order to the DASD. If the DASD receives the EXTEND order before the preceding write is finished, it simply extends the count of the current write by the amount specified by the Count parameter in the EXTEND order.
  • EXTEND orders are themselves candidates for extension.
  • Host sends SCSI write command for sectors 0 - 7
  • Controller issues WRITE order for sectors 0 - 7
  • Host sends SCSI write command for sectors 8 - 15
  • Controller detects back-to-back write in queue
  • Controller issues EXTEND order for 8 sectors
  • If EXTEND order arrives before the end of the WRITE, the DASD extends its current block count by 8 and does not return status after sector 7. Instead it returns status after sector 15.
  • If the EXTEND order arrives too late, the DASD generates normal status for the WRITE order and tells the controller that the EXTEND order was received too late. The controller then reissues a WRITE order for sectors 8 - 15. Sectors 8 - 15 are then written on the next revolution.
  • back to back writes using the EXTEND order employ the packet multiplexing feature of the serial link, i.e. the controller has to be able to send the EXTEND order over the serial link at the same time as it is transmitting the write data to the DASD.
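The 'look over its shoulder' check and the late-EXTEND fallback can be sketched as below. The command dictionaries, the order tuples and both function names are assumptions for illustration; the sketch also assumes that only commands for the same device can be chained.

```python
def is_back_to_back(current, following):
    """True when `following` is a write starting right after `current` ends."""
    return (
        current["op"] == "WRITE"
        and following["op"] == "WRITE"
        and following["device"] == current["device"]
        and following["lba"] == current["lba"] + current["count"]
    )


def plan_orders(queue):
    """Turn queued write commands into WRITE and EXTEND orders."""
    orders = []
    previous = None
    for cmd in queue:
        # An EXTEND may itself be extended, so a whole chain collapses to one WRITE.
        kind = "EXTEND" if previous and is_back_to_back(previous, cmd) else "WRITE"
        orders.append((kind, cmd["lba"], cmd["count"]))
        previous = cmd
    return orders


def reissue_if_late(order, arrived_in_time):
    """A late EXTEND is reissued as a WRITE order for the same blocks."""
    kind, lba, count = order
    if kind == "EXTEND" and not arrived_in_time:
        return ("WRITE", lba, count)
    return order
```

Applied to the example above, writes for sectors 0 - 7 and 8 - 15 become one WRITE order followed by one EXTEND order for 8 sectors.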
  • Split Read is a performance enhancement gained by starting reading from the DASD at the first sector, within the range of the current READ operation, which appears under the head, as opposed to the first sector of the range. For example, a read for sectors 4,5,6...15,16 might be read in the order 6,7,8,...15,16,4,5 if the head happened to arrive too late to read sector 4 but early enough to read sector 6. Thus, the transfer from the DASD would complete 11 sectors (i.e. 16 minus 5) earlier than would otherwise be the case.
  • the optimisation is achieved by the controller instructing the DASD to read the first sector. If the DASD then finds that the sector is more than a certain time away (1 millisecond for example), it aborts the read and returns the current LBA to the controller. The controller then makes the decision whether to reissue the same read, or to split it into a 'tail' read which transfers from the current sector to the end, followed by a 'head' read which transfers from the first sector to the start of the tail.
  • the CONDITIONAL_READ order to the DASD, which is aborted if there is a long delay to the first sector, is available.
  • the data arrives in the controller buffer, in the above example, 11 sectors earlier than it would if the read had been done in the normal order. However if the controller then has to rearrange the data to be sent to the host in the normal order, then the performance benefit of split read operations is greatly reduced.
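The split into a 'tail' read followed by a 'head' read can be sketched as follows. The function names are hypothetical, and the sketch always splits when the reported sector lies inside the requested range, whereas the text leaves that decision to the controller.

```python
def split_read(first, last, current):
    """Return the (start, end) sector ranges to read, in transfer order.

    `current` is the sector reported by the DASD when the conditional read
    was aborted. Ranges are inclusive.
    """
    if not (first < current <= last):
        return [(first, last)]            # nothing to gain: reissue the same read
    tail = (current, last)                # from the current sector to the end
    head = (first, current - 1)           # from the first sector to the start of the tail
    return [tail, head]


def transfer_order(pieces):
    """Expand inclusive ranges into the order in which sectors leave the DASD."""
    return [s for start, end in pieces for s in range(start, end + 1)]
```

With the example from the text, a read of sectors 4 to 16 with the head arriving at sector 6 yields a tail of 6 to 16 followed by a head of 4 to 5.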
  • the technique described herein realises more of the potential of 'split reads' by avoiding the need to reorder the data in the controller buffer. This is achieved by defining the message interface between the controller and the adapter to allow the controller the necessary control over the host DMA addresses. In effect the controller has random access to host memory.
  • the DATA_READY message sent from the controller to the adapter when data is placed into the controller data buffer specifies the address in host memory to which the data should be directed.
  • the controller calculates an amended start address from the address sent in the initial command from the host and sends this to the host in the DATA_READY message.
  • the controller is told the start address for sector 4 and is thus able to calculate the amended address in host memory where the first sector of the split read data (in this case sector 6) should be directed. If sectors 6 to 16 are already in the data buffer, then the controller will issue a single DATA_READY message to initiate transfer of sectors 6 to 16. The transfer of sectors 6 to 16 will complete. When the controller has sectors 4 and 5 in its buffer, it will then issue a DATA_READY message to the adapter indicating readiness to transfer sectors 4 and 5 and the address in host memory to which sector 4 is to be directed.
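The amended host address is simple offset arithmetic, sketched below for the sectors 4 to 16 example. The message field names `host_address` and `length` and the function name are assumptions; the actual DATA_READY format is defined elsewhere in the description.

```python
SECTOR_BYTES = 512  # sector size given in the text


def data_ready_messages(host_start, first_sector, pieces):
    """Build one DATA_READY message per piece of a split read.

    `host_start` is the host address supplied for `first_sector` in the
    original command; `pieces` are inclusive (start, end) sector ranges.
    """
    messages = []
    for start, end in pieces:
        messages.append({
            "type": "DATA_READY",
            # Each sector lands at its natural offset from the original start address.
            "host_address": host_start + (start - first_sector) * SECTOR_BYTES,
            "length": (end - start + 1) * SECTOR_BYTES,
        })
    return messages
```

Because every piece is addressed to its final position in host memory, the controller never has to reorder the sectors in its own buffer.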
  • the controller's ability to randomly access areas of the host memory, i.e. to control the host address, also facilitates another performance benefit.
  • the DASD doesn't wait to the end of a sector (512 bytes) to check the ECC before sending the sector up to the host.
  • the DASD doesn't have sufficient buffering to store a sector's worth of data.
  • the data is read serially from the disk and compiled into 128 byte packets for transmission to the controller.
  • the device checks the ECC and appends 6 ECC bytes to the end of the data.
  • the raw data will also be good data i.e. the data is transferred from the device.
  • Transfer of data directly to host without checking provides a performance benefit over prior systems in which each sector is checked before any of the sector is transmitted to the host.
  • the time taken for multiple sectors of data to be received by the host is reduced in the direct case by approximately the time taken for one sector of data to travel from device to host. In high performance systems such as described herein, this benefit can reduce the overall overhead by a significant amount.
  • If, at the end of a sector, the DASD detects the presence of an error in that data, the ECC bytes indicate this to the controller which is alerted. When it has completed sending the current packet of data to the host, the controller ceases transmission and sends a DATA_RETRY message to the host. The controller requests retransmission of the sector which contained the error. The DASD retransmits the requested sector in 128 byte packets to the controller which stores the data in its buffer. If the retransmitted block of data is good, the controller passes the data up to the host.
  • the controller sends a DATA_RETRY message to the adapter which specifies the address in host memory to which the retry data is to be directed.
  • the DATA_RETRY message instructs the adapter to set up a new DMA channel over which the data is to be transferred into host memory.
  • the adapter responds to the DATA_RETRY message with a READY_FOR_READ message.
  • the DATA_RETRY message indicates the amount of data which is being retried.
  • the controller will re-request the data a number of times. If, after a predetermined number of attempts, the controller still hasn't received good data, it will attempt to correct the error in the data held in its buffer. To this end, it sends an 'Operate on ECC' order to the device, which causes the device to calculate which bytes were in error and also the correct data for those bytes.
  • the number of bytes which may be corrected is implementation dependent; in the described system the number is two.
  • the device sends the correction information to the controller which corrects the data held in its buffer. The controller now sends the block of data including the corrected bytes to the host.
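The retry-then-correct policy can be sketched as a small loop. The callbacks `read_sector` and `correct_with_ecc`, the retry limit and the returned tuple are all hypothetical stand-ins for the DASD orders and messages described above; the text only says the number of attempts is predetermined.

```python
def deliver_sector(read_sector, correct_with_ecc, max_retries=3):
    """Return (data, retries_used, was_corrected) for one sector.

    read_sector() returns (data, ecc_ok); correct_with_ecc(data) returns the
    corrected data, or None when the error is beyond the correctable limit.
    """
    data, ok = read_sector()
    retries = 0
    while not ok and retries < max_retries:
        data, ok = read_sector()          # re-request the sector from the DASD
        retries += 1
    if ok:
        return data, retries, False
    # All retries failed: ask the device for correction information.
    corrected = correct_with_ecc(data)
    if corrected is None:
        raise IOError("sector could not be corrected")
    return corrected, retries, True
```

A caller would forward the returned data to the host after a DATA_RETRY exchange; that messaging is left out of the sketch.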

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Human Computer Interaction (AREA)
  • Bus Control (AREA)
  • Communication Control (AREA)

Abstract

A high-performance data storage subsystem that can be attached to a data processing system is described. The main functional units of the subsystem are (i) a host adapter, (ii) a controller and (iii) a direct access storage device (DASD). The functional units are interconnected by dedicated point-to-point bidirectional serial links that carry commands and data in the form of packets. The serial links also provide packet multiplexing, which allows commands sent over the serial link to be multiplexed while control or write data is being transferred between the units. A basic configuration of the subsystem comprises one adapter connected by a serial link to one controller, which is in turn connected by four serial links to four direct access storage devices. However, the architecture of the described subsystem allows each adapter to be connected to a maximum of four controllers, so that up to sixteen devices can be attached to one adapter.
PCT/GB1991/000254 1991-02-19 1991-02-19 Data storage subsystem WO1992015058A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP91904942A EP0524945A1 (fr) 1991-02-19 1991-02-19 Data storage subsystem
PCT/GB1991/000254 WO1992015058A1 (fr) 1991-02-19 1991-02-19 Data storage subsystem
JP3504612A JPH0743687B2 (ja) 1991-02-19 1991-02-19 Data storage subsystem

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/GB1991/000254 WO1992015058A1 (fr) 1991-02-19 1991-02-19 Data storage subsystem

Publications (1)

Publication Number Publication Date
WO1992015058A1 true WO1992015058A1 (fr) 1992-09-03

Family

ID=10688042

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB1991/000254 WO1992015058A1 (fr) 1991-02-19 1991-02-19 Data storage subsystem

Country Status (3)

Country Link
EP (1) EP0524945A1 (fr)
JP (1) JPH0743687B2 (fr)
WO (1) WO1992015058A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0786719A3 (fr) * 1996-01-23 2006-06-07 Sony Corporation Plurality of disk drive arrays, data recording/reproducing method and data format
US7373436B2 (en) 2001-09-10 2008-05-13 Hitachi, Ltd. Storage control device and method for management of storage control device
US9575801B2 (en) 2009-12-18 2017-02-21 Seagate Technology Llc Advanced processing data storage device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4454595A (en) * 1981-12-23 1984-06-12 Pitney Bowes Inc. Buffer for use with a fixed disk controller

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01126749A (ja) * 1987-11-12 1989-05-18 Mitsubishi Electric Corp Peripheral device data control device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
IBM TECHNICAL DISCLOSURE BULLETIN. vol. 20, no. 10, March 1978, NEW YORK US pages 3875 - 3876; H.L. PAGE: 'Multisector Diskette Operation' see the whole document *
IBM TECHNICAL DISCLOSURE BULLETIN. vol. 30, no. 10, March 1988, NEW YORK US pages 15 - 16; 'Storage System Interface Protocol Optimized for Low Cost' see the whole document *
WESCON TECHNICAL PAPERS. 30 October 1984, NORTH HOLLYWOOD US pages 1 - 5; S. BAL: 'I/O Bottleneck for Microcomputers is Real and Solutions are here' see page 3, right column, paragraph 2; figures 5,6 *

Also Published As

Publication number Publication date
EP0524945A1 (fr) 1993-02-03
JPH0743687B2 (ja) 1995-05-15
JPH05502316A (ja) 1993-04-22

Similar Documents

Publication Publication Date Title
US5664145A (en) Apparatus and method for transferring data in a data storage subsystems wherein a multi-sector data transfer order is executed while a subsequent order is issued
US6065087A (en) Architecture for a high-performance network/bus multiplexer interconnecting a network and a bus that transport data using multiple protocols
US6401149B1 (en) Methods for context switching within a disk controller
US5315708A (en) Method and apparatus for transferring data through a staging memory
US6330626B1 (en) Systems and methods for a disk controller memory architecture
US6810440B2 (en) Method and apparatus for automatically transferring I/O blocks between a host system and a host adapter
US5555390A (en) Data storage method and subsystem including a device controller for respecifying an amended start address
US5386517A (en) Dual bus communication system connecting multiple processors to multiple I/O subsystems having a plurality of I/O devices with varying transfer speeds
US7761642B2 (en) Serial advanced technology attachment (SATA) and serial attached small computer system interface (SCSI) (SAS) bridging
US5301279A (en) Apparatus for conditioning priority arbitration
US6421760B1 (en) Disk array controller, and components thereof, for use with ATA disk drives
US4860244A (en) Buffer system for input/output portion of digital data processing system
EP0348654A2 (fr) Method and apparatus for increasing system throughput
US5613141A (en) Data storage subsystem having dedicated links connecting a host adapter, controller and direct access storage devices
KR100638378B1 (ko) Systems and methods for a disk controller memory architecture
JP2001237868A (ja) Method and system for achieving efficient I/O operations in a Fibre Channel node
WO1992015058A1 (fr) Data storage subsystem
WO1992015054A1 (fr) Data transfer between a data storage subsystem and host system
US6233628B1 (en) System and method for transferring data using separate pipes for command and data
JPH06105425B2 (ja) Method of transferring data between a data storage subsystem and a host data processing system
WO1991013397A1 (fr) Method and apparatus for transferring data through a staging memory
JPS6051751B2 (ja) Communication control device
JPH0343853A (ja) Data transfer device

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IT LU NL SE

WWE Wipo information: entry into national phase

Ref document number: 1991904942

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1991904942

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1991904942

Country of ref document: EP