CN102446073A - Delaying acknowledgment of an operation until operation completion confirmed by local adapter read operation - Google Patents

Info

Publication number
CN102446073A
CN102446073A CN201110252252XA CN201110252252A CN102446073A CN 102446073 A CN102446073 A CN 102446073A CN 201110252252X A CN201110252252X A CN 201110252252XA CN 201110252252 A CN201110252252 A CN 201110252252A CN 102446073 A CN102446073 A CN 102446073A
Authority
CN
China
Prior art keywords
adapter
confirm
storer
carry out
receiving
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201110252252XA
Other languages
Chinese (zh)
Other versions
CN102446073B (en)
Inventor
D. Craddock
T. A. Gregg
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of CN102446073A
Application granted
Publication of CN102446073B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer And Data Communications (AREA)
  • Bus Control (AREA)

Abstract

A request to perform an operation, such as a remote direct memory access (RDMA) write operation or a send operation that writes to memory, is sent from a sending input/output (I/O) adapter (e.g., an RDMA-capable adapter) to a receiving I/O adapter. The receiving I/O adapter receives the request and initiates performance of the operation, but delays sending an acknowledgment for the operation. The acknowledgment is delayed until the operation is complete (i.e., until the memory is updated and the data is visible to the remote processor), as determined by a read operation initiated and performed by the receiving I/O adapter transparent to the sending I/O adapter.

Description

Method and system for delaying acknowledgment of an operation until the operation is completed
Technical field
The present invention relates generally to processing within a computing environment and, in particular, to facilitating processing associated with input/output (I/O) adapters.
Background art
I/O adapters, such as adapters capable of remote direct memory access (RDMA), communicate with one another to perform certain operations. In one example, a sending RDMA-enabled adapter forwards an RDMA write operation to a remote receiving RDMA-enabled adapter. In response to receiving the RDMA write operation, the receiving RDMA adapter acknowledges the write operation. This acknowledgment, however, only guarantees that the request was received at the remote adapter. It does not guarantee that the remote adapter has completed the memory write operation, nor that the written data is visible to the remote processor.
Applications that communicate using RDMA write operations often need to verify that the data is available in remote memory before performing some other operation. To do so, the sending RDMA adapter forwards an RDMA read operation to the remote adapter, which performs a memory read operation to verify that the data is available in memory (visible to the remote processor). If the data is available, the remote adapter forwards another acknowledgment (the RDMA read data) to the sending adapter. The requirement of a remotely initiated read to verify that the data is available, together with the sending of another acknowledgment, increases latency between the adapters.
Summary of the invention
The shortcomings of the prior art are overcome and additional advantages are provided through the provision of a computer program product for facilitating processing in a computing environment. The computer program product includes a storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method. The method includes, for instance: receiving, at a receiving adapter of the computing environment from a sending adapter of the computing environment, an operation to be performed by the receiving adapter; performing the operation by the receiving adapter; determining, by the receiving adapter, whether the operation is complete, wherein the determining includes performing, by the receiving adapter, a read operation to confirm that the operation is complete, the read operation being initiated locally by the receiving adapter; and, in response to confirming through the read operation that the operation is complete, sending an acknowledgment to the sending adapter.
In another aspect, a computer program product for facilitating processing in a computing environment is provided. The computer program product includes a storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method. The method includes, for instance: receiving, at a receiving adapter of the computing environment from a sending adapter of the computing environment, a first operation to be performed by the receiving adapter, the first operation being associated with a first acknowledgment type; performing the first operation by the receiving adapter; in response to the first acknowledgment type indicating non-delayed acknowledgment, sending to the sending adapter a receipt acknowledgment indicating that the first operation was received, the receipt acknowledgment not indicating completion of the first operation; receiving, at the receiving adapter from the sending adapter, a second operation to be performed by the receiving adapter, the second operation being associated with a second acknowledgment type; performing the second operation by the receiving adapter; in response to the second acknowledgment type indicating delayed acknowledgment, determining, by the receiving adapter, whether the second operation is complete, wherein the determining includes performing, by the receiving adapter, a read operation to confirm that the second operation is complete, the read operation being initiated locally by the receiving adapter; and, in response to confirming through the read operation that the second operation is complete, sending a completion acknowledgment to the sending adapter.
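The two acknowledgment types described above can be sketched as a small Python model. This is an illustrative sketch only: the class, method, and string names are hypothetical, and the model assumes that a locally initiated read is ordered behind earlier posted writes, so that the read returning implies the write has been committed to memory.

```python
from dataclasses import dataclass, field
from enum import Enum


class AckType(Enum):
    NON_DELAYED = "non-delayed"  # acknowledge as soon as the request is received
    DELAYED = "delayed"          # acknowledge only after completion is confirmed


@dataclass
class ReceivingAdapter:
    memory: dict = field(default_factory=dict)     # committed, processor-visible data
    in_flight: list = field(default_factory=list)  # posted writes not yet committed

    def post_write(self, address, data):
        # A posted write returns no response, so completion is unknown here.
        self.in_flight.append((address, data))

    def local_read(self, address):
        # Modeling assumption: the read is ordered behind earlier posted
        # writes, so they are committed before the read returns.
        for pending_address, pending_data in self.in_flight:
            self.memory[pending_address] = pending_data
        self.in_flight.clear()
        return self.memory.get(address)

    def handle(self, address, data, ack_type):
        self.post_write(address, data)
        if ack_type is AckType.NON_DELAYED:
            return "receipt-ack"     # the data may not be visible yet
        self.local_read(address)     # read initiated locally by the receiver
        return "completion-ack"      # the data is now visible in memory
```

In this model the sending adapter never issues a read of its own; the only difference between the two paths is whether the receiving adapter performs its local read before answering.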
Methods and systems relating to one or more aspects of the present invention are also described and claimed herein. Further, services relating to one or more aspects of the present invention are also described and may be claimed herein.
Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention.
Brief description of the drawings
One or more aspects of the present invention are particularly pointed out and distinctly claimed as examples in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:
Fig. 1 depicts one example of a distributed computer system to incorporate and use one or more aspects of the present invention;
Fig. 2 depicts one embodiment of further details of a host channel adapter (HCA) of Fig. 1, in accordance with an aspect of the present invention;
Fig. 3 depicts one example of processing work requests, in accordance with an aspect of the present invention;
Fig. 4 depicts one embodiment of a portion of a distributed computer system in which queue pairs are used to provide a reliable connection service for communication between distributed processes, in accordance with an aspect of the present invention;
Fig. 5 depicts one example of a layered communication architecture used in a distributed computer system, in accordance with an aspect of the present invention;
Fig. 6A depicts one example of acknowledgment processing associated with a send or memory write operation prior to one or more aspects of the present invention;
Fig. 6B depicts one embodiment of the delayed acknowledgment processing of one or more aspects of the present invention;
Fig. 7 depicts one embodiment of the logic associated with delayed acknowledgment, in accordance with an aspect of the present invention; and
Fig. 8 depicts one embodiment of a computer program product incorporating one or more aspects of the present invention.
Detailed description
In accordance with an aspect of the present invention, the sending of an acknowledgment for a request received by a receiving adapter (e.g., an RDMA-capable adapter) is delayed until a read operation initiated by the receiving adapter indicates that the requested operation (e.g., an RDMA write or send operation) is complete. The acknowledgment of the operation is delayed until the operation is complete; that is, until the locally initiated read operation indicates that memory has been updated (e.g., the data is visible to the remote processor). This contrasts with prior RDMA techniques, in which the receiving adapter acknowledges the request as soon as it is received. In those techniques, the acknowledgment only guarantees that the operation to be performed was received at the remote adapter. It does not guarantee that the operation is complete or that the data is visible to the remote processor. Because the acknowledgment is delayed until the requested operation has been performed, the sending adapter does not need to inquire whether the operation is complete, which eliminates that request from the sending adapter and the corresponding acknowledgment. This reduces latency between the adapters.
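The latency argument can be made concrete by counting the messages that cross the link in each case. The sketch below simply lists the messages named in the text; the endpoint labels and the helper name `link_crossings` are hypothetical, and real latency depends on far more than a message count.

```python
# Each entry is (source, destination, message). Endpoint names are illustrative.
PRIOR_FLOW = [
    ("sender", "receiver", "RDMA write"),
    ("receiver", "sender", "acknowledgment (request received)"),
    ("sender", "receiver", "RDMA read to verify the data"),
    ("receiver", "sender", "RDMA read data"),
]

DELAYED_ACK_FLOW = [
    ("sender", "receiver", "RDMA write"),
    ("receiver", "receiver", "locally initiated read"),
    ("receiver", "sender", "acknowledgment (operation complete)"),
]


def link_crossings(flow):
    """Count the messages that travel between the two adapters."""
    return sum(1 for source, destination, _ in flow if source != destination)


print(link_crossings(PRIOR_FLOW), link_crossings(DELAYED_ACK_FLOW))
```

The locally initiated read stays inside the receiving node, so it adds no traffic between the adapters.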
In one embodiment, the adapters involved in the processing are adapters capable of remote direct memory access (RDMA). An RDMA-capable adapter can be an RDMA channel adapter (such as one defined by the InfiniBand™ Architecture Specification) or an RDMA network interface card (RNIC) (such as one defined by iWARP from the RDMA Consortium). One embodiment of the InfiniBand™ Architecture Specification is described in detail in "InfiniBand™ Architecture Specification," Volume 1, Release 1.2.1, November 2007, which is hereby incorporated herein by reference in its entirety. Although RDMA and RDMA-capable adapters are referred to herein, those skilled in the art will appreciate that one or more aspects of the present invention are not limited to operations that use RDMA.
One embodiment of a computing environment to incorporate and use one or more aspects of the present invention is described with reference to Fig. 1. Fig. 1 depicts one example of a distributed computer system 100 that uses a system area network (SAN) fabric 116. Distributed computer system 100 and/or SAN fabric 116 are provided for illustrative purposes only. One or more embodiments of the present invention can be implemented on computer systems of numerous other types and configurations. For example, computer systems implementing one or more embodiments can range from a small server with one processor and a few input/output (I/O) adapters to a massively parallel supercomputer system with hundreds or thousands of processors and thousands of I/O adapters.
Referring to Fig. 1, SAN fabric 116 is a high-bandwidth, low-latency network for interconnecting nodes within the distributed computer system. A node is any component that is attached to one or more links of the network and forms the origin and/or destination of messages within the network. In the example depicted in Fig. 1, distributed computer system 100 includes nodes in the form of host processor node 102, host processor node 104, redundant array of independent disks (RAID) subsystem node 106, and I/O chassis node 108. The nodes illustrated in Fig. 1 are for illustrative purposes only, as SAN fabric 116 can interconnect any number and any type of independent processor nodes, I/O adapter nodes, and I/O device nodes.
Any one of the nodes can function as an end node, which is defined herein to be a device that originates or finally consumes messages or packets in SAN fabric 116. In one embodiment, an error handling mechanism is present that allows for reliable connection and/or reliable datagram communication between end nodes.
A message, as used herein, is an application-defined unit of data exchange, which is a primitive unit of communication between cooperating processes. A packet is one unit of data encapsulated by networking protocol headers and/or trailers. The headers generally provide control and routing information for directing the packet through SAN fabric 116. The trailer generally contains control and cyclic redundancy check (CRC) data to verify that packets are not delivered with corrupted contents.
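The header and trailer roles can be illustrated with a toy packet format. The layout below (a two-field header and a 32-bit CRC trailer computed with `zlib.crc32`) is an assumption made for illustration and is not the InfiniBand packet format; the function names are hypothetical.

```python
import struct
import zlib

HEADER = struct.Struct("!HH")  # hypothetical header: destination id, payload length
TRAILER = struct.Struct("!I")  # 32-bit CRC trailer


def build_packet(destination, payload):
    """Encapsulate a payload between a routing header and a CRC trailer."""
    header = HEADER.pack(destination, len(payload))
    crc = zlib.crc32(header + payload)
    return header + payload + TRAILER.pack(crc)


def parse_packet(packet):
    """Return (destination, payload), rejecting packets whose CRC does not match."""
    body, trailer = packet[:-TRAILER.size], packet[-TRAILER.size:]
    (expected_crc,) = TRAILER.unpack(trailer)
    if zlib.crc32(body) != expected_crc:
        raise ValueError("packet contents are corrupted")
    destination, length = HEADER.unpack(body[:HEADER.size])
    return destination, body[HEADER.size:HEADER.size + length]
```

A receiver that calls `parse_packet` either gets back the original destination and payload or an error, which mirrors the role of the trailer described above.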
Distributed computer system 100 depicted in Fig. 1 contains the communications and management infrastructure supporting both I/O and interprocessor communications (IPC) within the distributed computer system. Distributed computer system 100 includes, for instance, a switched communications fabric 116, which allows many devices to transfer data concurrently with high bandwidth and low latency in a secure, remotely managed environment. End nodes can communicate over multiple ports and use multiple paths through SAN fabric 116. The multiple ports and paths through SAN fabric 116 can be employed for fault tolerance and for increased-bandwidth data transfers.
In one example, SAN fabric 116 includes three switches 112, 114, and 146, and router 117. A switch is a device that connects multiple links together and allows packets to be routed from one link to another link within a subnet using a small header destination local identifier (DLID) field. A router is a device that connects multiple subnets together and can route packets from one link in a first subnet to another link in a second subnet using a large header destination globally unique identifier (DGUID).
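The difference between the two forwarding decisions can be sketched as two lookup tables. This is a simplified illustration: the class names, packet dictionary keys, and table contents are hypothetical, and real switches and routers do considerably more than a single lookup.

```python
class Switch:
    """Forwards within one subnet using the small DLID field."""

    def __init__(self, forwarding_table):
        self.forwarding_table = forwarding_table  # DLID -> output port

    def forward(self, packet):
        return self.forwarding_table[packet["dlid"]]


class Router:
    """Forwards between subnets using the large DGUID field."""

    def __init__(self, routes):
        self.routes = routes  # DGUID -> (destination subnet, output port)

    def forward(self, packet):
        return self.routes[packet["dguid"]]


packet = {"dlid": 5, "dguid": "guid-b", "payload": b"data"}
switch = Switch({5: "port-2"})
router = Router({"guid-b": ("subnet-2", "port-1")})
print(switch.forward(packet), router.forward(packet))
```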
In one embodiment, a link is a full-duplex channel between any two network fabric elements, such as end nodes, switches, or routers. Examples of suitable links include, but are not limited to, copper cables, optical cables, and printed circuit copper traces on backplanes and printed circuit boards.
For reliable service types, end nodes, such as host processor end nodes and I/O adapter end nodes, generate request packets and return acknowledgment packets. Switches and routers pass packets along from the source to the destination. Except for the variant CRC trailer field, which is updated at each stage in the network, switches pass the packets along unmodified. Routers update the variant CRC trailer field and modify other fields in the header as the packet is routed.
In the example distributed computer system 100 shown in Fig. 1, host processor node 102, host processor node 104, and I/O chassis 108 each include at least one RDMA-capable channel adapter (CA) to interface to SAN fabric 116. In one or more embodiments, each CA is an endpoint that implements the CA interface in sufficient detail to source or sink packets transmitted on SAN fabric 116. Host processor node 102 contains CAs in the form of, for instance, RDMA-capable host channel adapters (HCAs) 118 and 120. Host processor node 104 contains, for instance, HCAs 122 and 124. Host processor node 102 also includes CPUs 126-130 and a memory 132 interconnected by a bus system 134. Host processor node 104 similarly includes CPUs 136-140 and a memory 142 interconnected by a bus system 144. In host processor node 102, memory 132 is communicatively coupled to HCAs 118 and 120 via, for instance, a Peripheral Component Interconnect (PCI) interconnect; in host processor node 104, memory 142 is similarly communicatively coupled to HCAs 122 and 124 via, for instance, a PCI interconnect. HCAs 118 and 120 provide a connection from host processor node 102 to switch 112, and HCAs 122 and 124 provide a connection from host processor node 104 to switches 112 and 114.
In one or more embodiments, an HCA is implemented in hardware. In this implementation, the HCA hardware offloads much of the CPU's I/O adapter communication overhead. This hardware implementation of the HCA also permits multiple concurrent communications over a switched network without the traditional overhead associated with communication protocols. In one embodiment, the HCAs and SAN fabric 116 in Fig. 1 provide the I/O and IPC consumers of the distributed computer system with zero processor-copy data transfers without involving the operating system kernel process, and employ hardware to provide reliable, fault-tolerant communications.
As shown in Fig. 1, router 117 is coupled to wide area network (WAN) and/or local area network (LAN) connections to other hosts or other routers. Further, I/O chassis 108 includes an I/O switch 146 and multiple I/O modules 148-156. In these examples, I/O modules 148-156 take the form of adapter cards. Example adapter cards include a SCSI adapter card for I/O module 148; an Ethernet adapter card for I/O module 150; an adapter card to fibre channel hub and fibre channel arbitrated loop (FC-AL) devices for I/O module 152; a graphics adapter card for I/O module 154; and a video adapter card for I/O module 156. Any known type of adapter card can be implemented. I/O adapters also include a switch in the I/O adapter to couple the adapter cards to SAN fabric 116. These modules contain RDMA-capable target channel adapters (TCAs) 158-166.
In the example depicted in Fig. 1, RAID subsystem node 106 includes a processor 168, a memory 170, a TCA 172, and multiple redundant and/or striped storage disk units 174. TCA 172 can be a fully functional HCA.
SAN fabric 116 handles data communications for I/O and interprocessor communications. SAN fabric 116 supports the high bandwidth and scalability required for I/O, and also supports the extremely low latency and low CPU overhead required for interprocessor communications. User clients can bypass the operating system kernel process and directly access network communication hardware, such as HCAs, which enables efficient message passing protocols. SAN fabric 116 is suited to current computing models and is a building block for new forms of I/O and computer cluster communication. Further, in one embodiment, SAN fabric 116 allows I/O adapter nodes to communicate with one another or with any of the processor nodes in the distributed computer system. With an I/O adapter attached to SAN fabric 116, the resulting I/O adapter node has substantially the same communication capability as any host processor node in distributed computer system 100.
In one or more embodiments, SAN fabric 116 supports channel semantics and memory semantics. Channel semantics are sometimes referred to as send/receive or push communication operations. They are the type of communication employed in a traditional I/O channel, where a source device pushes data and a destination device determines the final destination of the data. In channel semantics, the packet transmitted from a source process specifies a destination process's communication port, but does not specify where in the destination process's memory space the packet will be written. Thus, in channel semantics, the destination process pre-allocates where to place the transmitted data.
In memory semantics, a source process directly reads or writes the virtual address space of a remote node destination process. The remote destination process need only communicate the location of a buffer for the data and does not need to be involved in the transfer of any data. Thus, in memory semantics, a source process sends a data packet containing the destination buffer memory address of the destination process. In memory semantics, the destination process grants the source process permission to access its memory in advance.
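The contrast between the two semantics can be sketched with a toy destination process. The class and method names below are hypothetical, and the permission check is reduced to a set lookup for illustration.

```python
class DestinationProcess:
    def __init__(self, size=64):
        self.memory = bytearray(size)
        self.receive_buffers = []  # offsets posted in advance (channel semantics)
        self.granted = set()       # (source, offset) pairs allowed (memory semantics)

    def post_receive(self, offset):
        self.receive_buffers.append(offset)

    def grant(self, source, offset):
        self.granted.add((source, offset))

    def send(self, data):
        # Channel semantics: the source names no address; the destination
        # places the data in a buffer it allocated beforehand.
        offset = self.receive_buffers.pop(0)
        self.memory[offset:offset + len(data)] = data
        return offset

    def rdma_write(self, source, offset, data):
        # Memory semantics: the source names the destination address and
        # must have been granted access in advance.
        if (source, offset) not in self.granted:
            raise PermissionError("source has no access to this buffer")
        self.memory[offset:offset + len(data)] = data
```

With `send`, the placement is decided by whatever buffer the destination posted; with `rdma_write`, the source chooses the address and the destination process takes no part in the transfer itself.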
Channel semantics and memory semantics are typically both used for I/O and interprocessor communications. A typical I/O operation employs a combination of channel and memory semantics. For example, a host processor node, such as host processor node 102, initiates an I/O operation by using channel semantics to send a disk write command to a disk I/O adapter, such as RAID subsystem TCA 172. The disk I/O adapter examines the command and uses memory semantics to read the data buffer directly from the memory space of the host processor node. After the data buffer is read, the disk I/O adapter employs channel semantics to push an I/O completion message back to the host processor node.
In one or more embodiments, the distributed computer system shown in Fig. 1 performs operations that employ virtual addresses and virtual memory protection mechanisms to ensure correct and proper access to all memory. Applications running in this distributed computer system are not required to use physical addressing for any operations.
Referring now to Fig. 2, further details regarding a host channel adapter are described. In one example, host channel adapter (HCA) 200 includes a set of queue pairs (QPs) 202-210, which are used to transfer messages to HCA ports 212-216. Buffering of data to HCA ports 212-216 is channeled through virtual lanes (VLs) 218-234, where each VL has its own flow control. A subnet manager configures the channel adapter with the local address of each physical port, i.e., the port's local identifier (LID). Subnet manager agent (SMA) 236 is the entity that communicates with the subnet manager for the purpose of configuring the channel adapter. Memory translation and protection (MTP) 238 is a mechanism that translates virtual addresses to physical addresses and validates access rights. Direct memory access (DMA) 240 provides DMA operations using memory 242 with respect to QPs 202-210.
A single channel adapter, such as HCA 200 shown in Fig. 2, can support thousands of QPs. By contrast, a TCA in an I/O adapter typically supports a much smaller number of QPs. Each QP includes, for instance, two work queues: a send queue (SQ) and a receive queue (RQ). The SQ is used to send channel and memory semantic messages. The RQ receives channel semantic messages. A consumer calls an operating system specific programming interface, referred to herein as the "verbs interface," to place work requests (WRs) onto a work queue.
Referring now to Fig. 3, further details regarding the processing of work requests are described. In the example of Fig. 3, a receive queue (RQ) 300, a send queue (SQ) 302, and a completion queue (CQ) 304 are present in memory for processing requests from and for a consumer 306 (e.g., a process, such as a user process, executing on a CPU coupled to the HCA associated with these queues). These requests from consumer 306 are eventually sent to hardware 308 (which is coupled to the hardware of another HCA and, through that hardware, to the RQ, SQ, and CQ of that other HCA). In this example, consumer 306 generates work requests 310 and 312 and receives work completion 314. Work requests placed onto a work queue are referred to as work queue elements (WQEs).
In one example, send queue 302 contains WQEs 322-328, which describe data to be transmitted on SAN fabric 116. Receive queue 300 contains WQEs 316-320, which describe where to place incoming channel semantic data from SAN fabric 116. A WQE is processed by hardware 308 in the HCA. Each QP is managed through a QP context, which is a block of information pertaining to a particular QP, such as the current WQEs, packet sequence numbers, transmission parameters, etc.
The verbs interface also provides a mechanism for retrieving completed work from completion queue 304. As shown in Fig. 3, completion queue 304 contains completion queue elements (CQEs) 330-336. CQEs contain information about previously completed WQEs. CQ 304 is used to create a single point of completion notification for multiple QPs. A CQE contains sufficient information to determine the QP and the specific WQE that completed. A CQ context is a block of information that contains the pointers, lengths, and other information needed to manage an individual CQ.
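The relationship between queue pairs, work queue elements, and a shared completion queue can be sketched as follows. This is a simplified model with hypothetical names, not the verbs interface itself; `process_next_send` stands in for the adapter hardware completing a WQE.

```python
from collections import deque


class QueuePair:
    def __init__(self, qp_number, completion_queue):
        self.qp_number = qp_number
        self.send_queue = deque()     # SQ: WQEs describing data to transmit
        self.receive_queue = deque()  # RQ: WQEs describing where to place data
        self.completion_queue = completion_queue

    def post_send(self, wqe_id, data):
        self.send_queue.append({"id": wqe_id, "data": data})

    def post_receive(self, wqe_id, buffer_size):
        self.receive_queue.append({"id": wqe_id, "size": buffer_size})

    def process_next_send(self):
        # Stand-in for the adapter hardware finishing one send WQE.
        wqe = self.send_queue.popleft()
        self.completion_queue.append({"qp": self.qp_number, "wqe": wqe["id"]})
        return wqe


shared_cq = deque()  # one CQ shared by several QPs
qp_a = QueuePair(4, shared_cq)
qp_b = QueuePair(7, shared_cq)
qp_a.post_send("w1", b"hello")
qp_b.post_send("w2", b"world")
qp_b.process_next_send()
qp_a.process_next_send()
```

Each entry appended to `shared_cq` carries both the QP number and the WQE identifier, which is the information the text says a CQE needs in order to identify what completed.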
Example work requests supported for SQ 302 include the following. A send work request is a channel semantic operation that pushes a set of local data segments to the data segments referenced by a remote node's receive WQE. For example, WQE 328 contains references to data segment 4 338, data segment 5 340, and data segment 6 342. Each of the send work request's data segments contains a virtually contiguous memory space. The virtual addresses used to reference the local data segments are in the address context of the process that created the local QP. Other types of operations that can be specified in an SQ WQE are RDMA write and RDMA read operations. These are memory semantic operations.
In one embodiment, RQ 300 supports one type of WQE, which is referred to as a receive WQE. The receive WQE provides a channel semantic operation describing a local memory space into which incoming send messages are written. The receive WQE includes a scatter list describing several virtually contiguous memory spaces. An incoming send message is written to these memory spaces. The virtual addresses are in the address context of the process that created the local QP.
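Writing an incoming message through a scatter list can be sketched as below. The representation of the scatter list as `(offset, length)` pairs over a single `bytearray` is an assumption for illustration, and the function name is hypothetical.

```python
def scatter(message, memory, scatter_list):
    """Write message into memory across the segments named by scatter_list.

    scatter_list holds (offset, length) pairs, each describing one
    contiguous region. Returns the number of bytes written.
    """
    position = 0
    for offset, length in scatter_list:
        chunk = message[position:position + length]
        memory[offset:offset + len(chunk)] = chunk
        position += len(chunk)
        if position >= len(message):
            break
    if position < len(message):
        raise ValueError("message does not fit in the scatter list")
    return position


memory = bytearray(16)
written = scatter(b"abcdef", memory, [(0, 4), (8, 4)])
```

Here the first four bytes land at offset 0 and the remaining two at offset 8, so one message fills two separate regions.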
For interprocessor communications, a user-mode software process transfers data through QPs directly from where the buffer resides in memory. In one or more embodiments, the transfer through the QPs bypasses the operating system and consumes few host instruction cycles. QPs permit zero processor-copy data transfer with no operating system kernel involvement. Zero processor-copy data transfer provides efficient support for high-bandwidth, low-latency communication.
When a QP is created, the QP is set to provide a selected type of transport service. As examples, four types of transport services are supported: reliable connection, unreliable connection, reliable datagram, and unreliable datagram.
A portion of a distributed computer system employing a reliable connection service to communicate between distributed processes is illustrated generally in Fig. 4. In one example, distributed computer system 400 includes a host processor node 1, a host processor node 2, and a host processor node 3. Host processor node 1 includes a process A 410. Host processor node 3 includes a process C 420 and a process D 430. Host processor node 2 includes a process E 440.
Host processor node 1 includes QPs 4, 6, and 7, each having a send queue and a receive queue. Host processor node 2 has a QP 9, and host processor node 3 has QPs 2 and 5. The reliable connection service of distributed computer system 400 associates a local QP with one remote QP by configuring the local QP context to identify the remote QP by a port and a QP number. Thus, QP 4 is used to communicate with QP 2; QP 7 is used to communicate with QP 5; and QP 6 is used to communicate with QP 9.
In the reliable connection service, a WQE placed on one QP causes data to be written into the receive memory space referenced by a receive WQE of the connected QP. RDMA operations operate on the address space of the connected QP.
In one or more embodiments, the reliable connection service is made reliable because hardware maintains sequence numbers and acknowledges all packet transfers. A combination of hardware and SAN fabric 116 driver software retries any failed communications. The process client of the QP obtains reliable communications even in the presence of bit errors, receive underruns, and network congestion. If alternative paths exist in SAN fabric 116, reliable communications can be maintained even in the presence of failures of switches, links, or channel adapter ports.
One example of a layered communication architecture 500 for a distributed computer system is illustrated generally in Fig. 5. The layered architecture diagram shows the various layers of the data communication paths and the organization of the data and control information passed between layers.
HCA end node protocol layers (employed by end node 511, for instance) include an upper-level protocol 502 defined by consumer 503, a transport layer 504, a network layer 506, a link layer 508, and a physical layer 510. Switch layers (employed by switch 513, for instance) include link layer 508 and physical layer 510. Router layers (employed by router 515, for instance) include network layer 506, link layer 508, and physical layer 510.
Layered architecture 500 generally follows the outline of a classical communication stack. With respect to the protocol layers of end node 511, for example, upper-level protocol 502 employs the verbs interface to create messages at transport layer 504. Network layer 506 routes packets between network subnets 516. Link layer 508 routes packets within a network subnet 518. Physical layer 510 sends bits or groups of bits to the physical layers of other devices. Each layer is unaware of how the layers above or below it perform their functions.
Consumers 503 and 505 represent applications or processes that employ the other layers to communicate between end nodes. Transport layer 504 provides end-to-end message movement. As described above, the transport layer provides four types of transport services, including, for instance: reliable connection service, reliable datagram service, unreliable datagram service, and unreliable connection service. Network layer 506 performs packet routing through a subnet or multiple subnets to destination end nodes. Link layer 508 performs flow-controlled, error-checked, and prioritized packet delivery across links. Physical layer 510 performs technology-dependent bit transmission. Bits or groups of bits are passed between physical layers via links 522, 524, and 526. Links can be implemented with printed circuit copper traces, copper cable, optical cable, or other suitable links.
As described above, adapters (such as host channel adapters) communicate with one another to perform specific operations, including remote direct memory access (RDMA) writes or sends. As part of this communication, acknowledgments are sent from one adapter to another. Further details regarding the performance of operations that include such acknowledgments are described with reference to Figs. 6A and 6B. In particular, Fig. 6A depicts one embodiment of processing that employs acknowledgments without one or more aspects of the present invention (non-delayed acknowledgment); Fig. 6B depicts the use of delayed acknowledgment in accordance with an aspect of the present invention.
Referring first to Fig. 6A, a process executing on processor 600 requests that an operation be performed, such as a memory write operation, for example a send operation or an RDMA write operation. The request is placed in a work queue element (WQE) 610, and this WQE is placed on a send queue in memory accessible to the sending adapter. A sending converged network adapter (CNA) 602 (such as an RDMA-capable host channel adapter) fetches WQE 610 and the corresponding data 612 indicated by a pointer in the WQE, and builds packets to be sent over links and fabric 604 to a remote receiving CNA 606 (such as another RDMA-capable host channel adapter). The remote adapter performs the write to memory. (For a send operation, the memory location to which the data is written is indicated in a receive WQE.) In the case of PCI, the write to memory is a posted memory write request (that is, there is no response to the DMA write). In addition, the remote adapter sends an acknowledgment 614 back to the sending adapter. This acknowledgment indicates that the remote adapter received the request and the data, but it does not indicate that the data is in remote memory. In response to receiving the acknowledgment, the sending adapter generates a completion queue element (CQE) that is placed on a completion queue. Based on this CQE, the application is notified that the work is complete (that is, that the write or send was received at the remote adapter).
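The key property of the non-delayed flow of Fig. 6A is that a CQE can appear at the sender while the data is still in flight toward remote memory. The following toy model illustrates that gap. It is a sketch under stated assumptions: the classes `WQE` and `RemoteAdapter`, the `in_flight` queue and the `drain` method are hypothetical stand-ins for adapter and interconnect behavior, not an implementation of any real adapter interface.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WQE:
    """Work queue element describing a requested write (hypothetical fields)."""
    address: int
    data: bytes


@dataclass
class RemoteAdapter:
    memory: dict = field(default_factory=dict)
    in_flight: deque = field(default_factory=deque)  # posted writes not yet in memory

    def receive(self, wqe: WQE) -> str:
        # Posted write: queued toward memory, no response comes back to the adapter.
        self.in_flight.append((wqe.address, wqe.data))
        # Non-delayed acknowledgment: confirms receipt only.
        return "ACK"

    def drain(self) -> None:
        """Model the interconnect eventually delivering posted writes to memory."""
        while self.in_flight:
            address, data = self.in_flight.popleft()
            self.memory[address] = data


def send(adapter: RemoteAdapter, wqe: WQE, completion_queue: list) -> None:
    """Sender side: generate a CQE as soon as the acknowledgment arrives."""
    if adapter.receive(wqe) == "ACK":
        completion_queue.append(("CQE", wqe.address))
```

In this model, after `send` returns, the completion queue already holds a CQE while `adapter.memory` may still be empty, which is why the application in Fig. 6A needs a further step to learn that the data is actually in memory.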
Subsequently, if the application wishes to determine whether the data is available to the remote processor (that is, whether it is stored in memory), it issues an RDMA read operation 616 to a valid memory location (for example, one of the locations written by the write operation) to determine whether the data is there. The remote adapter performs the RDMA read operation requested by the sending adapter. When the data is available, the remote adapter sends another acknowledgment (that is, the RDMA read data) 618 to the sending adapter, indicating completion of the read operation. In response to receiving this acknowledgment, the sending adapter again generates a CQE that is placed on the completion queue, notifying the application that the remote read request generated by the sending adapter has been performed by the receiving adapter.
As described above, two acknowledgments, together with a remote read operation initiated by the sending adapter and performed by the remote adapter, are used to indicate that the requested operation (for example, memory write, read) has been performed. The use of two acknowledgments and a remotely initiated read request increases the latency between the adapters in performing the requested operation. Therefore, in accordance with an aspect of the present invention, a delayed acknowledgment technique is provided in which the remote adapter does not acknowledge the requested operation until that operation is complete (for example, the data is stored in memory and visible to the processor). One embodiment of the processing associated with delayed acknowledgment is described with reference to Fig. 6B.
Referring to Fig. 6B, as with the non-delayed acknowledgment of Fig. 6A, a process executing on processor 600 requests that an operation be performed, such as a send operation or an RDMA write operation. The request is placed in a work queue element (WQE) 610, which is placed on a send queue. Sending adapter 602 fetches WQE 610 and the corresponding data 612 indicated by a pointer in the WQE, and builds packets to be sent over links and fabric 604 to remote adapter 606. The remote adapter performs the write to memory as a posted memory write request, but does not send an acknowledgment at this time. Instead, the remote adapter performs a read operation 630 that it initiates locally. This read operation is not requested by the sending adapter and is transparent to the sending adapter. The read operation is, for example, a DMA read of the last byte or cache line of the RDMA write operation. Normal PCI ordering rules specify that all preceding DMA writes are to complete before DMA read data is returned. In response to completion of the DMA read operation, which indicates that the data has been written to memory and is visible to the remote processor, the remote adapter sends an acknowledgment 632 to the sending adapter. This acknowledgment, generated in response to completion of the DMA read operation, guarantees that the data is in memory, because the ordering rules followed by the interconnect between the adapter and the memory (for example, PCI) specify that, for the read operation to complete successfully, all of the data is to be stored in memory.
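The way a locally initiated read can stand in for a write completion may be easier to see in a toy model. The sketch below assumes a simplified interconnect in which posted writes sit in a queue and any read first flushes all earlier writes, which is how the ordering rule is described above. The names `Interconnect`, `dma_write`, `dma_read` and `delayed_ack_write` are hypothetical, and the payload is assumed to be non-empty.

```python
from collections import deque


class Interconnect:
    """Toy model of a PCI-like interconnect with posted writes."""

    def __init__(self):
        self.memory = {}
        self.posted = deque()  # writes issued but not yet visible in memory

    def dma_write(self, address, value):
        # Posted write: no completion is reported back to the adapter.
        self.posted.append((address, value))

    def dma_read(self, address):
        # Ordering rule: all earlier writes complete before read data returns.
        while self.posted:
            addr, value = self.posted.popleft()
            self.memory[addr] = value
        return self.memory[address]


def delayed_ack_write(bus, base, payload):
    """Write a non-empty payload, confirm it with a local read, then acknowledge."""
    for offset, byte in enumerate(payload):
        bus.dma_write(base + offset, byte)
    last = base + len(payload) - 1
    bus.dma_read(last)  # locally initiated read of the last byte written
    return "ACK"        # returned only after the read has completed
```

Because the acknowledgment is returned only after `dma_read` completes, every byte of the payload is in `bus.memory` by the time the sender sees it, without the sender having issued any read of its own.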
By sending only one acknowledgment, when the write or send operation is complete, and by avoiding a request from the sending adapter to the remote adapter for a read operation, latency is reduced and system performance is thereby improved.
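A back-of-the-envelope count of fabric crossings shows where the saving comes from. The sketch below simply lists the messages named in Figs. 6A and 6B and multiplies by an assumed one-way latency. The function names and the latency figures in the example are hypothetical, and the model ignores queuing, packetization and processing time at either adapter.

```python
def fabric_traversals(delayed_ack: bool) -> list[str]:
    """One-way fabric messages until the sender knows the data is in memory."""
    if delayed_ack:
        # Fig. 6B: request, then a single acknowledgment 632.
        return ["write/send request", "delayed acknowledgment"]
    # Fig. 6A: request, acknowledgment 614, RDMA read 616, RDMA read data 618.
    return [
        "write/send request",
        "receipt acknowledgment",
        "RDMA read request",
        "RDMA read response",
    ]


def confirmation_latency(one_way_us: float, delayed_ack: bool,
                         local_read_us: float = 0.0) -> float:
    """Estimated time until the sender can rely on the data being in memory."""
    trips = len(fabric_traversals(delayed_ack))
    extra = local_read_us if delayed_ack else 0.0  # cost of the local DMA read
    return trips * one_way_us + extra
```

With an assumed 5 microsecond one-way latency and a 1 microsecond local read, the model gives 20 microseconds for the non-delayed flow and 11 microseconds for the delayed flow. The absolute numbers are illustrative only; the point is that two fabric crossings are replaced by one local read.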
Further details regarding delayed acknowledgment processing are described with reference to Fig. 7. This processing is performed by the receiving adapter. Initially, in step 700, the receiving adapter receives from the sending adapter a request to perform an operation that writes data to memory (such as an RDMA write operation or a send operation). In step 702, in response to the request, the receiving adapter begins the operation. For example, it begins writing the data to memory. In one example, this write is a posted memory write request.
In this embodiment, the data is written to memory over a PCI interconnect. PCI, however, provides no response to a DMA write operation. In addition, the architecture of the adapter (for example, InfiniBand™) specifies that if an acknowledgment is sent in response to receiving the request, that acknowledgment confirms only that the adapter received the request and the data; it does not guarantee that the data is stored in memory. At this point, the receiving adapter may or may not send an acknowledgment.
In accordance with an aspect of the present invention, inquiry 704 determines whether delay processing has been indicated. As examples, this determination can be made by checking an indicator in the queue context associated with the request, or an indicator in the packet generated from the send WQE. If delay processing has been indicated, the receiving adapter, unbeknownst to the sending adapter, performs a local read operation to determine whether the data is stored in memory. This read operation is, for example, a DMA read of the last memory location written by the receiving adapter. In another example, the DMA read can be directed to another valid memory location (that is, a memory location previously written).
If, at inquiry 708, the read operation succeeds, indicating that the data is available in memory, an acknowledgment is sent to the sending adapter in step 710. This acknowledgment is thus delayed until after the read operation has determined that the data is available.
Returning to inquiry 708, if the data is not currently available, the receiving adapter waits for the data to become available.
Returning to inquiry 704, if it is determined that delay processing has not been indicated (that is, it is unavailable or not enabled for this particular queue pair or request), an acknowledgment indicating receipt of the write or send operation is sent in step 712. This acknowledgment is not delayed and does not guarantee that the data has been written to memory by the write or send operation. This completes processing.
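The Fig. 7 flow can be sketched as a single function that records the steps it takes. This is an illustration only: `handle_request` and its parameters are hypothetical, the local read is modeled as a callable that reports success or failure, and the wait at inquiry 708 is modeled as a simple retry loop, which is an assumption rather than something the description specifies. The local read step is left unnumbered because the text above does not give it a step number.

```python
from typing import Callable


def handle_request(delay_indicated: bool,
                   local_read_ok: Callable[[], bool]) -> list[str]:
    """Walk the Fig. 7 flow for one write/send request and return the steps taken."""
    steps = ["700: receive request", "702: begin write to memory"]

    if not delay_indicated:  # inquiry 704: delay processing not indicated
        steps.append("712: send receipt acknowledgment")
        return steps

    steps.append("local DMA read")
    while not local_read_ok():  # inquiry 708: is the data available in memory?
        steps.append("wait for data")

    steps.append("710: send delayed acknowledgment")
    return steps
```

Calling `handle_request(False, ...)` ends at step 712 with an acknowledgment that says nothing about memory, while calling it with `delay_indicated=True` reaches step 710 only once the local read reports that the data is available.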
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.) or an embodiment combining software and hardware aspects, all of which may generally be referred to herein as a "circuit", "module" or "system". Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied thereon.
Any combination of one or more computer-readable media may be used. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied in it, for example in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including but not limited to electromagnetic signals, optical signals or any suitable combination thereof. A computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate or transport a program for use by or in connection with an instruction execution system, apparatus or device.
A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device.
Referring now to Fig. 8, in one example, a computer program product 800 includes, for instance, one or more computer-readable storage media 802 on which computer-readable program code means are stored to provide and facilitate one or more aspects of the present invention. In one embodiment, the storage medium is tangible and non-transitory. In one example, the storage medium is a storage device.
Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language, such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language, assembler or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
Aspects of the present invention are described herein with reference to flowcharts and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block or blocks of the flowchart and/or block diagram.
These computer program instructions may also be stored in a computer-readable medium that can direct a computer, other programmable data processing apparatus or other devices to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions that implement the functions/acts specified in the block or blocks of the flowchart and/or block diagram.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the block or blocks of the flowchart and/or block diagram.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, segment or portion of code that comprises one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition to the above, one or more aspects of the present invention may be provided, offered, deployed, managed, serviced, etc. by a service provider who offers management of customer environments. For instance, the service provider can create, maintain and support computer code and/or a computer infrastructure that performs one or more aspects of the present invention for one or more customers. In return, the service provider may, for example, charge the customer under a subscription and/or fee agreement. Additionally or alternatively, the service provider may receive payment from the sale of advertising content to one or more third parties.
In one aspect of the present invention, an application may be deployed for performing one or more aspects of the present invention. As one example, deploying an application comprises providing computer infrastructure operable to perform one or more aspects of the present invention.
As a further aspect of the present invention, a computing infrastructure may be deployed, comprising integrating computer-readable code into a computing system, in which the code in combination with the computing system is capable of performing one or more aspects of the present invention.
As yet a further aspect of the present invention, a process for integrating computing infrastructure may be provided, comprising integrating computer-readable code into a computer system. The computer system comprises a computer-readable medium, in which the computer medium comprises one or more aspects of the present invention. The code in combination with the computer system is capable of performing one or more aspects of the present invention.
Although various embodiments are described above, these are only examples. For example, computing environments of other architectures can incorporate and use one or more aspects of the present invention. Further, one or more aspects of the present invention can pertain to operations other than memory writes and/or sends. Additionally, the memory writes need not be RDMA writes, and/or the adapters can be other than RDMA-capable adapters. Further, the interconnect between the adapter and the memory can be other than PCI, including but not limited to other interconnects that do not provide a response to memory writes. Additionally, the architecture of the adapter can be other than InfiniBand™. Many other variations are also possible.
Further, other types of computing environments can benefit from one or more aspects of the present invention. As an example, an environment may include an emulator (for example, software or other emulation mechanisms), in which a particular architecture (including, for instance, instruction execution, architected functions such as address translation, and architected registers) or a subset thereof is emulated (for example, on a native computer system having a processor and memory). In such an environment, one or more emulation functions of the emulator can implement one or more aspects of the present invention, even though the computer executing the emulator may have an architecture different from the capabilities being emulated. As one example, in emulation mode, the specific instruction or operation being emulated is decoded, and an appropriate emulation function is built to implement the individual instruction or operation.
In an emulation environment, a host computer includes, for instance: a memory to store instructions and data; an instruction fetch unit to fetch instructions from memory and, optionally, to provide local buffering for the fetched instructions; an instruction decode unit to receive the fetched instructions and to determine the type of instructions that have been fetched; and an instruction execution unit to execute the instructions. Execution may include loading data into a register from memory; storing data back to memory from a register; or performing some type of arithmetic or logical operation, as determined by the decode unit. In one example, each unit is implemented in software. For instance, the operations performed by the units are implemented as one or more subroutines within emulator software.
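The fetch, decode and execute units described above, each implemented as a subroutine of emulator software, can be sketched as follows. The three-instruction set (`LOAD`, `STORE`, `ADD`), the tuple encoding and the separation of program and data memory are assumptions made purely for illustration; they do not correspond to any architecture named in the specification.

```python
def fetch(program, pc):
    """Instruction fetch unit: read the instruction at the program counter."""
    return program[pc]


def decode(instruction):
    """Instruction decode unit: split into an operation type and its operands."""
    op, *operands = instruction
    return op, operands


def execute(op, operands, registers, memory):
    """Instruction execution unit: load, store, or arithmetic."""
    if op == "LOAD":      # register <- memory
        reg, addr = operands
        registers[reg] = memory[addr]
    elif op == "STORE":   # memory <- register
        reg, addr = operands
        memory[addr] = registers[reg]
    elif op == "ADD":     # dst <- a + b
        dst, a, b = operands
        registers[dst] = registers[a] + registers[b]
    else:
        raise ValueError(f"unknown operation: {op}")


def run(program, data):
    """Run a straight-line program and return the resulting data memory."""
    registers = {}
    memory = dict(data)
    for pc in range(len(program)):
        op, operands = decode(fetch(program, pc))
        execute(op, operands, registers, memory)
    return memory
```

For example, a program that loads two memory words, adds them and stores the sum exercises all three kinds of execution named in the paragraph above.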
Further, a data processing system suitable for storing and/or executing program code may be used that includes at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements include, for instance, local memory employed during actual execution of the program code, bulk storage, and cache memory that provides temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output or I/O devices (including, but not limited to, keyboards, displays, pointing devices, DASD, tape, CDs, DVDs, thumb drives and other storage media, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or to remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the available types of network adapters.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
The corresponding structures, materials, acts and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (18)

1. A method of facilitating processing in a computing environment, the method comprising:
receiving, at a receiving adapter of the computing environment, from a sending adapter of the computing environment, an operation to be performed by the receiving adapter;
performing the operation by the receiving adapter;
determining, by the receiving adapter, whether the operation is complete, wherein the determining comprises performing, by the receiving adapter, a read operation to confirm that the operation is complete, the read operation being initiated locally by the receiving adapter; and
in response to determining via the read operation that the operation is complete, sending an acknowledgment to the sending adapter.
2. The method of claim 1, wherein acknowledgment of the operation is delayed until after the read operation has determined that the operation is complete.
3. The method of claim 2, wherein acknowledging the operation before the read operation has determined that the operation is complete is avoided.
4. The method of claim 1, wherein the operation comprises a remote direct memory access (RDMA) write operation or a send operation that writes to memory.
5. The method of claim 1, wherein the operation writes data to memory, the memory being coupled to the receiving adapter via an interconnect that facilitates performance of the operation, the interconnect having an architecture that does not indicate completion of the operation to the sending adapter.
6. The method of claim 5, wherein the interconnect is Peripheral Component Interconnect (PCI).
7. The method of claim 1, wherein the read operation reads one or more locations in memory written by the operation, and wherein the operation is performed via an interconnect coupling the receiving adapter and the memory, the interconnect having an architecture with ordering rules, the ordering rules indicating that the read does not complete successfully until after the data has been written to memory by the operation.
8. The method of claim 1, further comprising: checking, by the receiving adapter, whether sending of the acknowledgment is to be delayed, and performing the determining in response to the checking indicating that sending of the acknowledgment is to be delayed.
9. The method of claim 1, further comprising: generating, based on the acknowledgment, a completion queue element indicating completion of the operation.
10. The method of claim 1, wherein the read operation is transparent to the sending adapter.
11. A computer system for facilitating processing in a computing environment, the computer system comprising:
a receiving adapter configured to perform a method, the method comprising:
receiving, at the receiving adapter, from a sending adapter, an operation to be performed by the receiving adapter;
performing the operation by the receiving adapter;
determining, by the receiving adapter, whether the operation is complete, wherein the determining comprises performing, by the receiving adapter, a read operation to confirm that the operation is complete, the read operation being initiated locally by the receiving adapter; and
in response to determining via the read operation that the operation is complete, sending an acknowledgment to the sending adapter.
12. The computer system of claim 11, wherein acknowledgment of the operation is delayed until after the read operation has determined that the operation is complete.
13. The computer system of claim 12, wherein acknowledging the operation before the read operation has determined that the operation is complete is avoided.
14. The computer system of claim 11, wherein the operation comprises a remote direct memory access (RDMA) write operation or a send operation that writes to memory.
15. The computer system of claim 11, wherein the operation writes data to memory, the memory being coupled to the receiving adapter via an interconnect that facilitates performance of the operation, the interconnect having an architecture that does not indicate completion of the operation to the sending adapter.
16. The computer system of claim 11, wherein the read operation reads one or more locations in memory written by the operation, and wherein the operation is performed via an interconnect coupling the receiving adapter and the memory, the interconnect having an architecture with ordering rules, the ordering rules indicating that the read does not complete successfully until after the data has been written to memory by the operation.
17. The computer system of claim 11, wherein the method further comprises: checking, by the receiving adapter, whether sending of the acknowledgment is to be delayed, and performing the determining in response to the checking indicating that sending of the acknowledgment is to be delayed.
18. A method of facilitating processing in a computing environment, the method comprising:
receiving, at a receiving adapter of the computing environment, from a sending adapter of the computing environment, a first operation to be performed by the receiving adapter, the first operation being associated with a first acknowledgment type;
performing the first operation by the receiving adapter;
in response to the first acknowledgment type indicating non-delayed acknowledgment, sending to the sending adapter a receipt acknowledgment indicating that the first operation was received, the receipt acknowledgment not indicating completion of the first operation;
receiving, at the receiving adapter, from the sending adapter, a second operation to be performed by the receiving adapter, the second operation being associated with a second acknowledgment type;
performing the second operation by the receiving adapter;
in response to the second acknowledgment type indicating delayed acknowledgment, determining, by the receiving adapter, whether the second operation is complete, wherein the determining comprises performing, by the receiving adapter, a read operation to confirm that the second operation is complete, the read operation being initiated locally by the receiving adapter; and
in response to determining via the read operation that the second operation is complete, sending a completion acknowledgment to the sending adapter.
CN201110252252.XA 2010-08-30 2011-08-30 Delaying acknowledgment method and system of an operation until operation completion confirmed by local adapter read operation Expired - Fee Related CN102446073B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/871,532 2010-08-30
US12/871,532 US8589603B2 (en) 2010-08-30 2010-08-30 Delaying acknowledgment of an operation until operation completion confirmed by local adapter read operation

Publications (2)

Publication Number Publication Date
CN102446073A true CN102446073A (en) 2012-05-09
CN102446073B CN102446073B (en) 2014-10-01

Family

ID=45698640

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110252252.XA Expired - Fee Related CN102446073B (en) 2010-08-30 2011-08-30 Delaying acknowledgment method and system of an operation until operation completion confirmed by local adapter read operation

Country Status (3)

Country Link
US (1) US8589603B2 (en)
JP (1) JP5735883B2 (en)
CN (1) CN102446073B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018119738A1 (en) * 2016-12-28 2018-07-05 Intel Corporation Speculative read mechanism for distributed storage system
CN109426632A (en) * 2018-02-01 2019-03-05 新华三技术有限公司 Memory pool access method and device

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9191441B2 (en) * 2013-03-15 2015-11-17 International Business Machines Corporation Cell fabric hardware acceleration
JP6217192B2 (en) 2013-07-09 2017-10-25 富士通株式会社 Storage control device, control program, and control method
JP6395380B2 (en) * 2014-01-07 2018-09-26 キヤノン株式会社 Information processing apparatus, information processing method, and program
WO2015167505A2 (en) 2014-04-30 2015-11-05 Hewlett-Packard Development Company, L.P. Determining lengths of acknowledgment delays for i/o commands
US10055371B2 (en) 2014-11-03 2018-08-21 Intel Corporation Apparatus and method for RDMA with commit ACKs
US9792248B2 (en) * 2015-06-02 2017-10-17 Microsoft Technology Licensing, Llc Fast read/write between networked computers via RDMA-based RPC requests
US20170034267A1 (en) * 2015-07-31 2017-02-02 Netapp, Inc. Methods for transferring data in a storage cluster and devices thereof
US10063376B2 (en) 2015-10-01 2018-08-28 International Business Machines Corporation Access control and security for synchronous input/output links
US10009423B2 (en) 2015-10-01 2018-06-26 International Business Machines Corporation Synchronous input/output initialization exchange sequences
US10120818B2 (en) 2015-10-01 2018-11-06 International Business Machines Corporation Synchronous input/output command
US9898227B2 (en) 2016-04-27 2018-02-20 International Business Machines Corporation Synchronous input/output virtualization
US10229084B2 (en) 2016-06-23 2019-03-12 International Business Machines Corporation Synchronous input / output hardware acknowledgement of write completions
US10133691B2 (en) 2016-06-23 2018-11-20 International Business Machines Corporation Synchronous input/output (I/O) cache line padding
WO2022017628A1 (en) * 2020-07-24 2022-01-27 Huawei Technologies Co., Ltd. Devices, methods, and system for reducing latency in remote direct memory access system
US11321152B1 (en) 2021-07-08 2022-05-03 Cloudflare, Inc. Concurrency control in an asynchronous event-loop based program environment
US11622004B1 (en) * 2022-05-02 2023-04-04 Mellanox Technologies, Ltd. Transaction-based reliable transport

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050240941A1 (en) * 2004-04-21 2005-10-27 Hufferd John L Method, system, and program for executing data transfer requests
CN1771495A (en) * 2003-05-07 2006-05-10 国际商业机器公司 Distributed file serving architecture system
CN101459676A (en) * 2008-12-31 2009-06-17 中国科学院计算技术研究所 Message transmission frame and method based on high-speed network oriented to file system
US7558839B1 (en) * 2004-12-14 2009-07-07 Netapp, Inc. Read-after-write verification for improved write-once-read-many data storage

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4716523A (en) * 1985-06-14 1987-12-29 International Business Machines Corporation Multiple port integrated DMA and interrupt controller and arbitrator
US4858116A (en) * 1987-05-01 1989-08-15 Digital Equipment Corporation Method and apparatus for managing multiple lock indicators in a multiprocessor computer system
US5210829A (en) * 1990-12-12 1993-05-11 Digital Equipment Corporation Adjustable threshold for buffer management
US5435001A (en) * 1993-07-06 1995-07-18 Tandem Computers Incorporated Method of state determination in lock-stepped processors
US5544331A (en) * 1993-09-30 1996-08-06 Silicon Graphics, Inc. System and method for generating a read-modify-write operation
US5572687A (en) * 1994-04-22 1996-11-05 The University Of British Columbia Method and apparatus for priority arbitration among devices in a computer system
US6647450B1 (en) * 1999-10-06 2003-11-11 Cradle Technologies, Inc. Multiprocessor computer systems with command FIFO buffer at each target device
JP2001188748A (en) * 1999-12-27 2001-07-10 Matsushita Electric Ind Co Ltd Data transferring device
US7143410B1 (en) * 2000-03-31 2006-11-28 Intel Corporation Synchronization mechanism and method for synchronizing multiple threads with a single thread
US6842840B1 (en) * 2001-02-27 2005-01-11 Intel Corporation Controller which determines presence of memory in a node of a data network
US6917987B2 (en) * 2001-03-26 2005-07-12 Intel Corporation Methodology and mechanism for remote key validation for NGIO/InfiniBand™ applications
US6948004B2 (en) * 2001-03-28 2005-09-20 Intel Corporation Host-fabric adapter having work queue entry (WQE) ring hardware assist (HWA) mechanism
US7190667B2 (en) * 2001-04-26 2007-03-13 Intel Corporation Link level packet flow control mechanism
US7155537B1 (en) * 2001-09-27 2006-12-26 Lsi Logic Corporation Infiniband isolation bridge merged with architecture of an infiniband translation bridge
US6920510B2 (en) * 2002-06-05 2005-07-19 Lsi Logic Corporation Time sharing a single port memory among a plurality of ports
US7133943B2 (en) * 2003-02-26 2006-11-07 International Business Machines Corporation Method and apparatus for implementing receive queue for packet-based communications
US7895390B1 (en) * 2004-05-25 2011-02-22 Qlogic, Corporation Ensuring buffer availability
JP4421999B2 (en) * 2004-08-03 2010-02-24 株式会社日立製作所 Storage apparatus, storage system, and data migration method for executing data migration with WORM function
US7685335B2 (en) * 2005-02-25 2010-03-23 International Business Machines Corporation Virtualized fibre channel adapter for a multi-processor data processing system
US7934025B2 (en) * 2007-01-24 2011-04-26 Qualcomm Incorporated Content terminated DMA
JP5186779B2 (en) * 2007-03-01 2013-04-24 日本電気株式会社 Computer system, host computer

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1771495A (en) * 2003-05-07 2006-05-10 国际商业机器公司 Distributed file serving architecture system
US20050240941A1 (en) * 2004-04-21 2005-10-27 Hufferd John L Method, system, and program for executing data transfer requests
US7558839B1 (en) * 2004-12-14 2009-07-07 Netapp, Inc. Read-after-write verification for improved write-once-read-many data storage
CN101459676A (en) * 2008-12-31 2009-06-17 中国科学院计算技术研究所 Message transmission frame and method based on high-speed network oriented to file system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
INFINIBAND TRADE ASSOCIATION: "InfiniBand Architecture Specification Volume 1 Release 1.2.1", 31 December 2007 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018119738A1 (en) * 2016-12-28 2018-07-05 Intel Corporation Speculative read mechanism for distributed storage system
CN109426632A (en) * 2018-02-01 2019-03-05 新华三技术有限公司 Memory access method and device
CN109426632B (en) * 2018-02-01 2021-09-21 新华三技术有限公司 Memory access method and device

Also Published As

Publication number Publication date
US20120054381A1 (en) 2012-03-01
US8589603B2 (en) 2013-11-19
CN102446073B (en) 2014-10-01
JP2012048712A (en) 2012-03-08
JP5735883B2 (en) 2015-06-17

Similar Documents

Publication Publication Date Title
CN102446073B (en) Delaying acknowledgment method and system of an operation until operation completion confirmed by local adapter read operation
KR100555394B1 (en) Methodology and mechanism for remote key validation for ngio/infiniband applications
US8341237B2 (en) Systems, methods and computer program products for automatically triggering operations on a queue pair
US9134913B2 (en) Methods and structure for improved processing of I/O requests in fast path circuits of a storage controller in a clustered storage system
TWI570563B (en) Posted interrupt architecture
JP4961481B2 (en) Bridging Serial Advanced Technology Attachment (SATA) and Serial Attached Small Computer System Interface (SCSI) (SAS)
US6831916B1 (en) Host-fabric adapter and method of connecting a host system to a channel-based switched fabric in a data network
US9684611B2 (en) Synchronous input/output using a low latency storage controller connection
US6775719B1 (en) Host-fabric adapter and method of connecting a host system to a channel-based switched fabric in a data network
CN106575206B (en) Memory write management in a computer system
US7181541B1 (en) Host-fabric adapter having hardware assist architecture and method of connecting a host system to a channel-based switched fabric in a data network
US9734031B2 (en) Synchronous input/output diagnostic controls
US10585821B2 (en) Synchronous input/output command
US9710172B2 (en) Synchronous input/output commands writing to multiple targets
US10068001B2 (en) Synchronous input/output replication of data in a persistent storage control unit
US20170371828A1 (en) Synchronous input / output hardware acknowledgement of write completions
CN100442256C (en) Method, system, and storage medium for providing queue pairs for I/O adapters
US10133691B2 (en) Synchronous input/output (I/O) cache line padding

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee Granted publication date: 20141001