CN100442256C - Method, system, and storage medium for providing queue pairs for I/O adapters - Google Patents
- Publication number: CN100442256C
- Authority
- CN
- China
- Prior art keywords
- queue
- message
- adapter
- send queue
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Computer And Data Communications (AREA)
Abstract
A low-latency queue pair (QP) is provided for I/O adapters that eliminates the overhead associated with work queue elements (WQEs) and defines the mechanisms necessary to allow messages to be placed directly on the queue pair.
Description
Technical field
The present disclosure relates generally to computer and processor architecture, input/output (I/O) processing, and operating systems, and more particularly to low-latency queue pairs (QPs) for I/O adapters.
Background technology
I/O adapters, such as remote direct memory access (RDMA) adapters, RDMA network interface cards (RNICs), and InfiniBand™ (IB) host channel adapters (HCAs), define queue pairs (QPs) for conveying messaging information from a software consumer to the adapter prior to transmission over a network fabric. Industry standards, such as the InfiniBand™ Architecture Specification available from the InfiniBand Trade Association and iWarp from the RDMA Consortium, specify that the message information carried on a QP takes the form of a work queue element (WQE) that carries control information pertaining to the message. In addition, one or more data descriptors point to the message data to be transmitted or to the location at which received messages are to be placed.
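The WQE-plus-descriptors arrangement described above can be sketched as a C structure. This is a minimal illustration under stated assumptions: the field names, widths, and the four-segment limit are hypothetical and do not reproduce the InfiniBand or iWarp wire formats.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch of a standard work queue element (WQE); field
 * names are illustrative, loosely modeled on verbs-style interfaces. */
struct data_descriptor {
    uint64_t virtual_addr;  /* where the message data lives */
    uint32_t l_key;         /* memory-region key used for access checks */
    uint32_t length;        /* bytes in this segment */
};

struct wqe {
    uint32_t opcode;                   /* e.g. send, RDMA write (assumed) */
    uint32_t num_descriptors;          /* how many segments follow */
    struct data_descriptor sg_list[4]; /* scatter/gather segments */
};

/* Total bytes a WQE describes: the sum of its segment lengths. */
static uint32_t wqe_message_length(const struct wqe *w)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < w->num_descriptors; i++)
        total += w->sg_list[i].length;
    return total;
}
```

Note that the adapter must fetch and parse this control block before it can even locate the message data; that extra fetch is part of the overhead the low-latency QP described later removes.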
Some QP applications, such as high-performance computing (HPC), need to reduce the latency incurred in sending a message from one computing node to another. The industry-standard mechanisms described above are not well suited to high-performance computing systems. What is needed is a mechanism that enhances standard QP semantics, with minimal impact on existing hardware, while achieving the lower latency required by these applications.
Summary of the invention
The present invention is directed to a system, method, and computer-readable medium for providing a low-latency queue pair that eliminates the overhead associated with work queue elements and defines the mechanisms needed to allow messages to be placed directly on the queue pair.
One aspect is a system for providing queue pairs for input/output (I/O) adapters, comprising main memory, an I/O adapter, and a processor. The main memory holds a send queue and a receive queue. The I/O adapter places messages received on a link into the receive queue and transmits messages held in the send queue over the link. The processor communicates with the main memory and the I/O adapter and executes a consumer process in the main memory. The consumer process accesses the send queue and the receive queue.
Another aspect is a method of providing queue pairs for I/O adapters. The I/O adapter places messages received on a link into a receive queue. The I/O adapter transmits messages held in a send queue over the link. The receive queue and the send queue are in main memory. A consumer process accesses the send queue and the receive queue. The consumer process executes on a processor that communicates with the main memory and the I/O adapter.
Another aspect is a computer-readable medium storing instructions for performing a method of providing queue pairs for I/O adapters. The I/O adapter places messages received on a link into a receive queue. The I/O adapter transmits messages held in a send queue over the link. The receive queue and the send queue are in main memory. A consumer process accesses the send queue and the receive queue. The consumer process executes on a processor that communicates with the main memory and the I/O adapter.
Description of drawings
These and other features, aspects, and advantages of the present invention will be better understood with reference to the following description, appended claims, and accompanying drawings, in which:
Fig. 1 is a diagram of a prior-art distributed computing system that is part of an exemplary operating environment for embodiments of the present invention;
Fig. 2 is a diagram of a prior-art host channel adapter, part of the exemplary operating environment for embodiments of the present invention;
Fig. 3 is a diagram illustrating prior-art processing of work requests, part of the exemplary operating environment for embodiments of the present invention;
Fig. 4 is a diagram illustrating a portion of a prior-art distributed computer system in which a reliable connection service is used, the portion being part of the exemplary operating environment for embodiments of the present invention;
Fig. 5 is a diagram of a prior-art layered communication architecture used in the exemplary operating environment for embodiments of the present invention;
Fig. 6 is a block diagram of a prior-art standard queue pair structure; and
Fig. 7 is a block diagram of an exemplary embodiment of a low-latency queue pair.
Embodiment
Exemplary embodiments of the present invention provide a low-latency queue pair that eliminates the overhead associated with work queue elements and defines the mechanisms needed to allow messages to be placed directly on the queue pair. The exemplary embodiments are preferably implemented in a distributed computing system, such as a prior-art system area network (SAN) having end nodes, switches, routers, and links interconnecting these components. Figs. 1-5 show various parts of an exemplary operating environment for embodiments of the invention. Fig. 6 shows a prior-art standard queue pair structure. Fig. 7 shows an exemplary embodiment of a low-latency queue pair.
Fig. 1 is a diagram of a distributed computer system. The distributed computer system represented in Fig. 1 takes the form of a system area network (SAN) 100 and is provided merely for illustrative purposes. The exemplary embodiments of the present invention described below can be implemented on computer systems of numerous other types and configurations. For example, computer systems implementing the exemplary embodiments can range from a small server with one processor and a few input/output (I/O) adapters to massively parallel supercomputer systems with hundreds or thousands of processors and thousands of I/O adapters.
SAN 100 is a high-bandwidth, low-latency network interconnecting nodes within the distributed computer system. A node is any component attached to one or more links of a network that forms the origin and/or destination of messages within the network. In the depicted example, SAN 100 includes nodes in the form of host processor node 102, host processor node 104, redundant array of independent disks (RAID) subsystem node 106, and I/O chassis node 108. The nodes illustrated in Fig. 1 are for illustrative purposes only, as SAN 100 can connect any number and any type of independent processor nodes, I/O adapter nodes, and I/O device nodes. Any one of these nodes can function as an end node, which is herein defined to be a device that originates or finally consumes messages or frames in SAN 100.
In one exemplary embodiment, an error handling mechanism is present in the distributed computer system, wherein the error handling mechanism allows for reliable connection or reliable datagram communication between end nodes in the distributed computer system, such as SAN 100.
A message, as used herein, is an application-defined unit of data exchange, which is a primitive unit of communication between cooperating processes. A packet is one unit of data encapsulated by networking protocol headers and/or trailers. The headers generally provide control and routing information for directing the frame through SAN 100. The trailer generally contains control and cyclic redundancy check (CRC) data for ensuring that packets are not delivered with corrupted contents.
SAN 100 contains the communications and management infrastructure supporting both I/O and interprocessor communications (IPC) within the distributed computer system. The SAN 100 shown in Fig. 1 includes a switched communications fabric 116, which allows many devices to concurrently transfer data with high bandwidth and low latency in a secure, remotely managed environment. End nodes can communicate over multiple ports and can utilize multiple paths through the SAN fabric. The multiple ports and paths through the SAN shown in Fig. 1 can be employed for fault tolerance and increased-bandwidth data transfers.
The SAN 100 in Fig. 1 includes switch 112, switch 114, switch 146, and router 117. A switch is a device that connects multiple links together and allows routing of packets from one link to another link using the destination local identifier (DLID) field of a small header. A router is a device that connects multiple subnets together and is capable of routing frames from one link in a first subnet to another link in a second subnet using the destination globally unique identifier (DGUID) of a large header.
In one embodiment, a link is a full-duplex channel between any two network fabric elements, such as end nodes, switches, or routers. Example suitable links include, but are not limited to, copper cables, optical cables, and printed-circuit copper traces on backplanes and printed circuit boards.
For reliable service types, end nodes, such as host processor end nodes and I/O adapter end nodes, generate request packets and return acknowledgment packets. Switches and routers pass packets along from the source to the destination. Except for the variant CRC trailer field, which is updated at each stage in the network, switches pass the packets along unmodified. Routers update the variant CRC trailer field and modify other fields in the header as the packet is routed.
In SAN 100 as illustrated in Fig. 1, host processor node 102, host processor node 104, and I/O chassis 108 include at least one channel adapter (CA) to interface to SAN 100. In one embodiment, each channel adapter is an endpoint that implements the channel adapter interface in sufficient detail to source or sink packets transmitted on SAN fabric 116. Host processor node 102 contains channel adapters in the form of host channel adapter 118 and host channel adapter 120. Host processor node 104 contains host channel adapter 122 and host channel adapter 124. Host processor node 102 also includes central processing units 126-130 and a memory 132 interconnected by bus system 134. Host processor node 104 similarly includes central processing units 136-140 and a memory 142 interconnected by bus system 144.
In one embodiment, a host channel adapter is implemented in hardware. In this implementation, the host channel adapter hardware offloads much of the central processing unit's I/O adapter communication overhead. This hardware implementation of the host channel adapter also permits multiple concurrent communications over a switched network without the traditional overhead associated with communication protocols. In one embodiment, the host channel adapters and SAN 100 in Fig. 1 provide the I/O and interprocessor communications (IPC) consumers of the distributed computer system with zero-processor-copy data transfers without involving the operating system kernel process, and employ hardware to provide reliable, fault-tolerant communications.
As indicated in Fig. 1, router 117 is coupled to wide area network (WAN) and/or local area network (LAN) connections to other hosts or other routers. The I/O chassis 108 in Fig. 1 includes an I/O switch 146 and multiple I/O modules 148-156. In these examples, the I/O modules take the form of adapter cards. Example adapter cards illustrated in Fig. 1 include a SCSI adapter card for I/O module 148, an adapter card to a fibre channel hub and fibre channel arbitrated loop (FC-AL) devices for I/O module 152, an Ethernet adapter card for I/O module 150, a graphics adapter card for I/O module 154, and a video adapter card for I/O module 156. Any known type of adapter card can be implemented. I/O adapters also include a switch in the I/O adapter to couple the adapter cards to the SAN fabric. These modules contain target channel adapters 158-166.
In this example, the RAID subsystem node 106 in Fig. 1 includes a processor 168, a memory 170, a target channel adapter (TCA) 172, and multiple redundant and/or striped storage disk units 174. Target channel adapter 172 can be a fully functional host channel adapter.
SAN 100 handles data communications for I/O and interprocessor communications. SAN 100 supports the high bandwidth and scalability required for I/O and also supports the extremely low latency and low CPU overhead required for interprocessor communications. User clients can bypass the operating system kernel process and directly access network communication hardware, such as host channel adapters, which enables efficient message-passing protocols. SAN 100 is suited to current computing models and is a building block for new forms of I/O and computer cluster communication. Further, the SAN 100 in Fig. 1 allows I/O adapter nodes to communicate among themselves or to communicate with any or all of the processor nodes in the distributed computer system. With an I/O adapter attached to SAN 100, the resulting I/O adapter node has substantially the same communication capability as any host processor node in SAN 100.
In one embodiment, the SAN 100 shown in Fig. 1 supports channel semantics and memory semantics. Channel semantics is sometimes referred to as send/receive or push communication operations. Channel semantics is the type of communications employed in a traditional I/O channel where a source device pushes data and a destination device determines the final destination of the data. In channel semantics, the packet transmitted from a source process specifies a destination process's communication port, but does not specify where in the destination process's memory space the packet will be written. Thus, in channel semantics, the destination process pre-allocates where to place the transmitted data.
In memory semantics, a source process directly reads or writes the virtual address space of a remote node destination process. The remote destination process need only communicate the location of a buffer for data, and does not need to be involved in the transfer of any data. Thus, in memory semantics, a source process sends a data packet containing the destination buffer memory address of the destination process. In memory semantics, the destination process previously grants permission for the source process to access its memory.
Channel semantics and memory semantics are typically both necessary for I/O and interprocessor communications. A typical I/O operation employs a combination of channel and memory semantics. In an illustrative example I/O operation of the distributed computer system shown in Fig. 1, a host processor node, such as host processor node 102, initiates an I/O operation by using channel semantics to send a disk write command to a disk I/O adapter, such as RAID subsystem target channel adapter (TCA) 172. The disk I/O adapter examines the command and uses memory semantics to read the data buffer directly from the memory space of the host processor node. After the data buffer is read, the disk I/O adapter employs channel semantics to push an I/O completion message back to the host processor node.
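The contrast between the two semantics above can be sketched in C. This is purely illustrative: the structures, the pre-posted offset, and the granted region are assumptions standing in for real adapter state, not any actual API.

```c
#include <stdint.h>
#include <string.h>

#define REGION_SIZE 4096

/* Channel semantics: the destination pre-posts a buffer; the sender
 * names only the port (queue), never the memory location. */
struct recv_queue {
    uint8_t buf[REGION_SIZE];
    size_t  posted_offset;   /* chosen in advance by the receiver */
};

static void channel_send(struct recv_queue *rq,
                         const uint8_t *msg, size_t len)
{
    /* the destination decided placement when it posted the buffer */
    memcpy(rq->buf + rq->posted_offset, msg, len);
}

/* Memory semantics: the source names the exact offset inside a region
 * the destination previously granted it access to. */
static void memory_write(uint8_t *granted_region, size_t dest_offset,
                         const uint8_t *msg, size_t len)
{
    memcpy(granted_region + dest_offset, msg, len);
}
```

In `channel_send` the receiver controls placement; in `memory_write` the sender does, which is why memory semantics requires the destination's prior grant of access.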
In one exemplary embodiment, the distributed computer system shown in Fig. 1 performs operations that employ virtual addresses and virtual memory protection mechanisms to ensure correct and proper access to all memory. Applications running in such a distributed computer system are not required to use physical addressing for any operations.
With reference now to Fig. 2, a diagram of a prior-art host channel adapter is depicted. The host channel adapter 200 shown in Fig. 2 includes a set of queue pairs (QPs) 202-210, which are used to transfer messages to the host channel adapter ports 212-216. Buffering of data to the host channel adapter ports 212-216 is channeled through virtual lanes (VL) 218-234, where each VL has its own flow control. A subnet manager configures the channel adapter with the local address for each physical port, i.e., the port's LID. The subnet manager agent (SMA) 236 is the entity that communicates with the subnet manager for the purpose of configuring the channel adapter. Memory translation and protection (MTP) 238 is a mechanism that translates virtual addresses to physical addresses and validates access rights. Direct memory access (DMA) 240 provides for direct memory access operations using memory 242 with respect to queue pairs 202-210.
A single channel adapter, such as the host channel adapter 200 shown in Fig. 2, can support thousands of queue pairs. By contrast, a target channel adapter in an I/O adapter typically supports a much smaller number of queue pairs. Each queue pair consists of a send work queue (SWQ) and a receive work queue. The send work queue is used to send channel and memory semantic messages. The receive work queue receives channel semantic messages. A consumer calls an operating-system-specific programming interface, which is herein referred to as Verbs, to place work requests (WRs) onto a work queue.
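The posting step described above (a consumer placing a work request onto a work queue through a verbs interface) might look roughly like the following. The types, the ring-buffer layout, and `post_send` itself are hypothetical simplifications, not the actual Verbs API.

```c
#include <stdint.h>

#define SWQ_DEPTH 8

struct work_request {
    uint64_t addr;   /* buffer virtual address */
    uint32_t length; /* message length in bytes */
};

struct send_work_queue {
    struct work_request wqe[SWQ_DEPTH]; /* WQEs built from WRs */
    uint32_t head;  /* next WQE the adapter will process */
    uint32_t tail;  /* next free slot for the consumer */
};

/* A verbs-style "post send": turn the WR into a WQE on the queue.
 * Returns 0 on success, -1 if the queue is full. */
static int post_send(struct send_work_queue *q,
                     const struct work_request *wr)
{
    uint32_t next = (q->tail + 1) % SWQ_DEPTH;
    if (next == q->head)
        return -1;            /* queue full */
    q->wqe[q->tail] = *wr;    /* copy control information */
    q->tail = next;
    return 0;
}
```

The important point for the rest of the document is that the consumer posts a *description* of the message here, not the message itself; the adapter must later fetch both the WQE and the data it points to.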
With reference now to Fig. 3, a diagram illustrating prior-art processing of work requests is depicted. In Fig. 3, a receive work queue 300, a send work queue 302, and a completion queue 304 are present for processing requests from and for consumer 306. These requests from consumer 306 are eventually sent to hardware 308. In this example, consumer 306 generates work requests 310 and 312 and receives work completion 314. As shown in Fig. 3, work requests placed onto a work queue are referred to as work queue elements (WQEs).
Send work queue 302 contains work queue elements (WQEs) 322-328, describing data to be transmitted on the SAN fabric. Receive work queue 300 contains work queue elements (WQEs) 316-320, describing where to place incoming channel semantic data from the SAN fabric. A work queue element is processed by hardware 308 in the host channel adapter.
The Verbs interface also provides a mechanism for retrieving completed work from completion queue 304. As shown in Fig. 3, completion queue 304 contains completion queue elements (CQEs) 330-336. Completion queue elements contain information about previously completed work queue elements. Completion queue 304 is used to create a single point of completion notification for multiple queue pairs. A completion queue element is a data structure on a completion queue that describes a completed work queue element. The completion queue element contains sufficient information to determine the queue pair and the specific work queue element that completed. A completion queue context is a block of information that contains pointers to, length of, and other information needed to manage the individual completion queues.
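The completion retrieval mechanism above can be sketched as a polling loop over CQEs. The CQE fields and the valid-flag protocol here are illustrative assumptions; a real adapter's CQE format differs.

```c
#include <stdint.h>

#define CQ_DEPTH 16

struct cqe {
    uint32_t qp_number;   /* which queue pair completed work */
    uint32_t wqe_index;   /* which WQE on that QP completed */
    int      valid;       /* set when hardware writes the CQE */
};

struct completion_queue {
    struct cqe entries[CQ_DEPTH];
    uint32_t   head;      /* next CQE the consumer will poll */
};

/* A verbs-style poll: returns 1 and fills *out if a completion is
 * available, 0 otherwise. */
static int poll_cq(struct completion_queue *cq, struct cqe *out)
{
    struct cqe *e = &cq->entries[cq->head];
    if (!e->valid)
        return 0;
    *out = *e;
    e->valid = 0;                         /* hand the slot back */
    cq->head = (cq->head + 1) % CQ_DEPTH;
    return 1;
}
```

Because a CQE names both the QP and the WQE, one completion queue can serve many queue pairs, as the paragraph above describes.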
An example work request to support the send work queue 302 shown in Fig. 3 is as follows. A send work request is a channel semantic operation to push a set of local data segments to the data segments referenced by a remote node's receive work queue element. For example, work queue element 328 contains references to data segment 4 338, data segment 5 340, and data segment 6 342. Each of the send work request's data segments contains a virtually contiguous memory space. The virtual addresses used to reference the local data segments are in the address context of the process that created the local queue pair.
In one embodiment, receive work queue 300 shown in Fig. 3 only supports one type of work queue element, which is referred to as a receive work queue element. The receive work queue element provides a channel semantic operation describing a local memory space into which incoming send messages are written. The receive work queue element includes a scatter list describing several virtually contiguous memory spaces. An incoming send message is written to these memory spaces. The virtual addresses are in the address context of the process that created the local queue pair.
For interprocessor communications, a user-mode software process transfers data through queue pairs directly from where the buffer resides in memory. In one embodiment, the transfer through the queue pairs bypasses the operating system and consumes few host instruction cycles. Queue pairs permit zero-processor-copy data transfer with no operating system kernel involvement. The zero-processor-copy data transfer provides for efficient support of high-bandwidth and low-latency communication.
When a queue pair is created, the queue pair is set to provide a selected type of transport service. In one embodiment, a distributed computer system implementing the present invention supports four types of transport services: reliable connection, unreliable connection, reliable datagram, and unreliable datagram service.
A portion of a distributed computer system employing a reliable connection service to communicate between distributed processes is generally illustrated in Fig. 4. The distributed computer system 400 in Fig. 4 includes host processor node 1, host processor node 2, and host processor node 3. Host processor node 1 includes process A 410. Host processor node 3 includes process C 420 and process D 430. Host processor node 2 includes process E 440.
A WQE posted to a queue pair set up for the reliable connection service causes data to be written into the receive memory space referenced by a receive WQE of the connected queue pair. RDMA operations operate on the address space of the connected queue pair.
In one embodiment, the reliable connection service is made reliable because hardware maintains sequence numbers and acknowledges all packet transfers. A combination of hardware and SAN driver software retries any failed communications. The process client of the queue pair obtains reliable communications even in the presence of bit errors, receive underruns, and network congestion. If alternative paths exist in the SAN fabric, reliable communications can be maintained even in the presence of failures of fabric switches, links, or channel adapter ports.
In addition, acknowledgments may be employed to deliver data reliably across the SAN fabric. The acknowledgment may, or may not, be a process-level acknowledgment, i.e., an acknowledgment that validates that a receiving process has consumed the data. Alternatively, the acknowledgment may be one that only indicates that the data has reached its destination.
One embodiment of a layered communication architecture 500 for implementing the present invention is generally illustrated in Fig. 5. The layered architecture diagram of Fig. 5 shows the various layers of data communication paths and the organization of data and control information passed between layers.
Host channel adapter end node protocol layers (employed by end node 511, for instance) include upper-level protocols 502 defined by consumer 503, a transport layer 504, a network layer 506, a link layer 508, and a physical layer 510. Switch layers (employed by switch 513, for instance) include link layer 508 and physical layer 510. Router layers (employed by router 515, for instance) include network layer 506, link layer 508, and physical layer 510.
Physical layer 510 performs technology-dependent bit transmission. Bits or groups of bits are passed between physical layers via links 522, 524, and 526. Links can be implemented with printed circuit copper traces, copper cables, optical cables, or with other suitable links.
Fig. 6 represents a prior-art standard queue pair structure. Fig. 6 is divided into two parts by a dashed horizontal line: main memory 600 above the line and host channel adapter (HCA) 602 below the line.
HCA 602 includes a QP table 622 having a plurality of entries 624 (QPTEs, a/k/a QP context). Each entry 626 includes a send queue head pointer 628, a receive queue head pointer 630, a send queue adder count 636, a receive queue adder count 638, and other information 640.
The standard queue pair shown in Fig. 6 is used in the process of sending and receiving messages.
To send a message, the HCA 602 first fetches a WQE. Then, the virtual address, key, and length information in the WQE are processed by address translation to determine the physical address of the message in main memory. Next, the message is fetched from main memory 600. Finally, one or more packets are built to transmit the message on the link.
When the HCA 602 receives a packet on the link, part of the packet header contains a QP number. The adapter places the message in the packet onto the receive queue 606 of the QP 608 with that number. Then, the WQE at the head of the receive queue 606 (WQE 1 660) is fetched to determine where to place the message in main memory 600. The head of the receive queue is pointed to by the receive queue head pointer 630 of the entry 626 in QP table 622 for that QP number. The HCA 602 fetches the WQE (WQE 1 660), which contains the virtual address, key, and length describing where the message is to be placed; the HCA performs a translation to determine the physical address; and the HCA then places the message at that location.
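The standard receive path just described (packet header → QP number → receive-queue head WQE → address translation → placement) can be modeled end to end as follows. The structures, the flat one-offset "translation," and the array standing in for main memory are all illustrative assumptions.

```c
#include <stdint.h>
#include <string.h>

struct rq_wqe {
    uint64_t virt_addr;  /* where the consumer wants the message */
    uint32_t length;     /* bytes the buffer can hold */
};

struct qp_table_entry {
    struct rq_wqe *rq_head;  /* receive queue head pointer */
};

/* Hypothetical trivial translation: physical = virtual - fixed base,
 * indexing into a flat array standing in for main memory. */
#define VIRT_BASE 0x10000u

static void hca_receive(struct qp_table_entry *qp_table,
                        uint32_t qp_number,          /* from packet header */
                        const uint8_t *payload, uint32_t len,
                        uint8_t *main_memory)
{
    struct rq_wqe *wqe = qp_table[qp_number].rq_head;   /* fetch WQE */
    uint64_t phys = wqe->virt_addr - VIRT_BASE;          /* translate */
    uint32_t n = len < wqe->length ? len : wqe->length;
    memcpy(main_memory + phys, payload, n);              /* place message */
}
```

Even in this toy form, the WQE fetch and translation sit on the critical path of every received packet; these are exactly the steps the low-latency queue pair of Fig. 7 removes.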
Fig. 7 shows an exemplary embodiment of a low-latency queue pair. Low latency refers to the time it takes to send a message from one node to another node. There are certain performance-critical applications, such as high-performance computing, where low latency is required. For example, some modeled I/O adapters with standard QPs spend approximately twice as much time sending a message from memory in one node to memory in another node as is spent using exemplary embodiments of the present invention.
Fig. 7 is divided into two parts by a dashed horizontal line: main memory 700 above the line and I/O adapter 702 below the line. Main memory 700 is associated with a processor, such as in a server. Consumer software running on the processor uses data produced by a hardware producer, the I/O adapter 702. The data may be messages or any kind of data. Examples of the I/O adapter 702 include an RDMA-capable adapter or RNIC, an HCA, or any other type of adapter. Preferably, the I/O adapter 702 is relatively close to main memory 700.
Main memory 700 holds a send queue 704 and a receive queue 706, which form a queue pair 708.
Adapter 702 includes a QP table 712 having a plurality of entries indexed by QP number 716 (QPTEs, a/k/a QP context). Each entry 718 includes a send queue head pointer 720, a receive queue head pointer 722, a send queue message length 724, a receive queue message length 726, a send queue adder count 728, a receive queue adder count 730, a send queue number of messages 732, a receive queue number of messages 734, a send queue messages-per-completion count 738, a receive queue all-or-nothing completion indicator 740, and other information 742. Preferably, the information in QP table 712 is cached in the I/O adapter.
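The field list above might be expressed as a structure like the following. Only the list of fields follows the description of entry 718; the names, widths, and ordering are hypothetical.

```c
#include <stdint.h>

/* Sketch of a low-latency QP table entry; reference numerals from the
 * figure are noted in comments. */
struct ll_qp_table_entry {
    uint64_t sq_head_ptr;        /* send queue head pointer     (720) */
    uint64_t rq_head_ptr;        /* receive queue head pointer  (722) */
    uint32_t sq_msg_length;      /* fixed message length, SQ    (724) */
    uint32_t rq_msg_length;      /* fixed message length, RQ    (726) */
    uint32_t sq_adder_count;     /* send queue adder count      (728) */
    uint32_t rq_adder_count;     /* receive queue adder count   (730) */
    uint32_t sq_num_messages;    /* send queue size in messages (732) */
    uint32_t rq_num_messages;    /* receive queue size          (734) */
    uint32_t sq_msgs_per_cqe;    /* messages per completion     (738) */
    uint8_t  rq_all_or_nothing;  /* all-or-nothing completions  (740) */
};

/* With fixed-size messages and no WQEs, the byte offset of message
 * slot i on the send queue follows directly from the fixed length. */
static uint64_t ll_slot_offset(const struct ll_qp_table_entry *e,
                               uint32_t i)
{
    return (uint64_t)i * e->sq_msg_length;
}
```

The fixed message length is what lets the adapter compute slot positions arithmetically instead of fetching a descriptor, which is the core of the latency saving.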
The exemplary low-latency queue pair shown in Fig. 7 is used, for example, in the process of sending and receiving messages. To send a message 710, the consumer application simply places the message 710 directly onto the send queue 704. The consumer notifies the I/O adapter 702 that one or more messages 710 have been placed onto the send queue 704 by storing that number to the send queue adder count 728 in entry 718. Then, the I/O adapter 702 fetches the messages referenced by the send queue head pointer 720 directly from main memory 700 and builds packets to send over the link. When the adapter 702 receives packets on the link, the adapter 702 simply moves the messages 710 directly into the receive queue 706. As a result, latency is lower than with the standard queue pair shown in Fig. 6, and operation is more efficient.
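The consumer side of the send path just described can be sketched in a few lines: copy the message straight onto the send queue, then notify the adapter by adding to its adder count. The queue layout and the register-as-counter model are illustrative assumptions.

```c
#include <stdint.h>
#include <string.h>

#define LL_MSG_SIZE  128   /* fixed message length (assumed) */
#define LL_SQ_SLOTS  8

struct ll_send_queue {
    uint8_t  slot[LL_SQ_SLOTS][LL_MSG_SIZE]; /* messages, no WQEs */
    uint32_t tail;                           /* consumer's next slot */
};

struct ll_adapter {
    uint32_t sq_adder_count;  /* models field 728 in the QP table entry */
};

/* Consumer side: place the message directly on the queue, then notify
 * the adapter by adding to its send queue adder count. */
static void ll_send(struct ll_send_queue *sq, struct ll_adapter *ad,
                    const uint8_t *msg, uint32_t len)
{
    memcpy(sq->slot[sq->tail], msg, len < LL_MSG_SIZE ? len : LL_MSG_SIZE);
    sq->tail = (sq->tail + 1) % LL_SQ_SLOTS;
    ad->sq_adder_count += 1;  /* "store that number" notification */
}
```

Compare this with `post_send` in the standard path: there is no WQE to build and no descriptor for the adapter to fetch, only the message itself and a count.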
One application of the exemplary low-latency queue pair is in a high-performance computing environment, where many nodes connected in a cluster perform parallel processing on very large tasks. Data and control messages flow between the nodes. The exemplary embodiment of Fig. 7 helps to increase the processing speed of such a system. Typically, messages in such a system might be 128 bytes long.
In contrast to Fig. 6, no WQEs are used in the exemplary embodiment shown in Fig. 7. Eliminating WQEs in the exemplary embodiment of Fig. 7 presents four problems, which are solved as follows.
First, the adapter 702 needs to be able to locate the message 710 to be sent without any WQE. This is solved by placing the message 710 directly onto the send queue 704.
Second, the adapter 702 needs to know the length of the message 710 that is received or is to be sent. This is solved by creating a characteristic, namely length, of the QP table entry 718, as depicted by the send queue message length 724 and the receive queue message length 726. This length is a fixed size, which is advantageous for the adapter 702 hardware. Example message sizes include 128 bytes, 256 bytes, 512 bytes, and so on.
Third, the software consumer needs completion notifications for successfully sent messages so that it can reclaim space on the send queue. Traditionally, this information is available as a parameter in the WQE. To reduce bandwidth and improve performance, it is desirable to generate one completion queue entry for more than one message 710. Therefore, each QP table entry 718 includes a send queue messages-per-completion count 738. The messages-per-completion count 738 can be any desired number, including one.
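The coalescing rule above (one completion per N sent messages, where N models field 738) reduces to a small counter. This sketch is purely illustrative of the bookkeeping, not of any adapter's actual completion logic.

```c
#include <stdint.h>

struct cqe_coalescer {
    uint32_t msgs_per_cqe;   /* N: messages per completion (738) */
    uint32_t pending;        /* sends since the last completion */
    uint32_t cqes_generated; /* completions raised so far */
};

/* Called once per completed send; raises a completion every Nth call,
 * letting the consumer reclaim N send queue slots at once. */
static void on_send_complete(struct cqe_coalescer *c)
{
    if (++c->pending >= c->msgs_per_cqe) {
        c->pending = 0;
        c->cqes_generated++;
    }
}
```

With N = 1 this degenerates to the traditional one-completion-per-message behavior, so the option subsumes the standard case.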
Similarly, the software user needs to know when a message 710 has been received. This is solved by an all-or-nothing option, which is a receive queue completions field 740 in the QP table entry 718. In "all" mode, a completion is provided for every message 710 that is received. In "nothing" mode, no completion is ever provided for a received message 710. In this case, the fact that a message 710 has been received is embedded in the received message 710 itself in the receive queue 706. For example, a valid bit in the message 710 can be polled by the software user to determine when a valid message 710 has been received.
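The "nothing" side of the all-or-nothing option can be sketched like this. The layout is an assumption made for illustration only: here the valid flag is the first byte of each fixed-size slot, and software polls that embedded flag instead of a completion queue.

```python
NUM_SLOTS = 8   # assumed receive-queue ring size

class NoCompletionRecvQueue:
    """Model of a receive queue with receive-queue completions disabled:
    arrival is detected via a valid flag embedded in the message itself."""
    def __init__(self):
        self.slots = [b"\x00"] * NUM_SLOTS   # flag byte 0x00 = slot empty
        self.head = 0                        # next slot software will poll

    def adapter_deliver(self, slot: int, payload: bytes):
        """Adapter step: DMA the message into the slot; the leading
        valid byte arrives as part of the message data."""
        self.slots[slot] = b"\x01" + payload

    def poll(self):
        """Software step: poll the next slot's embedded valid flag
        instead of waiting on a completion queue entry."""
        slot = self.head % NUM_SLOTS
        if self.slots[slot][:1] != b"\x01":
            return None                      # nothing has arrived yet
        msg = self.slots[slot][1:]
        self.slots[slot] = b"\x00"           # mark the slot consumed
        self.head += 1
        return msg
```

A real design must ensure the valid flag is the last byte made visible by the DMA write ordering; that ordering concern is outside the scope of this sketch.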
Fourth, the adapter 702 needs to know when a queue pair has been configured as a low-latency queue pair 708. This is solved by creating a configuration option, namely low latency. For example, when creating a queue pair, the software user can configure the queue pair as a low-latency queue pair 708 or as a standard queue pair 608 (Fig. 6).
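Taken together, the four solutions amount to a handful of per-QP table fields chosen at creation time. The field and function names below are illustrative, not taken from any specification, and the standard-QP branch is reduced to zeroed fields for brevity.

```python
from dataclasses import dataclass

@dataclass
class QpTableEntry:
    """Model of the per-QP table fields described above (names assumed)."""
    low_latency: bool            # config option chosen at QP creation
    sq_msg_len: int              # fixed LL message SQ length (LL mode only)
    rq_msg_len: int              # fixed LL message RQ length (LL mode only)
    sq_msgs_per_completion: int  # completion-coalescing factor
    rq_completions: bool         # the all-or-nothing receive option

def create_qp(low_latency: bool, msg_len: int = 128,
              msgs_per_cqe: int = 1, rq_completions: bool = True) -> QpTableEntry:
    """QP creation: the user chooses low-latency or standard mode."""
    if low_latency:
        return QpTableEntry(True, msg_len, msg_len,
                            msgs_per_cqe, rq_completions)
    # a standard QP carries WQEs instead, so the LL fields are unused
    return QpTableEntry(False, 0, 0, 1, True)
```

Because the mode is per queue pair, a node can run low-latency QPs alongside standard QPs to peers that do not implement the scheme.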
Exemplary embodiments of the present invention have many advantages. They provide a low-latency queue pair that eliminates the overhead associated with work queue elements, and they define the mechanisms required to allow a message to be placed directly on the queue pair. These savings can be realized at both the transmitting and receiving ends of the link. Simulation results have shown that use of the present invention can approximately halve node-to-node latency. In addition, the exemplary embodiments can interoperate with standard nodes that do not implement them, without adverse effect (although the full performance benefit is realized only when both nodes implement them).
As described above, embodiments of the invention may be realized in the form of computer-implemented processes and apparatuses for practicing those processes. Embodiments of the invention may also be realized in the form of computer program code containing instructions embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other computer-readable storage medium, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention. The invention may also be realized in the form of computer program code, for example, whether stored in a storage medium, loaded into and/or executed by a computer, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention. When implemented on a general-purpose microprocessor, the computer program code segments configure the microprocessor to create specific logic circuits.
While the invention has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, the various components may be implemented in hardware, software, or firmware, or any combination thereof. Finally, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiments disclosed as the best or only mode contemplated for carrying out this invention, but that the invention include all embodiments falling within the scope of the appended claims. The use of the terms first, second, and the like does not denote any order or importance; rather, these terms are used to distinguish one element from another. The use of the terms a, an, and the like does not denote a limitation of quantity, but rather denotes the presence of at least one of the referenced item.
Claims (16)
1. A system for providing queue pairs for an input/output (I/O) adapter, comprising:
a main memory having a send queue and a receive queue;
an I/O adapter for placing the message itself received on a link into said receive queue, and for transmitting on the link the message itself held in said send queue; and
a processor in communication with said main memory and said I/O adapter, said processor executing a user process in said main memory, said user process accessing the send queue and the receive queue,
wherein said I/O adapter includes a queue pair table, said queue pair table including send queue characteristics for said send queue and receive queue characteristics for said receive queue.
2. The system of claim 1, wherein said send queue and said receive queue do not hold work queue elements (WQEs).
3. The system of claim 1, wherein said send queue characteristics include a message length.
4. The system of claim 1, wherein said receive queue characteristics include a message length.
5. The system of claim 1, wherein said queue pair table includes a send queue adder counter for notifying said I/O adapter when a message has been placed on said send queue.
6. The system of claim 1, wherein said characteristics include a send queue messages per completion count.
7. The system of claim 1, wherein said characteristics include whether receive queue completions are provided.
8. The system of claim 1, wherein said user process configures a particular queue pair such that said I/O adapter places work queue elements (WQEs) pointing to messages received on said link into the receive queue for that particular queue pair, and transmits on said link messages pointed to by WQEs held in the send queue for that particular queue pair.
9. A method of providing queue pairs for an input/output (I/O) adapter, comprising:
placing, by an I/O adapter, the message itself received on a link into a receive queue in a main memory;
transmitting on the link, by the I/O adapter, the message itself held in a send queue, said send queue being in said main memory; and
accessing said send queue and said receive queue by a user process, said user process executing on a processor in communication with said main memory and said I/O adapter,
wherein said I/O adapter includes a queue pair table, said queue pair table including send queue characteristics for said send queue and receive queue characteristics for said receive queue.
10. The method of claim 9, wherein said send queue and said receive queue do not hold work queue elements (WQEs).
11. The method of claim 9, wherein said send queue characteristics include a message length.
12. The method of claim 9, wherein said receive queue characteristics include a message length.
13. The method of claim 9, wherein said queue pair table includes a send queue adder counter for notifying said I/O adapter when a message has been placed on said send queue.
14. The method of claim 9, wherein said characteristics include a send queue messages per completion count.
15. The method of claim 9, wherein said characteristics include whether receive queue completions are provided.
16. The method of claim 9, further comprising:
configuring a particular queue pair by said user process such that said I/O adapter places work queue elements (WQEs) pointing to messages received on said link into the receive queue for that particular queue pair, and transmits on said link messages pointed to by WQEs held in the send queue for that particular queue pair.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/985,460 | 2004-11-10 | ||
US10/985,460 US8055818B2 (en) | 2004-08-30 | 2004-11-10 | Low latency queue pairs for I/O adapters |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1815458A CN1815458A (en) | 2006-08-09 |
CN100442256C true CN100442256C (en) | 2008-12-10 |
Family
ID=36907674
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2005101246118A Expired - Fee Related CN100442256C (en) | 2004-11-10 | 2005-11-09 | Method, system, and storage medium for providing queue pairs for I/O adapters |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100442256C (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7668984B2 (en) * | 2007-01-10 | 2010-02-23 | International Business Machines Corporation | Low latency send queues in I/O adapter hardware |
TW201237632A (en) * | 2010-12-21 | 2012-09-16 | Ibm | Buffer management scheme for a network processor |
US9354933B2 (en) * | 2011-10-31 | 2016-05-31 | Intel Corporation | Remote direct memory access adapter state migration in a virtual environment |
CN104426797B (en) * | 2013-08-27 | 2018-03-13 | 华为技术有限公司 | Queue-based communication method and device |
CN103942097B (en) * | 2014-04-10 | 2017-11-24 | 华为技术有限公司 | Data processing method and apparatus, and computer having related apparatus |
CN112256407A (en) * | 2020-12-17 | 2021-01-22 | 烽火通信科技股份有限公司 | RDMA (remote direct memory Access) -based container network, communication method and computer-readable medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6480500B1 (en) * | 2001-06-18 | 2002-11-12 | Advanced Micro Devices, Inc. | Arrangement for creating multiple virtual queue pairs from a compressed queue pair based on shared attributes |
US20030035433A1 (en) * | 2001-08-16 | 2003-02-20 | International Business Machines Corporation | Apparatus and method for virtualizing a queue pair space to minimize time-wait impacts |
US20030202519A1 (en) * | 2002-04-25 | 2003-10-30 | International Business Machines Corporation | System, method, and product for managing data transfers in a network |
CN1487417A (en) * | 2002-09-05 | 2004-04-07 | 国际商业机器公司 | iSCSI driver and adapter interface protocol |
US6742075B1 (en) * | 2001-12-03 | 2004-05-25 | Advanced Micro Devices, Inc. | Arrangement for instigating work in a channel adapter based on received address information and stored context information |
US6789143B2 (en) * | 2001-09-24 | 2004-09-07 | International Business Machines Corporation | Infiniband work and completion queue management via head and tail circular buffers with indirect work queue entries |
Also Published As
Publication number | Publication date |
---|---|
CN1815458A (en) | 2006-08-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN100361100C (en) | Method and system for hardware enforcement of logical partitioning of a channel adapter's resources in a system area network | |
US7233570B2 (en) | Long distance repeater for digital information | |
US6748559B1 (en) | Method and system for reliably defining and determining timeout values in unreliable datagrams | |
EP1374521B1 (en) | Method and apparatus for remote key validation for ngio/infiniband applications | |
US8265092B2 (en) | Adaptive low latency receive queues | |
CN100375469C (en) | Method and device for emulating multiple logical ports on a physical port | |
US7668984B2 (en) | Low latency send queues in I/O adapter hardware | |
EP1399829B1 (en) | End node partitioning using local identifiers | |
US6578122B2 (en) | Using an access key to protect and point to regions in windows for infiniband | |
US8341237B2 (en) | Systems, methods and computer program products for automatically triggering operations on a queue pair | |
TW583544B (en) | Infiniband work and completion queue management via head and tail circular buffers with indirect work queue entries | |
US7899050B2 (en) | Low latency multicast for infiniband® host channel adapters | |
US6938138B2 (en) | Method and apparatus for managing access to memory | |
US9037640B2 (en) | Processing STREAMS messages over a system area network | |
JP5735883B2 (en) | How to delay the acknowledgment of an operation until the local adapter read operation confirms the completion of the operation | |
US20020073257A1 (en) | Transferring foreign protocols across a system area network | |
US20090077268A1 (en) | Low Latency Multicast for Infiniband Host Channel Adapters | |
US20030035433A1 (en) | Apparatus and method for virtualizing a queue pair space to minimize time-wait impacts | |
US20030018828A1 (en) | Infiniband mixed semantic ethernet I/O path | |
US6990528B1 (en) | System area network of end-to-end context via reliable datagram domains | |
CN100442256C (en) | Method, system, and storage medium for providing queue pairs for I/O adapters | |
US7409432B1 (en) | Efficient process for handover between subnet managers | |
US20020198927A1 (en) | Apparatus and method for routing internet protocol frames over a system area network | |
US6601148B2 (en) | Infiniband memory windows management directly in hardware | |
JP2002305535A (en) | Method and apparatus for providing a reliable protocol for transferring data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20081210 Termination date: 20181109 |
CF01 | Termination of patent right due to non-payment of annual fee |