CN101409715A - Method and system for communication using InfiniBand network - Google Patents


Info

Publication number
CN101409715A
Authority
CN
China
Prior art keywords
recipient
packet
transmit leg
rdma
current data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2008102246636A
Other languages
Chinese (zh)
Other versions
CN101409715B (en)
Inventor
林瑶
韩冀中
张洪伟
贺劲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN2008102246636A
Publication of CN101409715A
Application granted
Publication of CN101409715B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a method and a system for communication over an InfiniBand network. The method comprises: step 1, a sender and a receiver exchange handshake information, which includes the QPN used to establish a new connection in the InfiniBand network, the address of the receiver's RDMA buffer, and the size of the receiver's RDMA buffer; step 2, the sender writes the current data packet directly into the receiver's RDMA buffer according to that address and size; step 3, the sender updates the copy of the receiver's buffer state that the sender stores; step 4, when the application's data transmission is finished, the sender closes the connection. The resulting stream communication unit, uStream, outperforms SDP in both bandwidth and latency and achieves performance comparable to the underlying InfiniBand Verbs interface.

Description

Method and system for communication using an InfiniBand network
Technical field
The present invention relates to InfiniBand networks, and in particular to a method and system for communication using an InfiniBand network.
Background
Owing to their low cost, high performance and good scalability, computer clusters interconnected by Ethernet and other high-performance networks have been applied ever more widely in high-performance computing and enterprise computing since the 1990s. As the main interconnect technology for computer clusters, system area networks (SANs) have developed rapidly at the same time. Several system area networks with high bandwidth and low latency, such as Myrinet, Quadrics, SCI and InfiniBand, offer higher performance than Ethernet and have gradually become the mainstream high-speed interconnects.
Among them, InfiniBand is one of the most widely used system area networks at present. It is widely used in high-performance computing clusters and has gained acceptance in the enterprise data center market. On the Top500 list of high-performance computers published in November 2007, 24.2% of the cluster computing systems used InfiniBand. InfiniBand offers high bandwidth and low latency, and provides many advanced features, such as remote direct memory access (RDMA) and zero-copy.
The RDMA communication mechanism allows data to be transferred directly between the application's address space and the network. It removes the operating system kernel from the critical path of data transfer and reduces the number of memory copies, which makes it an efficient data transfer mechanism. The zero-copy mechanism avoids frequent copying of data between the layers of the communication protocol stack and lightens the load on the operating system kernel, which makes it an effective means of improving communication performance. How to exploit these advanced features of InfiniBand to give applications high-performance communication has become a research focus in the field of cluster communication.
At present, the main communication protocol stacks on InfiniBand networks are IPoIB (IP over IB) and SDP (Sockets Direct Protocol), which let Socket-based applications use the advanced features of InfiniBand. Both, however, depend on an implementation in the operating system kernel, so they always incur the overhead of user/kernel context switches and data copies, and they are fairly complex. IPoIB maps traditional TCP applications onto the InfiniBand network by means of TCP/IP emulation. SDP is a Socket layer designed on the kernel-mode driver interface (kVerbs) that adapts the InfiniBand programming interface to the traditional Socket programming interface. Because it adds an unnecessary protocol layer, IPoIB introduces protocol-processing overhead. SDP, which is built on the Send/Receive model and a kernel-bypass message transfer protocol, performs better than IPoIB, but it too incurs user/kernel context-switch and data-copy overhead, breaks InfiniBand's asynchronous model, and is itself a complex protocol.
Some research institutions and scholars have proposed improvements to SDP, either eliminating the user/kernel data copy or providing an asynchronous communication model, but the user/kernel context-switch overhead and the complexity of SDP remain. Test results show that both the latency and the bandwidth of SDP fall well short of the underlying network driver interface (InfiniBand Verbs): the minimum latency of SDP is almost 6 times that of the InfiniBand Verbs interface, and the peak bandwidth of SDP reaches only about 70% of it. Existing IPoIB and SDP therefore cannot fully exploit the high bandwidth and low latency of the InfiniBand network.
Summary of the invention
To solve the above technical problem, a method and system for communication using an InfiniBand network are provided. The aim is to use the characteristics of the InfiniBand network to give applications a communication scheme with higher performance than existing IPoIB and SDP, so that the high bandwidth and low latency of InfiniBand can be fully exploited.
The invention provides a method for communication using an InfiniBand network, comprising:
Step 1, the sender and the receiver exchange handshake information, which includes the QPN used to create a new connection in the InfiniBand network, the receiver's RDMA buffer address, and the receiver's RDMA buffer size;
Step 2, the sender writes the current data packet directly into the receiver's RDMA buffer according to the receiver's RDMA buffer address and the receiver's RDMA buffer size;
Step 3, the sender updates the receiver buffer state that the sender stores;
Step 4, after the application's data transmission is finished, the sender closes the connection.
A data packet comprises a header and data. The header contains the following parameters: the packet number of the current packet, the destination address of the current packet, the data length of the current packet, the wrap-around address, and the destination address of the next packet. When the header and the data of the current packet are sent separately, the wrap-around address is set to the destination address of the current packet's data. The destination address of the next packet tells the receiver where the next packet will be written. The sender obtains the destination address of the current packet from the receiver RDMA buffer state that it stores locally.
In step 1, the sender and the receiver exchange the handshake information by means of a connection initialization message.
The handshake information also includes the sender's RDMA buffer address and the sender's RDMA buffer size.
In step 2:
If, when sending the current packet, the sender determines that the contiguous writable space remaining at the tail of the receiver's RDMA buffer is not enough for the data of the current packet, the wrap-around address in the current packet's header is set to the destination address of the current packet's data. The sender writes the header of the current packet directly into the contiguous writable space remaining at the tail of the receiver's RDMA buffer, according to the destination address of the current packet, and then writes the data of the current packet to the head of the receiver's RDMA buffer, according to the wrap-around address of the current packet;
If the sender determines that the contiguous writable space remaining at the tail of the receiver's RDMA buffer is enough for the current packet, it writes the current packet directly into the receiver's RDMA buffer according to the destination address of the current packet;
If, when sending the current packet, the sender determines that the contiguous writable space remaining at the tail of the receiver's RDMA buffer is not enough for the header of the next packet, the destination address of the next packet in the current packet's header is set to the start address of the receiver's RDMA buffer.
In step 3:
When the length of unacknowledged data in the receiver's buffer exceeds a preset threshold, the receiver actively sends a first request message, which acknowledges the received data and tells the sender to update the receiver buffer state it stores; or
When so much data in the receiver's buffer is unacknowledged that the writable space is insufficient, the sender actively sends a second request message, asking the receiver to acknowledge data and to reply with the first request message so that the sender can update the receiver buffer state it stores.
Step 2 comprises:
Step 21, the sender's main thread looks up, in the send queue, a buffer suitable for holding the current packet;
Step 22, the sender's main thread writes the data into this buffer;
Step 23, the sender's main thread determines whether the receiver's buffer has enough space and, if so, wakes the sender's send thread;
Step 24, once woken, the sender's send thread writes the current packet directly into the receiver's RDMA buffer; after the packet has been written into the receiver's RDMA buffer, the send thread notifies the sender's main thread that this send-queue buffer can accept a new packet.
Step 2 and step 3 are interleaved.
In step 3, the receiver uses the receiver's control thread to send the first request message, and the sender uses the sender's control thread to send the second request message.
The invention also provides a system for communication using an InfiniBand network, comprising a sender and a receiver.
The sender is configured to exchange handshake information with the receiver, which includes the QPN, the receiver's RDMA buffer address, and the receiver's RDMA buffer size; the QPN is used to create a new connection in the InfiniBand network;
The sender is further configured to write the current data packet directly into the receiver's RDMA buffer according to the receiver's RDMA buffer address and RDMA buffer size; to update the receiver buffer state that the sender stores; and to close the connection after the application's data transmission is finished.
A data packet comprises a header and data. The header contains the following parameters: the packet number of the current packet, the destination address of the current packet, the data length of the current packet, the wrap-around address, and the destination address of the next packet. When the header and the data of the current packet are sent separately, the wrap-around address is set to the destination address of the current packet's data. The destination address of the next packet tells the receiver where the next packet will be written. The sender obtains the destination address of the current packet and the destination address of the next packet from the receiver RDMA buffer state that it stores locally.
The sender and the receiver exchange the handshake information by means of a connection initialization message.
The handshake information also includes the sender's RDMA buffer address and the sender's RDMA buffer size.
The sender is further configured as follows:
When, on sending the current packet, the sender determines that the contiguous writable space remaining at the tail of the receiver's RDMA buffer is not enough for the data of the current packet, the wrap-around address in the current packet's header is set to the destination address of the current packet's data. The sender writes the header of the current packet directly into the contiguous writable space remaining at the tail of the receiver's RDMA buffer, according to the destination address of the current packet, and then writes the data of the current packet to the head of the receiver's RDMA buffer, according to the wrap-around address of the current packet;
When, on sending the current packet, the sender determines that the contiguous writable space remaining at the tail of the receiver's RDMA buffer is not enough for the header of the next packet, the destination address of the next packet in the current packet's header is set to the start address of the RDMA buffer.
The receiver is further configured to actively send a first request message when the length of unacknowledged data in the receiver's buffer exceeds a preset threshold; the message acknowledges the received data and tells the sender to update the receiver buffer state it stores; or
The sender is further configured to actively send a second request message when so much data in the receiver's buffer is unacknowledged that the writable space is insufficient, asking the receiver to acknowledge data and to reply with the first request message so that the sender can update the receiver buffer state it stores.
The sender comprises a main thread and a send thread;
The main thread is configured to look up, in the send queue, a buffer suitable for holding the current packet; to write the data into this buffer; and to determine whether the receiver's RDMA buffer has enough space and, if so, to wake the send thread;
The send thread is configured, once woken, to write the current packet directly into the receiver's RDMA buffer and, after the packet has been written into the receiver's RDMA buffer, to notify the sender's main thread that this send-queue buffer can accept a new packet.
The receiver uses the receiver's control thread to send the first request message, and the sender uses the sender's control thread to send the second request message.
The uStream unit of the present invention delivers a large improvement over SDP in both bandwidth and latency, and achieves performance comparable to the underlying InfiniBand Verbs interface.
The present invention implements a simple and efficient high-performance stream communication unit, uStream. It supports the RDMA feature and the zero-copy mechanism of the InfiniBand network and eliminates the overhead of user/kernel context switches. It gives applications a communication technique with higher performance than existing IPoIB and SDP, so that the high bandwidth and low latency of the InfiniBand network are fully exploited.
Description of drawings
Fig. 1 is a schematic diagram of the separate dual paths (data/control) of uStream;
Fig. 2 is a schematic diagram of the structure of the uStream packet header;
Fig. 3 is a schematic diagram of the segmented wrap-around operation;
Fig. 4 is a schematic diagram of the unsegmented wrap-around operation;
Fig. 5 shows the first case in which an acknowledgment message is generated and sent;
Fig. 6 shows the second case in which an acknowledgment message is generated and sent;
Fig. 7 shows the working state of the send queue and the state transition diagram of each small buffer;
Fig. 8 is a schematic diagram of the communication process of uStream;
Fig. 9 compares the latency of uStream and SDP;
Fig. 10 compares the latency of uStream and the InfiniBand low-level driver interface (uVerbs);
Fig. 11 compares the bandwidth of uStream and SDP;
Fig. 12 compares the bandwidth of uStream and the InfiniBand low-level driver interface (uVerbs).
Embodiments
The present invention applies a series of key techniques to design and implement a high-performance stream communication unit in user space, named uStream. Because it is implemented in user space, uStream eliminates the overhead of user/kernel context switches and the complexity of kernel dependencies, and it supports InfiniBand's RDMA feature and zero-copy mechanism. It provides applications with high-performance communication through a stream interface.
uStream uses the following key techniques to eliminate performance overhead in the communication process and thereby achieve high bandwidth and low latency.
1. Separate dual-path (data/control) strategy;
2. Buffer wrap-around algorithm and pre-registration;
3. Non-real-time acknowledgment strategy;
4. Asynchronous send mechanism and zero-copy.
The key techniques used in uStream are introduced one by one below, followed by a brief description of the communication process of uStream.
Separate dual-path (data/control) strategy:
A complete communication process comprises data transfer and control information exchange, which place different demands on communication. Data transfer usually requires high bandwidth and low latency, whereas the exchange of control messages depends on a reliable connection and timeliness. uStream therefore uses two independent paths, a data path and a control path, to carry data and control messages respectively. The data path combines RDMA write operations with zero-copy to achieve high bandwidth and low latency. Because RDMA is a one-sided operation, it is not suited to exchanging control messages, so the control path of uStream uses a communication protocol based on the blocking Send/Receive model, such as TCP, SDP or IPoIB, to achieve timely and reliable information exchange.
As shown in Fig. 1, in uStream the data path is responsible for transferring data packets and the control path is responsible for exchanging control messages. The structure of the uStream packet and the types of control message are introduced below.
Packet structure:
A uStream packet has two parts: a header and data. The header is a 32-byte array containing five parameters: the packet serial number (PSN), the destination address of the current packet, the data segment length, the wrap-around address, and the destination address of the next packet.
Fig. 2 shows the structure of the uStream packet header. The wrap-around address indicates whether a packet is segmented; when a packet is segmented, its header and data are sent separately. If a packet is not segmented, its wrap-around address is NULL; otherwise, its wrap-around address is the destination address of its data segment. When a packet is not segmented, its header and data are sent together as a whole, and the packet has a single destination address, namely the destination address contained in the header. When a packet is segmented, its header and data are sent separately and each has its own destination address: the destination address contained in the header is the destination of the header, and the destination address of the data segment is placed in the wrap-around address. In addition, the destination address of the next packet tells the receiver where the next packet will be written.
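The header layout described above can be sketched in Python as follows. This is only an illustration: the text fixes the total size (32 bytes) and the five parameters, but not the width or byte order of each field, so the format string, the use of 0 for NULL, and the function names `pack_header` and `unpack_header` are assumptions.

```python
import struct

# Assumed layout (not given in the text): little-endian, 4-byte PSN,
# 8-byte addresses, 4-byte data length. 4 + 8 + 4 + 8 + 8 = 32 bytes.
HEADER_FORMAT = "<IQIQQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
NULL_ADDR = 0  # assumed encoding of a NULL wrap-around address


def pack_header(psn, dest_addr, data_len, wrap_addr, next_dest_addr):
    """Serialize the five header parameters into a 32-byte array."""
    return struct.pack(HEADER_FORMAT, psn, dest_addr, data_len,
                       wrap_addr, next_dest_addr)


def unpack_header(raw):
    """Parse a header; a non-NULL wrap-around address means 'segmented'."""
    psn, dest_addr, data_len, wrap_addr, next_dest_addr = struct.unpack(
        HEADER_FORMAT, raw)
    return {
        "psn": psn,
        "dest_addr": dest_addr,
        "data_len": data_len,
        "wrap_addr": wrap_addr,
        "next_dest_addr": next_dest_addr,
        "segmented": wrap_addr != NULL_ADDR,
    }
```

For an unsegmented packet the caller would pass `NULL_ADDR` as the wrap-around address; for a segmented one it would pass the address at which the data segment is written.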
Control message types:
uStream has four kinds of control message, used for connection management and flow control. The connection management messages are the connection initialization message (Connection Initialization Message) and the connection close message (Connection Close Message). The former carries the QPN (Queue Pair Number) and the RDMA buffer information, including the Remote Key, the RDMA buffer address and the buffer size. The latter is used when a connection is closed to tell the remote node to release resources, including the RDMA buffer and the QP (Queue Pair). The Remote Key authorizes the remote node to access local memory, and the RDMA buffer address tells the sender the start address of the destination buffer. The flow control messages are ACK PLEASE and ACK REQUEST, as shown in Fig. 1. The sender and the receiver use the flow control messages to exchange the state of the receiver's buffer.
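As a rough illustration of the four message kinds and the contents of the connection initialization message, a Python sketch might look like the following. The numeric codes, the class names `ControlMessage` and `ConnectionInit`, and the field names are hypothetical; the text specifies only which pieces of information are carried.

```python
from dataclasses import dataclass
from enum import Enum


class ControlMessage(Enum):
    # Numeric values are arbitrary placeholders, not taken from the text.
    CONNECTION_INIT = 1
    CONNECTION_CLOSE = 2
    ACK_PLEASE = 3    # sender asks the receiver to acknowledge data
    ACK_REQUEST = 4   # receiver acknowledges data and reports its state


@dataclass(frozen=True)
class ConnectionInit:
    """Information carried by the connection initialization message."""
    qpn: int                # Queue Pair Number of the local QP
    remote_key: int         # authorizes remote access to local memory
    rdma_buffer_addr: int   # start address of the destination buffer
    rdma_buffer_size: int   # size of the destination buffer in bytes


def is_flow_control(kind):
    """True for the two flow control messages, False for connection management."""
    return kind in (ControlMessage.ACK_PLEASE, ControlMessage.ACK_REQUEST)
```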
Buffer wrap-around algorithm and pre-registration:
Buffer registration adds considerable overhead to the communication process. To eliminate it, uStream uses a pre-registered receiver buffer, which also improves the receiver's memory utilization. Because the receiver buffer of uStream is registered in advance, an effective management algorithm is needed so that the receiver's buffer space can be reused. The present invention therefore proposes the wrap-around algorithm, which not only solves the problem of reusing buffer space but also supports a stream interface and variable-length packets. The wrap-around algorithm mainly addresses how buffer space is reused when packets reach the tail of the receiver's buffer.
The wrap-around algorithm defines two wrap-around operations to solve this problem: segmented wrap-around and unsegmented wrap-around. Fig. 3 and Fig. 4 show the two operations respectively.
As shown in Fig. 3, segmented wrap-around applies when the contiguous writable space remaining at the tail of the receiver's buffer is too small for the whole packet but large enough for a header. The packet is split into two parts that are sent separately: the header is written at the tail of the receiver's buffer, and the data wraps around and is written at the head of the receiver's buffer.
Fig. 4 shows unsegmented wrap-around. When the contiguous writable space remaining at the tail of the receiver's buffer is too small even for a header, the whole packet wraps around and is written at the head of the receiver's buffer.
To support variable-length packets, the wrap-around algorithm has the write position of a header computed by the previous packet, and the write position of the data computed by the packet itself.
With variable-length packets, the sender does not know the size of the next packet when it sends the current one, but the header length is fixed (32 bytes). So when sending the current packet, the sender can only check whether the contiguous writable space remaining at the tail of the receiver's buffer, after the current packet has been written, is enough for the header of the next packet. If it is, the next header follows the current packet; if not, the next header is written at the head of the receiver's buffer. The destination address of a header is thus decided by the previous packet. The data segment length of a packet, by contrast, is known only when that packet is sent. So when sending the current packet, the sender checks whether the contiguous writable space at the tail of the receiver's buffer is enough for the packet's data segment. If it is, the data segment is written together with the header. If not, the data segment is separated from the header: the header's destination address, already fixed by the previous packet, is at the tail of the receiver's buffer, and the destination address of the data segment is set to the head of the receiver's buffer.
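The placement rules of the wrap-around algorithm can be sketched in Python as follows. This is a simplified illustration, not the patented implementation: the function name `place_packet` is hypothetical, `tail_pos` is taken to be one past the last writable byte, `None` stands in for a NULL wrap-around address, and the sketch assumes flow control has already guaranteed that the space it writes into is free.

```python
HEADER_SIZE = 32  # fixed header length given in the text


def place_packet(head_pos, tail_pos, header_pos, data_len):
    """Decide where the current packet and the next header go.

    header_pos is where the current header is written; it was chosen when
    the previous packet was sent. Returns (header_addr, wrap_addr,
    next_header_pos), with wrap_addr None for an unsegmented packet.
    """
    if header_pos + HEADER_SIZE > tail_pos:
        raise ValueError("previous packet should have wrapped this header")

    room_after_header = tail_pos - (header_pos + HEADER_SIZE)
    if room_after_header >= data_len:
        # Header and data are written together as one unit.
        wrap_addr = None
        data_end = header_pos + HEADER_SIZE + data_len
    else:
        # Segmented wrap-around: header stays at the tail, data goes to the head.
        wrap_addr = head_pos
        data_end = head_pos + data_len

    if tail_pos - data_end >= HEADER_SIZE:
        next_header_pos = data_end
    else:
        # Unsegmented wrap-around: the whole next packet starts at the head.
        next_header_pos = head_pos
    return header_pos, wrap_addr, next_header_pos
```

With a 1000-byte buffer starting at address 0, a packet whose header sits at 900 and carries 200 bytes of data is segmented (data at address 0, next header at 200), whereas one carrying 50 bytes fits at the tail but leaves only 18 bytes, so the next header is directed to address 0.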
Non-real-time acknowledgment strategy:
In uStream, the sender keeps a copy of the receiver's buffer state (how much data has been written into the buffer, how much has been acknowledged, and how much has not). In the initial connection-establishment phase, the exchange of memory information makes the receiver buffer state held by the two sides consistent. During the data transfer phase that follows, the receiver buffer state held by the sender is updated by ACK REQUEST control messages.
Ordinarily, every packet sent would be acknowledged by an ACK REQUEST message, but the resulting volume of acknowledgment messages would frequently interrupt data transfer and cost performance. uStream therefore uses a non-real-time acknowledgment strategy to implement flow control.
Under the non-real-time acknowledgment strategy, an ACK REQUEST message is generated and sent in only two cases. In the first, the length of unacknowledged data in the receiver's buffer exceeds a preset threshold, and the receiver actively sends an ACK REQUEST message to acknowledge the received data and to tell the sender to update its state (that is, the address up to which data in the buffer has been acknowledged). In the second, so much data in the receiver's buffer is unacknowledged that the writable space is insufficient; the sender detects this and actively sends an ACK PLEASE message, asking the receiver to acknowledge data and to reply with an ACK REQUEST message so that the sender can update its state. Fig. 5 and Fig. 6 show the two cases respectively. In the figures, head_pos is the start address of the receiver's RDMA buffer and tail_pos is its end address; ack_pos is the address of the acknowledged data in the receiver's RDMA buffer; next_pack_pos is the address at which the next packet will be written.
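The two triggers for acknowledgment traffic can be sketched in Python as follows. The class `ReceiverAckPolicy` and the function `sender_needs_ack_please` are hypothetical names, and resetting the unacknowledged counter to zero on every acknowledgment is a simplifying assumption; the text does not say how much data one ACK REQUEST covers.

```python
class ReceiverAckPolicy:
    """Receiver side: acknowledge only when a threshold is exceeded or on request."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.unacked = 0  # bytes received but not yet acknowledged

    def on_data(self, nbytes):
        """Record arrived data; return True when ACK REQUEST should be sent."""
        self.unacked += nbytes
        if self.unacked > self.threshold:
            self.unacked = 0  # assumed: one ACK REQUEST covers all pending data
            return True
        return False

    def on_ack_please(self):
        """Sender asked for an acknowledgment: always reply with ACK REQUEST."""
        self.unacked = 0
        return True


def sender_needs_ack_please(writable_len, packet_len):
    """Sender side: ask for an acknowledgment when the packet does not fit."""
    return writable_len < packet_len
```

With a threshold of 100 bytes, two 60-byte packets produce a single ACK REQUEST after the second one rather than one acknowledgment per packet.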
The non-real-time acknowledgment strategy lets many RDMA operations run back to back without being interrupted by acknowledgment messages and removes the overhead of frequently exchanging them. Most communication in uStream completes with a single RDMA write operation, which improves both latency and bandwidth.
Asynchronous send mechanism and zero-copy:
To achieve high bandwidth, uStream uses an asynchronous send mechanism, implemented mainly by an asynchronous send queue and an independent send thread. The send buffer of uStream is also registered in advance and is defined as a send queue made up of a series of small buffers, each of which holds one packet. The send process of uStream is zero-copy, which eliminates the data copy between the application and the send buffer. Fig. 7 shows the working state of the send queue and the state transition diagram of each small buffer.
uStream defines an independent send thread to implement the asynchronous send mechanism. The send process of uStream consists of five operations carried out by two parallel threads: RDMA_malloc, Write, Flush, Post and Poll. The process is as follows. First, the main thread calls RDMA_malloc to look up, in the send queue, a writable small buffer (that is, a buffer on the sender) of a size suitable for the packet to be sent, and obtains its address. Next, the main thread calls Write to write the data into that small buffer. The main thread then calls Flush to determine whether the receiver's buffer has enough space. If it has, the main thread wakes the send thread. If it has not, the main thread sends a control message asking the receiver to acknowledge data, waits, and on receiving the receiver's acknowledgment message recomputes the remaining space in the receiver's buffer; it repeats this until the receiver's buffer is known to have enough space, and then wakes the send thread. Once woken, the send thread calls Post to send the packet. The send thread is also responsible for obtaining the completion event of the send operation (Poll), after which it changes the state of the small buffer and notifies the main thread so that a new packet can be written into the buffer.
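The division of work between the main thread and the send thread can be sketched in Python as follows. This is a toy model under stated assumptions: the class name `SendQueue` and its methods are hypothetical, appending to a list stands in for the RDMA write, and the receiver-space check that Flush performs is left out so that the sketch shows only the pipelining of small buffers.

```python
import queue
import threading


class SendQueue:
    """Toy model of a send queue served by an independent send thread."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots
        self.free = queue.Queue()    # indices of writable small buffers
        self.ready = queue.Queue()   # indices waiting to be sent
        for index in range(num_slots):
            self.free.put(index)
        self.sent = []               # stand-in for data written by RDMA
        self.worker = threading.Thread(target=self._send_loop, daemon=True)
        self.worker.start()

    def submit(self, payload):
        """Main-thread side: RDMA_malloc, Write, then wake the send thread."""
        slot = self.free.get()       # blocks until a small buffer is writable
        self.slots[slot] = payload   # Write into the small buffer
        self.ready.put(slot)         # Flush (space check omitted): wake sender

    def _send_loop(self):
        """Send-thread side: Post each packet, then release its buffer."""
        while True:
            slot = self.ready.get()
            if slot is None:         # shutdown marker
                break
            self.sent.append(self.slots[slot])  # Post (simulated RDMA write)
            self.slots[slot] = None             # Poll: completion observed
            self.free.put(slot)                 # buffer is writable again

    def close(self):
        """Drain all pending packets and stop the send thread."""
        self.ready.put(None)
        self.worker.join()
```

Because `submit` returns as soon as the packet is queued, the caller can fill the next small buffer while the send thread is still sending the previous one, which is the pipelining effect the text describes.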
The transmission thread of uStream is sightless, because main thread can be operated minibuffer districts different in the transmit queue simultaneously with the transmission thread, therefore application needn't be waited until after a packet is sent completely and submit next packet again to, can make the work of transmit queue flowing water like this, thereby make uStream obtain quite high bandwidth performance.
The communication process of uStream:
Fig. 8 has described the communication process of uStream.UStream adopts two daemon threads to carry out the exchange of transfer of data and control messages respectively, and the hiding independent thread that sends moves at transmit leg.
As shown in Figure 8, the communication process of uStream may be summarized to be following four steps:
I, receiving-transmitting sides exchange handshaking information comprise QPN, RDMA buffer zone address and size etc.; Transmit leg need be known recipient's RDMA buffer zone address and size, just can carry out RDMA operation, and each side who connects may be transmit leg or recipient, all will exchange separately RDMA recipient's buffer information so receive and dispatch two sides.Wherein, QPN is used for creating a new connection on InfiniBand, specifically is to create formation earlier in this locality to QP, obtains QPN, tells remote node QPN by the exchange handshaking information then, connects and has just set up.In the communication process of uStream, transmit leg has been preserved the address and the size of recipient's buffering area, and is connecting the starting stage of setting up, and the state of recipient's buffering area that receiving-transmitting sides is preserved is consistent.And in data transmission procedure subsequently, recipient's buffer state that transmit leg is preserved is upgraded by the ACK REQUEST message that the recipient sends.
II. Following the asynchronous sending mechanism, the sender performs an RDMA operation that writes the packet directly into the receiver's RDMA buffer; this process is zero-copy. In uStream's sending process, before each RDMA operation the sender first uses the locally stored receiver-buffer state to calculate whether the contiguous writable space of the receiver's buffer is large enough for the packet to be sent.
This calculation is carried out by the Flush operation; its pseudocode is as follows:
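    if (psn > ack_psn) {                  /* some packets are still unacknowledged */
        if (next_pack_pos > ack_pos)
            consecutiveLen = tail_pos - next_pack_pos;   /* free up to the buffer tail */
        else
            consecutiveLen = ack_pos - next_pack_pos;    /* free up to the acknowledged position */
    } else if (psn == ack_psn) {          /* everything sent has been acknowledged */
        if (next_pack_pos == ack_pos)
            consecutiveLen = tail_pos - next_pack_pos;
        else
            exit for error;               /* positions are inconsistent */
    } else {
        exit for error;                   /* psn can never be behind ack_psn */
    }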
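The same calculation can be written as a runnable Python function. This is a sketch under assumptions: the parameter meanings (`psn` as the number of the next packet to send, `ack_psn` as the last acknowledged packet number, and the three positions as offsets in the receiver's buffer) are a reading of the pseudocode rather than definitions given in the source, and raising `ValueError` stands in for "exit for error".

```python
def consecutive_len(psn, ack_psn, next_pack_pos, ack_pos, tail_pos):
    """Return the contiguous writable space in the receiver's buffer."""
    if psn > ack_psn:
        # Unacknowledged packets exist.
        if next_pack_pos > ack_pos:
            # Write position is ahead of the acknowledged position:
            # the free run extends to the end of the buffer.
            return tail_pos - next_pack_pos
        # Write position has wrapped: the free run ends where
        # unacknowledged data begins.
        return ack_pos - next_pack_pos
    if psn == ack_psn:
        # Nothing is outstanding, so both positions must coincide.
        if next_pack_pos == ack_pos:
            return tail_pos - next_pack_pos
        raise ValueError("positions disagree although all packets are acknowledged")
    raise ValueError("psn is behind ack_psn")
```

For example, with a 1024-byte buffer, a write position of 600 and an acknowledged position of 200, the contiguous space is 1024 - 600 = 424 bytes; after wrapping to position 100 it is 200 - 100 = 100 bytes.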
If the calculated contiguous writable space of the receiver's buffer is not enough for the packet to be sent, the wrap-around algorithm is used to decide whether the packet must be split. Once the main thread has determined whether the packet is to be split, and has determined the destination address and the wrap-around address, it wakes the sending thread to perform the RDMA operation and send the packet.
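The wrap-around decision can be sketched as follows, using the header fields and the three cases listed in claims 2 and 5. Several details are assumptions made for the example: a fixed header length of 32 bytes, the names `WritePlan` and `plan_write`, the reading that the whole packet (header plus data) must fit at the tail to avoid a split, and the simplification that free space at the head of the buffer has already been verified by Flush.

```python
from dataclasses import dataclass
from typing import Optional

HEADER_LEN = 32  # assumed fixed header size in bytes; the source gives no value


@dataclass(frozen=True)
class WritePlan:
    header_addr: int          # destination address of the current packet
    data_addr: int            # where the payload is written
    wrap_addr: Optional[int]  # set only when header and data are sent separately
    next_packet_addr: int     # destination address of the next packet


def plan_write(dest, data_len, buf_start, buf_end, header_len=HEADER_LEN):
    """Decide whether a packet is split and where each part is written."""
    tail_space = buf_end - dest
    if tail_space < header_len:
        raise ValueError("destination leaves no room for the header")
    if tail_space >= header_len + data_len:
        # Whole packet fits at the tail: one contiguous write.
        data_addr, wrap_addr = dest + header_len, None
    else:
        # Header stays at the tail, data wraps to the head of the buffer.
        data_addr = wrap_addr = buf_start
    end = data_addr + data_len
    # If the next header would not fit behind this packet, the next
    # packet starts at the beginning of the buffer.
    next_addr = end if buf_end - end >= header_len else buf_start
    return WritePlan(dest, data_addr, wrap_addr, next_addr)
```

With a 1024-byte buffer, a 200-byte payload destined for offset 900 is split: the header is written at 900, the data at 0, and the next packet follows at 200.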
III. ACK PLEASE and ACK REQUEST messages are exchanged to update the receiver-buffer state held by the sender. This is the non-real-time acknowledgement policy; it is carried out by the control thread and is interleaved with the previous step. Under the non-real-time acknowledgement policy introduced above, uStream exchanges acknowledgement messages in only two cases. First, when the length of unacknowledged data in the receiver's buffer exceeds the preset receive threshold, the receiver actively sends an ACK REQUEST message. Second, when the sender finds that the receiver's buffer does not have enough contiguous writable space for a packet, the sender actively sends an ACK PLEASE message, asking the receiver to acknowledge the data received and reply with an ACK REQUEST message.
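The two triggers of the non-real-time acknowledgement policy can be sketched in Python as below. The class and function names are hypothetical, message transport is left out, and resetting the unacknowledged count when an ACK REQUEST is sent is an assumption about how the receiver tracks its state.

```python
class ReceiverAckPolicy:
    """Hypothetical receiver-side bookkeeping for the acknowledgement policy."""

    def __init__(self, threshold):
        self.threshold = threshold  # preset receive threshold in bytes
        self.unacked = 0            # bytes received but not yet acknowledged

    def on_data(self, nbytes):
        """Record received data; return True if ACK REQUEST should be sent."""
        self.unacked += nbytes
        if self.unacked > self.threshold:
            self.unacked = 0        # assumed: acknowledging clears the count
            return True
        return False

    def on_ack_please(self):
        """The sender asked for an acknowledgement: always reply with ACK REQUEST."""
        self.unacked = 0
        return True


def sender_needs_ack_please(consecutive_len, packet_len):
    """Sender-side trigger: not enough contiguous space for the next packet."""
    return consecutive_len < packet_len
```

With a threshold of 100 bytes, two 60-byte packets cause one ACK REQUEST after the second packet, so acknowledgements are far less frequent than one per packet.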
IV. The connection is closed. After the application data (for example, an audio/video file) has been transferred, the sender's main thread notifies the control thread to send a connection-close message, informing the receiver that the connection is about to close, and both sides release their resources. Before the connection is closed, the sender must make sure it has obtained all send-completion events; otherwise both sides wait until the sending thread has obtained all send-completion events and only then close the connection.
The latency and bandwidth performance of uStream are described in turn below. Because SDP performs far better than IPoIB, the following tests mainly compare uStream with SDP.
Latency performance:
Fig. 9 compares the latency of uStream and SDP. The receiver buffer is 256K and the uStream send queue has length 16, that is, there are 16 small buffers in the send queue. As the figure shows, uStream's latency for small packets is 40%~50% better than SDP's and its latency for large packets is 60%~75% better; its minimum latency reaches 7.9µs, far better than SDP's 15.3µs.
Fig. 10 compares the latency of uStream and uVerbs. As Fig. 10 shows, uStream's latency reaches a level comparable to InfiniBand uVerbs. Specifically, for small packets uStream's latency is roughly twice that of uVerbs, but the gap narrows as packets grow; the two are equal at a packet size of 8K, and as packets grow further uStream's latency even beats that of uVerbs. This is because the uVerbs benchmark uses synchronous post and poll operations, whereas uStream's implementation is asynchronous.
The test results above show that uStream's latency is much better than SDP's. The reasons are analysed below from the standpoint of the communication semantics uStream uses and of its implementation.
Pre-registration and zero-copy eliminate memory overhead. Memory registration is very expensive and has a large effect on latency. By using pre-registered buffers and a zero-copy mechanism, uStream eliminates the cost of memory registration and copying, which plays a key role in improving latency.
The non-real-time acknowledgement policy eliminates the cost of frequently exchanging acknowledgement messages. Most uStream communication is completed by a single RDMA write operation, with no need for frequent acknowledgement exchanges. SDP, by contrast, must send SrcAvail and SinkAvail messages before sending data, and must send a WrComp or RdCompl message after the data has been sent.
Use of the RDMA write operation. uStream's data path is implemented with RDMA write, whereas existing SDP implementations use the Send/Recv model; RDMA write has lower latency than Send/Recv.
Bandwidth performance:
Fig. 11 compares the bandwidth of uStream and SDP, with a uStream receiver buffer of 8M and a send queue of length 128. As the figure shows, uStream's peak bandwidth reaches 10.4Gbps, well above SDP's 7.8Gbps. Overall, uStream's bandwidth is 30%~60% higher than SDP's for small packets and on average 30% higher for large packets.
As shown in Fig. 11, according to our test results, SDP reaches its peak bandwidth when the receiver buffer is about 85K. For uStream, because it uses pre-registered buffers and the wrap-around algorithm, the larger the receiver buffer, the better the bandwidth. The larger the receiver buffer, the longer it takes to reach the acknowledgement threshold, so more packets can be sent before an acknowledgement message interrupts the transfer, and higher bandwidth results. In addition, because uStream's buffers are pre-registered, buffer registration time is not counted, so registering a large buffer does not affect uStream's performance, whereas SDP may be affected.
Fig. 12 compares the bandwidth of uStream and uVerbs. As Fig. 12 shows, uStream's peak bandwidth of 10.4Gbps is very close to the 11Gbps of uVerbs, which demonstrates that uStream can use the underlying communication link effectively and reach performance comparable to InfiniBand Verbs.
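The headline figures quoted above can be put on a common footing with a little arithmetic. The Python sketch below uses only the peak and minimum values stated in the text (7.9 vs 15.3 for latency, 10.4 vs 7.8 and 11 for bandwidth); the helper names are illustrative, and the results describe those single data points, not the per-packet-size ranges read from the figures.

```python
def relative_reduction(new, old):
    """Fractional reduction of a lower-is-better metric such as latency."""
    return (old - new) / old


def relative_gain(new, old):
    """Fractional increase of a higher-is-better metric such as bandwidth."""
    return (new - old) / old


latency_reduction = relative_reduction(7.9, 15.3)   # uStream vs SDP, minimum latency
bandwidth_gain = relative_gain(10.4, 7.8)           # uStream vs SDP, peak bandwidth
fraction_of_uverbs = 10.4 / 11.0                    # uStream peak relative to uVerbs

print(f"{latency_reduction:.1%} {bandwidth_gain:.1%} {fraction_of_uverbs:.1%}")
# prints "48.4% 33.3% 94.5%"
```

On these numbers the minimum latency is about 48% lower than SDP's, the peak bandwidth about 33% higher, and the peak bandwidth about 95% of what uVerbs achieves.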
uStream reaches such high bandwidth mainly because it uses an asynchronous send queue and an independent sending thread. With uStream, the application can return immediately and submit the next packet once it has written the data into a send buffer and woken the sending thread, without waiting for the data to finish sending. In addition, the non-real-time acknowledgement policy reduces the number of times data sending is interrupted, making communication smoother. The send queue can therefore be kept full, so that there is always a large amount of data in flight on uStream's data path. Moreover, once the sending thread is woken it sends the packets in the send queue in order without interruption, unless the data in some small buffer is not ready. This pipelined mode of operation lets uStream make full use of the bandwidth of the underlying InfiniBand Verbs.
Those skilled in the art may make various modifications to the above without departing from the spirit and scope of the invention as defined by the claims. The scope of the invention is therefore not limited to the above description but is determined by the scope of the claims.

Claims (17)

1. A method of communicating over an InfiniBand network, characterized by comprising:
Step 1: a sender and a receiver exchange handshake information, which includes a QPN used to create a new connection in the InfiniBand network, the receiver's RDMA buffer address, and the receiver's RDMA buffer size;
Step 2: the sender writes the current packet directly into the receiver's RDMA buffer according to the receiver's RDMA buffer address and the receiver's RDMA buffer size;
Step 3: the sender updates the receiver-buffer state stored by the sender;
Step 4: after transfer of the application data is complete, the sender closes the connection.
2. The method of communicating over an InfiniBand network as claimed in claim 1, characterized in that a packet comprises a header and data; the header includes the following parameters: the packet number of the current packet, the destination address of the current packet, the data length of the current packet, a wrap-around address, and the destination address of the next packet, wherein the wrap-around address is set to the destination address of the data of the current packet when the header and data of the current packet are sent separately; the destination address of the next packet informs the receiver of the position at which the next packet will be written; and the sender obtains the destination address of the current packet from the locally stored receiver RDMA buffer state.
3. The method of communicating over an InfiniBand network as claimed in claim 1, characterized in that, in step 1, the sender and the receiver exchange the handshake information through a connection initialization message.
4. The method of communicating over an InfiniBand network as claimed in claim 1, characterized in that the handshake information further includes the sender's RDMA buffer address and the sender's RDMA buffer size.
5. The method of communicating over an InfiniBand network as claimed in claim 2, characterized in that, in step 2:
if the sender, when sending the current packet, determines that the contiguous writable space remaining at the tail of the receiver's RDMA buffer is not enough to hold the data of the current packet, the wrap-around address in the header of the current packet is set to the destination address of the data of the current packet; the sender writes the header of the current packet directly into the contiguous writable space remaining at the tail of the receiver's RDMA buffer according to the destination address of the current packet, and then writes the data of the current packet to the head of the receiver's RDMA buffer according to the wrap-around address of the current packet;
if the sender determines that the contiguous writable space remaining at the tail of the receiver's RDMA buffer is enough to hold the current packet, the current packet is written directly into the receiver's RDMA buffer according to the destination address of the current packet;
if the sender, when sending the current packet, determines that the contiguous writable space remaining at the tail of the receiver's RDMA buffer is not enough to hold the header of the next packet, the destination address of the next packet in the header of the current packet is set to the start address of the receiver's RDMA buffer.
6. The method of communicating over an InfiniBand network as claimed in claim 2, characterized in that, in step 3:
when the length of unacknowledged data in the receiver's buffer exceeds a preset threshold, the receiver actively sends a first request message, acknowledging the data received and notifying the sender to update the receiver-buffer state it stores; or
when too much unacknowledged data in the receiver's buffer leaves insufficient writable space, the sender actively sends a second request message, requesting the receiver to acknowledge the data and reply with the first request message so as to update the receiver-buffer state stored by the sender.
7. The method of communicating over an InfiniBand network as claimed in claim 5, characterized in that step 2 comprises:
Step 21: the sender's main thread looks up, in the send queue, a buffer suitable for holding the current packet;
Step 22: the sender's main thread writes the data into that buffer;
Step 23: the sender's main thread determines whether the receiver's buffer has enough space and, if so, wakes the sender's sending thread;
Step 24: after being woken, the sender's sending thread writes the current packet directly into the receiver's RDMA buffer and, once the packet has been written into the receiver's RDMA buffer, notifies the sender's main thread that the buffer can accept a new packet.
8. The method of communicating over an InfiniBand network as claimed in claim 1, characterized in that step 2 and step 3 are carried out alternately.
9. The method of communicating over an InfiniBand network as claimed in claim 6, characterized in that, in step 3, the receiver uses the receiver's control thread to send the first request message, and the sender uses the sender's control thread to send the second request message.
10. A system for communicating over an InfiniBand network, comprising a sender and a receiver, characterized in that:
the sender is configured to exchange handshake information with the receiver, the handshake information including a QPN, the receiver's RDMA buffer address, and the receiver's RDMA buffer size, the QPN being used to create a new connection in the InfiniBand network;
the sender is further configured to write the current packet directly into the receiver's RDMA buffer according to the receiver's RDMA buffer address and RDMA buffer size; to update the receiver-buffer state stored by the sender; and to close the connection after transfer of the application data is complete.
11. The system for communicating over an InfiniBand network as claimed in claim 10, characterized in that a packet comprises a header and data; the header includes the following parameters: the packet number of the current packet, the destination address of the current packet, the data length of the current packet, a wrap-around address, and the destination address of the next packet, wherein the wrap-around address is set to the destination address of the data of the current packet when the header and data of the current packet are sent separately; the destination address of the next packet informs the receiver of the position at which the next packet will be written; and the sender obtains the destination address of the current packet and the destination address of the next packet from the locally stored receiver RDMA buffer state.
12. The system for communicating over an InfiniBand network as claimed in claim 10, characterized in that the sender and the receiver exchange the handshake information through a connection initialization message.
13. The system for communicating over an InfiniBand network as claimed in claim 10, characterized in that the handshake information further includes the sender's RDMA buffer address and the sender's RDMA buffer size.
14. The system for communicating over an InfiniBand network as claimed in claim 11, characterized in that
the sender is further configured such that:
when the sender, in sending the current packet, determines that the contiguous writable space remaining at the tail of the receiver's RDMA buffer is not enough to hold the data of the current packet, the wrap-around address in the header of the current packet is set to the destination address of the data of the current packet; the sender writes the header of the current packet directly into the contiguous writable space remaining at the tail of the receiver's RDMA buffer according to the destination address of the current packet, and then writes the data of the current packet to the head of the receiver's RDMA buffer according to the wrap-around address of the current packet;
when the sender, in sending the current packet, determines that the contiguous writable space remaining at the tail of the receiver's RDMA buffer is not enough to hold the header of the next packet, the destination address of the next packet in the header of the current packet is set to the start address of the RDMA buffer.
15. The system for communicating over an InfiniBand network as claimed in claim 8, characterized in that
the receiver is further configured to actively send a first request message when the length of unacknowledged data in the receiver's buffer exceeds a preset threshold, acknowledging the data received and notifying the sender to update the receiver-buffer state it stores; or
the sender is further configured to actively send a second request message when too much unacknowledged data in the receiver's buffer leaves insufficient writable space, requesting the receiver to acknowledge the data and reply with the first request message so as to update the receiver-buffer state stored by the sender.
16. The method of communicating over an InfiniBand network as claimed in claim 14, characterized in that the sender comprises a main thread and a sending thread;
the main thread is configured to look up, in the send queue, a buffer suitable for holding the current packet; to write the data into that buffer; and to determine whether the receiver's RDMA buffer has enough space and, if so, to wake the sending thread;
the sending thread is configured, after being woken, to write the current packet directly into the receiver's RDMA buffer and, once the packet has been written into the receiver's RDMA buffer, to notify the sender's main thread that the buffer can accept a new packet.
17. The method of communicating over an InfiniBand network as claimed in claim 15, characterized in that the receiver uses the receiver's control thread to send the first request message, and the sender uses the sender's control thread to send the second request message.
CN2008102246636A 2008-10-22 2008-10-22 Method and system for communication using InfiniBand network Expired - Fee Related CN101409715B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008102246636A CN101409715B (en) 2008-10-22 2008-10-22 Method and system for communication using InfiniBand network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008102246636A CN101409715B (en) 2008-10-22 2008-10-22 Method and system for communication using InfiniBand network

Publications (2)

Publication Number Publication Date
CN101409715A true CN101409715A (en) 2009-04-15
CN101409715B CN101409715B (en) 2012-04-18

Family

ID=40572503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008102246636A Expired - Fee Related CN101409715B (en) 2008-10-22 2008-10-22 Method and system for communication using InfiniBand network

Country Status (1)

Country Link
CN (1) CN101409715B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1238796C (en) * 2002-10-30 2006-01-25 华为技术有限公司 Device and method for realizing interface conversion
CN1312577C (en) * 2003-05-07 2007-04-25 中兴通讯股份有限公司 Method for realizing communication process zero copy information queue
US7421488B2 (en) * 2003-08-14 2008-09-02 International Business Machines Corporation System, method, and computer program product for centralized management of an infiniband distributed system area network
US7620695B2 (en) * 2003-12-02 2009-11-17 International Business Machines Corporation Storing fibre channel information on an Infiniband administration data base
CN100464304C (en) * 2006-08-29 2009-02-25 飞塔信息科技(北京)有限公司 Device and method for realizing zero copy based on Linux operating system

Also Published As

Publication number Publication date
CN101409715B (en) 2012-04-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120418

Termination date: 20201022
