CN114944966B - RDMA multicast-based data transmission method and system - Google Patents

Publication number
CN114944966B
Authority
CN
China
Prior art keywords
terminal
receiving terminal
information
network information
switch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210414928.9A
Other languages
Chinese (zh)
Other versions
CN114944966A (en
Inventor
赵铭
林圳杰
王晓亮
刘德瑞
林强
王李明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Southern Power Grid Digital Platform Technology Guangdong Co ltd
Original Assignee
China Southern Power Grid Digital Platform Technology Guangdong Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Southern Power Grid Digital Platform Technology Guangdong Co ltd
Priority to CN202210414928.9A
Publication of CN114944966A
Application granted
Publication of CN114944966B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/02 Details
    • H04L12/16 Arrangements for providing special services to substations
    • H04L12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L12/185 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast with management of multicast group membership
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/16 Multipoint routing

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a data transmission method and system based on RDMA multicast. The method comprises the following steps: the sending terminal sends a data message to the switch; the switch judges, according to pre-stored first terminal network information of the first receiving terminal, whether target terminal network information in the data message matches the first terminal network information, and obtains a first judgment result; if the first judgment result is yes, the switch forwards the data message to the first receiving terminal and the second receiving terminal. Therefore, by designing a data transmission mode based on an RDMA multicast framework, and according to the binding relation between the sender and the receivers under that framework, the data sender only needs to send one data message during data transmission, and the replication capability of the switch is used to copy the message and forward it to a plurality of receivers, so that network bandwidth can be saved and data transmission can be accelerated.

Description

RDMA multicast-based data transmission method and system
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a data transmission method and system based on RDMA multicast.
Background
Existing RDMA (Remote Direct Memory Access) technology achieves direct access to remote memory without involving the remote CPU, by means such as offloading the protocol stack and bypassing the kernel, thereby achieving low latency and high network throughput. However, in the existing RDMA transmission mode, each receiving terminal establishes a one-to-one bidirectional connection with the sending terminal. When one sending terminal corresponds to many receiving terminals, a large number of data copy operations must be performed on the sending terminal, which wastes a large amount of network bandwidth, introduces additional delay, and increases the overall data processing time. It is therefore important to propose an RDMA solution that saves bandwidth and speeds up data transmission.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a data transmission method and system based on RDMA multicast, which can save bandwidth and accelerate data transmission by designing a data transmission mode based on an RDMA multicast framework.
In order to solve the technical problem, the first aspect of the present invention discloses a data transmission method based on RDMA multicast, the method is applied to an RDMA multicast system, and the RDMA multicast system comprises a switch, a sending terminal, a first receiving terminal and a second receiving terminal; the sending terminal establishes bidirectional connection with the first receiving terminal, and the sending terminal establishes unidirectional connection with the second receiving terminal; the method comprises the following steps:
the sending terminal sends a data message to the switch;
The switch judges whether target terminal network information in the data message is matched with the first terminal network information according to the prestored first terminal network information of the first receiving terminal, and a first judging result is obtained;
And if the first judgment result is yes, the switch forwards the data message to the first receiving terminal and the second receiving terminal.
As an optional implementation manner, in the first aspect of the present invention, before the sending terminal sends the data packet to the switch, the method further includes:
the transmitting terminal transmits QP information thereof to the first receiving terminal and the second receiving terminal;
the second receiving terminal binds the QP corresponding to each second receiving terminal with QP information of the sending terminal so as to establish unidirectional connection with the sending terminal;
the first receiving terminal binds its QP with QP information of the sending terminal, and sends the QP information of the first receiving terminal to the sending terminal;
the sending terminal receives QP information of the first receiving terminal, and binds the QP of the sending terminal with QP information of the first receiving terminal so as to establish bidirectional connection between the sending terminal and the first receiving terminal.
As an optional implementation manner, in the first aspect of the present invention, after the sending terminal receives the QP information of the first receiving terminal and binds the QP of the sending terminal with the QP information of the first receiving terminal to establish a bidirectional connection between the sending terminal and the first receiving terminal, and before the sending terminal sends a data packet to the switch, the method further includes:
the first receiving terminal and the second receiving terminal send terminal network information to the switch; the terminal network information comprises second terminal network information of the second receiving terminal and first terminal network information of the first receiving terminal;
And the switch creates a multicast member table according to the terminal network information.
As an optional implementation manner, in the first aspect of the present invention, the multicast member table includes an exact table and a linear table; the switch creates a multicast member table according to the terminal network information, and the method comprises the following steps:
The switch creates the exact table according to the first terminal network information of the first receiving terminal, wherein the exact table is used for judging whether the target terminal network information in the data message matches the first terminal network information of the first receiving terminal;
The switch creates the linear table according to the first terminal network information of the first receiving terminal and the second terminal network information of all the second receiving terminals, wherein the linear table is used for copying and editing data messages, and the linear table is linked to the exact table.
As an optional implementation manner, in the first aspect of the present invention, the forwarding, by the switch, the data packet to the first receiving terminal and the second receiving terminal includes:
the switch replicates the data message by traversing the linear table to obtain a plurality of replicated messages with the number being the sum of the numbers of the first receiving terminal and the second receiving terminal;
The switch correspondingly modifies the target terminal network information in the plurality of replication messages into first terminal network information of the first receiving terminal and second terminal network information corresponding to all the second receiving terminals respectively to obtain a plurality of modified replication messages;
And the switch forwards the plurality of modified copy messages to the first receiving terminal and the second receiving terminal respectively.
As an optional implementation manner, in the first aspect of the present invention, the first terminal network information includes at least one of an IP address, QP information, RDMA transmission identification information, and device identification information of the first receiving terminal;
And/or the second terminal network information includes at least one of IP address, QP information, RDMA transfer identity information, and device identity information of the second receiving terminal;
And/or the target terminal network information comprises at least one of IP address, QP information, RDMA transmission identification information and equipment identification information of a target transmission terminal of the data message.
As an optional implementation manner, in the first aspect of the present invention, after the switch forwards the data packet to the first receiving terminal and the second receiving terminal, the method further includes:
The first receiving terminal and all the second receiving terminals send respective ACK messages corresponding to the data messages to the switch;
For any one of the ACK messages, the switch judges whether the source terminal network information in the ACK message is matched with the terminal network information of the corresponding receiving terminal according to the pre-stored first terminal network information to obtain a second judging result;
if the second judgment result is yes, adding 1 to the ACK count value of the switch;
when the ACK count value is judged to be equal to the sum of the numbers of the first receiving terminal and all the second receiving terminals, the switch determines one of the plurality of ACK messages as a corrected ACK message, changes the source terminal network information of the corrected ACK message into first terminal network information of the first receiving terminal, and sends the corrected ACK message to the sending terminal.
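The ACK handling described above can be sketched as follows. This is only an illustrative model under stated assumptions, not the patented switch implementation: the names `Ack`, `AckAggregator`, and the `psn` field (a sequence number identifying which data message an ACK belongs to) are hypothetical, and terminal network information is reduced to an (IP address, QP number) pair. The text describes a counter that is incremented by 1 per matching ACK; the sketch tracks the set of members that have acknowledged instead, so that a duplicate ACK does not inflate the count.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Ack:
    src_ip: str   # source terminal network information (IP part)
    src_qpn: int  # source terminal network information (QP part)
    psn: int      # hypothetical sequence number of the acknowledged data message


class AckAggregator:
    """Collects ACKs from all group members and emits one corrected ACK."""

    def __init__(self, first_receiver, second_receivers):
        self.first = first_receiver                      # (ip, qpn) of the first receiving terminal
        self.members = {first_receiver, *second_receivers}
        self.seen = {}                                   # psn -> members that have ACKed so far

    def on_ack(self, ack: Ack) -> Optional[Ack]:
        source = (ack.src_ip, ack.src_qpn)
        if source not in self.members:                   # second judgment result is "no"
            return None
        acked = self.seen.setdefault(ack.psn, set())
        acked.add(source)                                # assumption: duplicates are counted once
        if len(acked) < len(self.members):
            return None                                  # still waiting for other receivers
        del self.seen[ack.psn]
        # Rewrite the source so the ACK appears to come from the sender's bidirectional peer.
        return replace(ack, src_ip=self.first[0], src_qpn=self.first[1])
```

In this sketch the sending terminal only ever sees one ACK per data message, and that ACK carries the first receiving terminal's network information, which is consistent with the sender holding a bidirectional connection to the first receiving terminal only.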
As an optional implementation manner, in the first aspect of the present invention, the source terminal network information includes at least one of an IP address, QP information, RDMA transmission identification information, and device identification information of a source terminal of the ACK packet.
As an optional implementation manner, in the first aspect of the present invention, after the switch forwards the data packet to the first receiving terminal and the second receiving terminal, the method further includes:
under the condition that the overtime retransmission condition is met, triggering overtime retransmission by the sending terminal;
Wherein the timeout retransmission condition includes: and in a preset time period after the switch forwards the data message to the first receiving terminal and the second receiving terminal, the sending terminal does not receive the ACK message corresponding to the data message.
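A minimal sketch of the timeout retransmission condition on the sending terminal is given below. The class name `RetransmitTimer`, the `psn` key, and the injectable `clock` parameter are assumptions made for illustration. The text measures the preset time period from the moment the switch forwards the data message; since the sender cannot observe that moment directly, this sketch starts the timer when the sender transmits, which is a simplification.

```python
import time


class RetransmitTimer:
    """Tracks unacknowledged data messages and reports which ones have timed out."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s   # the preset time period
        self.clock = clock           # injectable clock, so the logic can be tested
        self.pending = {}            # psn -> time the message was last sent

    def on_send(self, psn):
        # Record (or refresh, on retransmission) the send time of a data message.
        self.pending[psn] = self.clock()

    def on_ack(self, psn):
        # The corrected ACK arrived: the message no longer needs retransmission.
        self.pending.pop(psn, None)

    def due(self):
        # Messages whose ACK has not arrived within the preset time period.
        now = self.clock()
        return sorted(p for p, sent in self.pending.items()
                      if now - sent >= self.timeout_s)
```

A sender loop would call `due()` periodically, resend each returned message, and call `on_send` again to restart its timer.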
The second aspect of the invention discloses a data transmission system based on RDMA multicast, which comprises a switch, a sending terminal, a first receiving terminal and a second receiving terminal; the sending terminal establishes bidirectional connection with the first receiving terminal, and the sending terminal establishes unidirectional connection with the second receiving terminal; wherein:
The sending terminal is used for sending a data message to the switch;
The switch is used for judging whether target terminal network information in the data message is matched with the first terminal network information according to the prestored first terminal network information of the first receiving terminal, so as to obtain a first judgment result;
and the switch is further used for forwarding the data message to the first receiving terminal and the second receiving terminal when the first judging result is yes.
As an alternative embodiment, in a second aspect of the invention,
The sending terminal is further configured to send QP information thereof to the first receiving terminal and the second receiving terminal;
The second receiving terminal is used for binding QPs corresponding to the second receiving terminal with QP information of the sending terminal so as to establish unidirectional connection with the sending terminal;
The first receiving terminal is used for binding the QP of the first receiving terminal with QP information of the sending terminal and sending the QP information of the first receiving terminal to the sending terminal;
The sending terminal is further configured to receive QP information of the first receiving terminal, and bind the QP of the sending terminal with the QP information of the first receiving terminal, so as to establish bidirectional connection between the sending terminal and the first receiving terminal.
As an optional implementation manner, in the second aspect of the present invention, the first receiving terminal and the second receiving terminal are further configured to send terminal network information to the switch; the terminal network information comprises second terminal network information of the second receiving terminal and first terminal network information of the first receiving terminal;
The switch is also used for creating a multicast member table according to the terminal network information.
As an alternative embodiment, in the second aspect of the present invention, the multicast member table includes an exact table and a linear table; the specific mode of the switch for creating the multicast member table according to the terminal network information comprises the following steps:
The switch creates the exact table according to the first terminal network information of the first receiving terminal, wherein the exact table is used for judging whether the target terminal network information in the data message matches the first terminal network information of the first receiving terminal;
The switch creates the linear table according to the first terminal network information of the first receiving terminal and the second terminal network information of all the second receiving terminals, wherein the linear table is used for copying and editing data messages, and the linear table is linked to the exact table.
As an optional implementation manner, in the second aspect of the present invention, the switch is further configured to forward the data packet to the first receiving terminal and the second receiving terminal, including:
The switch is further configured to copy the data packet by traversing the linear table, to obtain a plurality of copy packets with a number equal to a sum of the numbers of the first receiving terminal and the second receiving terminal;
The switch is further configured to correspondingly modify the target terminal network information in the multiple replication messages into first terminal network information of the first receiving terminal and second terminal network information corresponding to all the second receiving terminals, so as to obtain multiple modified replication messages;
The switch is further configured to forward the plurality of modified duplicate packets to the first receiving terminal and the second receiving terminal, respectively.
As an optional implementation manner, in the second aspect of the present invention, the first terminal network information includes at least one of an IP address, QP information, RDMA transmission identification information, and device identification information of the first receiving terminal;
And/or the second terminal network information includes at least one of IP address, QP information, RDMA transfer identity information, and device identity information of the second receiving terminal;
And/or the target terminal network information comprises at least one of IP address, QP information, RDMA transmission identification information and equipment identification information of a target transmission terminal of the data message.
As an alternative embodiment, in a second aspect of the invention,
The first receiving terminal and all the second receiving terminals are used for sending respective ACK messages corresponding to the data messages to the switch;
The switch is further configured to determine, for any one of the ACK messages, whether source terminal network information in the ACK message matches with terminal network information of a corresponding receiving terminal according to the first terminal network information stored in advance, to obtain a second determination result, and if the second determination result is yes, trigger an ACK count value to be increased by 1;
The switch is further configured to determine one of the plurality of ACK messages as a corrected ACK message when the ACK count value is determined to be equal to a sum of the numbers of the first receiving terminal and all the second receiving terminals, change source terminal network information of the corrected ACK message to first terminal network information of the first receiving terminal, and send the corrected ACK message to the transmitting terminal.
As an optional implementation manner, in the second aspect of the present invention, the source terminal network information includes at least one of an IP address, QP information, RDMA transmission identification information, and device identification information of a source terminal of the ACK packet.
As an optional implementation manner, in the second aspect of the present invention, the sending terminal is further configured to trigger a timeout retransmission if a timeout retransmission condition is met;
Wherein the timeout retransmission condition includes: and in a preset time period after the switch forwards the data message to the first receiving terminal and the second receiving terminal, the sending terminal does not receive the ACK message corresponding to the data message.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
In the embodiment of the invention, the sending terminal sends a data message to the switch, and the switch judges, according to pre-stored first terminal network information of the first receiving terminal, whether target terminal network information in the data message matches the first terminal network information, so as to obtain a first judgment result; if the first judgment result is yes, the switch forwards the data message to the first receiving terminal and the second receiving terminal. Therefore, by designing a data transmission mode based on an RDMA multicast framework, and according to the binding relation between the sender and the receivers under that framework, the data sender only needs to send one data message during data transmission, and the replication capability of the switch is used to copy the message and forward it to a plurality of receivers, so that network bandwidth can be saved and data transmission can be accelerated.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow diagram of a data transmission method based on RDMA multicasting according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another RDMA multicast-based data transmission method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a RDMA multicast-based data transmission system according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a multicast member table according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of another multicast member table disclosed in an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
The terms "first", "second" and the like in the description, in the claims, and in the above-described figures are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a list of steps or elements is not limited to only those listed but may optionally include other steps or elements not listed or inherent to such process, method, product, or device.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art will understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The invention discloses a data transmission method and a system based on RDMA multicast, which can save network bandwidth and accelerate data transmission. The following will describe in detail.
Example 1
Referring to fig. 1, fig. 1 is a flow chart of a data transmission method based on RDMA multicast according to an embodiment of the present invention. The method is applied to an RDMA multicast system, and the RDMA multicast system comprises a switch, a sending terminal, a first receiving terminal and a second receiving terminal; the sending terminal establishes a bidirectional connection with the first receiving terminal, and the sending terminal establishes a unidirectional connection with the second receiving terminal. As shown in fig. 1, the data transmission method based on RDMA multicast may include the following operations:
101. The transmitting terminal transmits the data message to the switch.
Here, the operation of the transmitting terminal to transmit the data message may occur in various application scenarios, such as offline searching, big data processing, high performance computing, distributed storage, and the like.
Optionally, the switch is one of a TOR (top-of-rack) switch, an EOR (end-of-row) switch, and an MOR (middle-of-row) switch. Preferably, the switch is a TOR switch.
102. The switch judges whether target terminal network information in the data message is matched with the first terminal network information according to the prestored first terminal network information of the first receiving terminal, and a first judging result is obtained.
Optionally, the first terminal network information includes at least one of IP address, QP information, RDMA transmission identification information, and device identification information of the first receiving terminal; the target terminal network information includes at least one of IP address, QP information, RDMA transfer identification information, and device identification information of a target transfer terminal of the data packet.
103. If the first judgment result is yes, the switch forwards the data message to the first receiving terminal and the second receiving terminal.
Therefore, by implementing the embodiment of the invention, a data transmission mode based on an RDMA multicast framework is designed. According to the binding relation between the sender and the receivers under that framework, the data sender only needs to send one data message during data transmission, and the replication capability of the switch is used to copy the message and forward it to a plurality of receivers, so that network bandwidth can be saved and data transmission can be accelerated.
In an alternative embodiment, before the step 101, the data transmission method based on RDMA multicast further includes:
The transmitting terminal transmits QP information thereof to the first receiving terminal and the second receiving terminal;
The second receiving terminal binds the QP corresponding to each receiving terminal with QP information of the sending terminal so as to establish unidirectional connection with the sending terminal;
The first receiving terminal binds its QP with QP information of the sending terminal, and sends the QP information of the first receiving terminal to the sending terminal;
The transmitting terminal receives QP information of the first receiving terminal, and binds the QP of the transmitting terminal with the QP information of the first receiving terminal to establish bidirectional connection between the transmitting terminal and the first receiving terminal.
Here, the QP in this alternative embodiment is briefly explained: RDMA uses three kinds of queues, namely a Send Queue (SQ), a Receive Queue (RQ), and a Completion Queue (CQ). An SQ and an RQ are typically created as a pair, referred to as a Queue Pair (QP), which is the QP in this alternative embodiment.
Optionally, the QP information includes identification information of the QP required for the RDMA connection.
It can be seen that this optional embodiment gives a specific way for the sending terminal to establish a bidirectional connection with the first receiving terminal and for the second receiving terminal to establish a unidirectional connection with the sending terminal. By implementing this optional embodiment, the bidirectional connection between the sending terminal and the first receiving terminal and the unidirectional connection between the sending terminal and the second receiving terminal are completed, so that the sending terminal subsequently only needs to send a data message once to implement RDMA data transmission with both the first receiving terminal and the second receiving terminal.
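The connection setup steps can be modelled with a short sketch. It does not use a real RDMA library; `QPInfo`, `Endpoint`, and `establish` are hypothetical names, and "binding" is reduced to remembering the remote QP information. The point it illustrates is the asymmetry: every receiver binds to the sender, but the sender binds only to the first receiving terminal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QPInfo:
    ip: str    # IP address of the terminal owning the QP
    qpn: int   # identification information of the QP


class Endpoint:
    """A terminal with one local QP and at most one bound remote QP."""

    def __init__(self, ip, qpn):
        self.info = QPInfo(ip, qpn)
        self.remote: Optional[QPInfo] = None

    def bind(self, remote: QPInfo):
        self.remote = remote


def establish(sender, first, seconds):
    # The sending terminal sends its QP information to all receivers,
    # and each receiver binds its own QP to it.
    for receiver in [first, *seconds]:
        receiver.bind(sender.info)
    # Only the first receiving terminal returns its QP information,
    # so only that connection becomes bidirectional.
    sender.bind(first.info)
```

After `establish` runs, the second receiving terminals know the sender but the sender holds no state for them, which is why the switch must later rewrite packets and aggregate ACKs on their behalf.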
In an alternative embodiment, after the sending terminal receives the QP information of the first receiving terminal and binds the QP of the sending terminal with the QP information of the first receiving terminal to establish a bidirectional connection between the sending terminal and the first receiving terminal, and before the sending terminal sends the data packet to the switch, the method further includes:
The first receiving terminal and the second receiving terminal send terminal network information to the switch;
The switch creates a multicast member table according to the terminal network information.
Optionally, the terminal network information includes second terminal network information of the second receiving terminal and first terminal network information of the first receiving terminal.
It can be seen that by implementing this optional embodiment, the switch may create the multicast member table in advance according to the network information of the first receiving terminal and the second receiving terminal, so that after receiving the data packet of the sending terminal, the switch may copy the packet according to the members of the multicast member table and forward the packet to all the members of the multicast member table, thereby implementing one-to-many data transmission.
In an alternative embodiment, the multicast member table includes an exact table and a linear table; the switch creates a multicast member table according to the terminal network information, including:
The switch creates an exact table according to the first terminal network information of the first receiving terminal, wherein the exact table is used for judging whether the target terminal network information in the data message matches the first terminal network information of the first receiving terminal;
The switch creates a linear table according to the first terminal network information of the first receiving terminal and the second terminal network information of all the second receiving terminals, wherein the linear table is used for copying and editing the data message, and the linear table is linked to the exact table.
Optionally, before the switch creates the multicast member table according to the terminal network information, the IGMP (Internet Group Management Protocol) multicast protocol is enabled. The IGMP multicast protocol is one of IGMPv1, IGMPv2, and IGMPv3. With this arrangement, multicast group membership is established and maintained between a receiving terminal and its directly neighboring multicast router.
Preferably, the switch enables IGMPv3 multicast protocol before creating the multicast member table according to the terminal network information.
Optionally, the first terminal network information includes at least one of IP address, QP information, RDMA transfer identity information, and device identity information of the first receiving terminal. Preferably, the first terminal network information must include QP information for the first receiving terminal and at least one of IP address, RDMA transfer identity information, and device identity information. By the arrangement, the switch can judge whether the data message is matched with the first receiving terminal or not more accurately.
Optionally, the second terminal network information includes at least one of IP address, QP information, RDMA transfer identity information, and device identity information of the second receiving terminal. Preferably, the second terminal network information must include QP information for the second receiving terminal and at least one of an IP address, RDMA transfer identification information, and device identification information. By the arrangement, the accuracy of copying and editing the data message by the linear table can be ensured.
It can be seen that this optional embodiment gives a specific way for the switch to create the multicast member table according to the first terminal network information and the second terminal network information, so that the created multicast member table can be used for copy editing and forwarding of the data packet.
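The two-table structure described above can be sketched as follows. This is only a minimal illustration, assuming that a terminal is identified by an (IP address, QP number) pair; the names `TerminalKey`, `MulticastMemberTable`, `create_group` and `lookup`, and the sample addresses, are hypothetical and do not come from the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical key type: (IP address, QP number) identifies one receiving terminal.
TerminalKey = Tuple[str, int]


@dataclass
class MulticastMemberTable:
    # Exact table: key of a first receiving terminal -> index of its linear table.
    exact: Dict[TerminalKey, int] = field(default_factory=dict)
    # Linear tables: each one is the ordered member list of one multicast group.
    linear: List[List[TerminalKey]] = field(default_factory=list)

    def create_group(self, first: TerminalKey, seconds: List[TerminalKey]) -> None:
        """Create the linear table, then link it to a new exact-table entry."""
        self.linear.append([first, *seconds])
        self.exact[first] = len(self.linear) - 1

    def lookup(self, target: TerminalKey) -> List[TerminalKey]:
        """Return the group members if the target matches, else an empty list."""
        index = self.exact.get(target)
        return [] if index is None else list(self.linear[index])


# Illustrative values only.
table = MulticastMemberTable()
table.create_group(("10.0.0.2", 17), [("10.0.0.3", 21), ("10.0.0.4", 35)])
members = table.lookup(("10.0.0.2", 17))
```

In this sketch only the first receiving terminal appears as a key of the exact table, so a message addressed directly to a second receiving terminal does not match and yields an empty member list.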
In an alternative embodiment, the switch forwarding the data message to the first receiving terminal and the second receiving terminal includes:
The switch replicates the data message by traversing the linear table to obtain a plurality of replicated messages, the number of which equals the sum of the numbers of the first receiving terminal and the second receiving terminals;
The switch modifies the target terminal network information in the plurality of replicated messages into the first terminal network information of the first receiving terminal and the second terminal network information corresponding to each of the second receiving terminals, respectively, to obtain a plurality of modified replicated messages;
The switch forwards the plurality of modified replicated messages to the first receiving terminal and the second receiving terminals, respectively.
It can be seen that this alternative embodiment gives a specific way for the switch to forward data messages to the first receiving terminal and the second receiving terminal, thus enabling one-to-many RDMA data transfer.
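The replicate-and-edit step can be illustrated with the following sketch. It assumes, for illustration only, that the target terminal network information consists of a destination IP address and a destination QP number; `DataMessage` and `replicate_and_edit` are hypothetical names, and a real switch would perform the equivalent operations in its forwarding pipeline rather than in software objects.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class DataMessage:
    dst_ip: str      # target terminal IP address (assumed field)
    dst_qpn: int     # target terminal QP number (assumed field)
    payload: bytes


def replicate_and_edit(msg: DataMessage,
                       linear_table: List[Tuple[str, int]]) -> List[DataMessage]:
    """Produce one copy per linear-table entry and rewrite its target fields."""
    return [replace(msg, dst_ip=ip, dst_qpn=qpn) for ip, qpn in linear_table]


# Illustrative values: one first receiving terminal and two second receiving terminals.
linear_table = [("10.0.0.2", 17), ("10.0.0.3", 21), ("10.0.0.4", 35)]
original = DataMessage(dst_ip="10.0.0.2", dst_qpn=17, payload=b"block-0")
copies = replicate_and_edit(original, linear_table)
```

The number of copies equals the length of the linear table, and every copy carries the unchanged payload with the target fields of a different member.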
In an optional embodiment, after step 103, the RDMA multicast-based data transmission method further includes:
The sending terminal triggers a timeout retransmission when the timeout retransmission condition is met.
Wherein the timeout retransmission condition includes: within a preset time period after the switch forwards the data message to the first receiving terminal and the second receiving terminal, the sending terminal does not receive the ACK message corresponding to the data message.
It can be seen that by implementing this alternative embodiment, when packet loss occurs, lossless RDMA data transmission can be guaranteed by triggering retransmission.
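The timeout retransmission condition can be sketched on the sending side as follows. The function `send_with_retransmit`, its callback parameters, and the retry limit `max_retries` are assumptions introduced for illustration; the embodiment itself only states that retransmission is triggered when no ACK message arrives within the preset time period.

```python
from typing import Callable


def send_with_retransmit(send: Callable[[], None],
                         wait_for_ack: Callable[[float], bool],
                         timeout_s: float,
                         max_retries: int) -> int:
    """Send once, then retransmit after each timeout.

    Returns the total number of transmissions; raises TimeoutError when no
    ACK arrives even after max_retries retransmissions.
    """
    for attempt in range(1, max_retries + 2):
        send()
        if wait_for_ack(timeout_s):   # True if the ACK arrived within the preset period
            return attempt
    raise TimeoutError("no ACK received after retransmissions")


# Simulated run: the first transmission gets no ACK, the second one does.
sent = []
ack_results = iter([False, True])
attempts = send_with_retransmit(
    send=lambda: sent.append("data"),
    wait_for_ack=lambda timeout: next(ack_results),
    timeout_s=0.5,
    max_retries=3,
)
```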
Embodiment Two
Referring to fig. 2, fig. 2 is a flow chart of a data transmission method based on RDMA multicast according to an embodiment of the present invention. The method is applied to an RDMA multicast system, and the RDMA multicast system comprises a switch, a sending terminal, a first receiving terminal and a second receiving terminal; the sending terminal establishes a bidirectional connection with the first receiving terminal, and the sending terminal establishes a unidirectional connection with the second receiving terminal. As shown in fig. 2, the RDMA multicast-based data transmission method may include the following operations:
201. The sending terminal sends the data message to the switch.
202. The switch judges whether target terminal network information in the data message is matched with the first terminal network information according to the prestored first terminal network information of the first receiving terminal, and a first judging result is obtained.
203. If the first judgment result is yes, the switch forwards the data message to the first receiving terminal and the second receiving terminal.
204. The first receiving terminal and all the second receiving terminals send their respective ACK messages corresponding to the data message to the switch.
205. For any ACK message, the switch judges whether the source terminal network information in the ACK message is matched with the terminal network information of the corresponding receiving terminal according to the pre-stored first terminal network information, and a second judging result is obtained.
206. If the second judgment result is yes, the ACK count value of the switch is increased by 1.
207. When the ACK count value is judged to be equal to the sum of the numbers of the first receiving terminal and all the second receiving terminals, the switch determines one of the plurality of ACK messages as a corrected ACK message, changes the source terminal network information of the corrected ACK message into first terminal network information of the first receiving terminal, and sends the corrected ACK message to the sending terminal.
For the specific technical details and the explanation of terms in steps 201 to 203, refer to the descriptions of steps 101 to 103 in the first embodiment; they are not repeated here.
Optionally, the source terminal network information includes at least one of the IP address, QP information, RDMA transmission identification information, and device identification information of the source terminal of the ACK message. Preferably, the source terminal network information must include the QP information of the source terminal of the ACK message and at least one of the IP address, RDMA transmission identification information, and device identification information. With this arrangement, the data transmission process can be made more reliable.
In a specific embodiment of the RDMA multicast-based data transmission method of the present invention, the first terminal network information includes the QP information and IP address of the first receiving terminal, and the second terminal network information includes the QP information and IP address of the second receiving terminal. In the forward direction, in which data is transmitted from the sending terminal to the receiving terminals, as shown in fig. 4, the QP Number field in the RDMA header and the IP field are selected as the key of the exact table. The switch first parses the network-layer IP field and the QP Number field of the RDMA message sent by the sending terminal and performs the match; once the QP information and the IP address are confirmed to match, the corresponding linear table is found. The linear table is then traversed when the data message is copied and edited.
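The key extraction can be illustrated with the sketch below. It assumes RoCEv2-style framing, which the embodiment does not specify: an IPv4 header, an 8-byte UDP header, and a 12-byte Base Transport Header (BTH) whose bytes 5 to 7 carry the 24-bit destination QP number. The function name `extract_key` and the sample packet are hypothetical.

```python
import socket
import struct
from typing import Tuple

UDP_LEN = 8    # UDP header length in bytes
BTH_LEN = 12   # Base Transport Header length in bytes (assumed RoCEv2 framing)


def extract_key(ip_packet: bytes) -> Tuple[str, int]:
    """Return (destination IP, destination QP number) from an IPv4 packet."""
    ihl = (ip_packet[0] & 0x0F) * 4                 # IPv4 header length in bytes
    dst_ip = socket.inet_ntoa(ip_packet[16:20])     # IPv4 destination address
    bth = ip_packet[ihl + UDP_LEN: ihl + UDP_LEN + BTH_LEN]
    dst_qpn = int.from_bytes(bth[5:8], "big")       # 24-bit destination QP number
    return dst_ip, dst_qpn


# Build an illustrative packet: 20-byte IPv4 header + UDP header + BTH, no payload.
ip_header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 20 + UDP_LEN + BTH_LEN, 0, 0, 64, 17, 0,
    socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"),
)
udp_header = struct.pack("!HHHH", 49152, 4791, UDP_LEN + BTH_LEN, 0)
# BTH: opcode, flags, 2-byte partition key, reserved byte, 3-byte QP number, 4 more bytes.
bth = bytes([0x0A, 0x00, 0xFF, 0xFF, 0x00]) + (17).to_bytes(3, "big") + bytes(4)
key = extract_key(ip_header + udp_header + bth)
```

The resulting pair is what a lookup in the exact table would use as its key in the forward direction.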
As further shown in fig. 5, in the reverse direction, in which the ACK acknowledgement messages are transmitted, the switch performs an exact table lookup according to the QP Number and IP fields in each ACK message; after confirming that the QP information and the IP address match, it obtains the QP information and IP address of the first receiving terminal from the linear table, so as to send the corrected ACK message to the sending terminal.
It can be seen that, after receiving the data message sent by the sending terminal, each receiving terminal returns to the switch its own acknowledgement message indicating that the data message was received correctly, and the switch edits one of the acknowledgement messages according to the multicast member table and sends it to the sending terminal, so that the sending terminal can confirm lossless transmission of the data without receiving an acknowledgement message from every receiving terminal.
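Steps 204 to 207 can be sketched as a small counter on the switch. The class `AckAggregator` and its fields are hypothetical; the sketch assumes one outstanding data message at a time and resets the ACK count value after the corrected ACK message is produced, which the embodiment does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

TerminalKey = Tuple[str, int]   # assumed (IP address, QP number) pair


@dataclass
class AckAggregator:
    members: List[TerminalKey]   # linear table; members[0] is the first receiving terminal
    ack_count: int = 0

    def on_ack(self, source: TerminalKey) -> Optional[TerminalKey]:
        """Count a matching ACK; return the source to write into the corrected ACK.

        Returns None while ACK messages are still outstanding or when the
        source does not match any receiving terminal of the group.
        """
        if source not in self.members:
            return None
        self.ack_count += 1
        if self.ack_count != len(self.members):
            return None
        self.ack_count = 0           # assumption: reset for the next data message
        return self.members[0]       # corrected ACK appears to come from the first receiver


first = ("10.0.0.2", 17)
second_a, second_b = ("10.0.0.3", 21), ("10.0.0.4", 35)
aggregator = AckAggregator([first, second_a, second_b])
results = [
    aggregator.on_ack(second_a),
    aggregator.on_ack(("10.0.0.9", 1)),   # not a group member, ignored
    aggregator.on_ack(first),
    aggregator.on_ack(second_b),          # last ACK triggers the corrected ACK
]
```

Because the sending terminal is bidirectionally connected only to the first receiving terminal, the corrected ACK message carries the first terminal network information as its source regardless of which receiving terminal acknowledged last.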
Embodiment Three
Referring to fig. 3, fig. 3 is a schematic structural diagram of a data transmission system based on RDMA multicast according to an embodiment of the present invention. The data transmission system comprises a switch, a sending terminal, a first receiving terminal and a second receiving terminal; the sending terminal establishes a bidirectional connection with the first receiving terminal, and the sending terminal establishes a unidirectional connection with the second receiving terminal. The RDMA multicast data transmission system is as shown in FIG. 3:
The sending terminal is configured to send a data message to the switch;
The switch is configured to determine, according to the pre-stored first terminal network information of the first receiving terminal, whether the target terminal network information in the data message matches the first terminal network information, to obtain a first judgment result;
The switch is further configured to forward the data message to the first receiving terminal and the second receiving terminal when the first judgment result is yes.
It can be seen that, by implementing the embodiment shown in fig. 3, the switch, the sending terminal, the first receiving terminal and the second receiving terminal cooperate according to the binding relationship between the sender and the receivers under the RDMA multicast framework: the data sender only needs to send one data message during data transmission, and the replication capability of the switch is used to copy the message and forward it to multiple receivers, so that network bandwidth can be saved and data transmission can be accelerated.
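The bandwidth saving on the link between the sending terminal and the switch can be made concrete with a short calculation. The function `sender_uplink_bytes` and the figures used (a 1 MiB message and four receiving terminals) are illustrative assumptions only; header overhead and retransmissions are ignored.

```python
def sender_uplink_bytes(message_bytes: int, receivers: int,
                        switch_replication: bool) -> int:
    """Bytes the sending terminal places on its link to the switch for one message."""
    # With switch replication the sender transmits a single copy;
    # with per-receiver unicast it transmits one copy per receiving terminal.
    copies = 1 if switch_replication else receivers
    return message_bytes * copies


message_bytes = 1024 * 1024     # 1 MiB message (illustrative)
receivers = 4                   # one first and three second receiving terminals
with_replication = sender_uplink_bytes(message_bytes, receivers, True)
with_unicast = sender_uplink_bytes(message_bytes, receivers, False)
```

Under these assumptions the sender's link carries 1,048,576 bytes with switch replication and 4,194,304 bytes with per-receiver unicast, so the saving grows linearly with the number of receiving terminals.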
In an alternative embodiment, the sending terminal is further configured to send its QP information to the first receiving terminal and the second receiving terminal;
The second receiving terminal is configured to bind the QP information of the sending terminal with the QP information corresponding to the second receiving terminal, so as to establish a unidirectional connection with the sending terminal;
The first receiving terminal is configured to bind the QP information of the first receiving terminal with the QP information of the sending terminal, and to send the QP information of the first receiving terminal to the sending terminal;
The sending terminal is further configured to receive the QP information of the first receiving terminal, and to bind the QP of the sending terminal with the QP information of the first receiving terminal, so as to establish a bidirectional connection between the sending terminal and the first receiving terminal.
It can be seen that this optional embodiment gives a specific way for the sending terminal to establish a bidirectional connection with the first receiving terminal and for the second receiving terminal to establish a unidirectional connection with the sending terminal. By implementing this optional embodiment, the bidirectional connection between the sending terminal and the first receiving terminal and the unidirectional connection between the sending terminal and the second receiving terminal are completed, so that the sending terminal subsequently only needs to send a data message once to implement RDMA data transmission with both the first receiving terminal and the second receiving terminal.
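The binding sequence can be modelled as follows. This sketch only tracks which remote QP number each terminal has bound to; the `Terminal` class, the `establish_connections` function and the QP numbers are hypothetical, and no real RDMA verbs are invoked.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Terminal:
    name: str
    qpn: int                          # this terminal's own QP number
    bound_to: Optional[int] = None    # remote QP number this terminal's QP is bound to


def establish_connections(sender: Terminal, first: Terminal,
                          seconds: List[Terminal]) -> None:
    """Model the QP exchange: every receiver binds to the sender's QP,
    but only the first receiving terminal returns its QP information."""
    # The sender sends its QP information to all receiving terminals,
    # and each receiving terminal binds its own QP to it.
    for receiver in [first, *seconds]:
        receiver.bound_to = sender.qpn
    # Only the first receiving terminal replies, so the sender binds to it alone.
    sender.bound_to = first.qpn


sender = Terminal("sender", qpn=100)
first = Terminal("first", qpn=17)
seconds = [Terminal("second-1", qpn=21), Terminal("second-2", qpn=35)]
establish_connections(sender, first, seconds)
```

After the exchange the sender knows only the QP of the first receiving terminal, which is why the data message it sends is addressed to that terminal and the second receiving terminals are reached through replication at the switch.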
In an alternative embodiment, the first receiving terminal and the second receiving terminal are further configured to send terminal network information to the switch; the terminal network information comprises second terminal network information of a second receiving terminal and first terminal network information of a first receiving terminal;
The switch is also configured to create a multicast member table based on the terminal network information.
It can be seen that by implementing this optional embodiment, the switch may create the multicast member table in advance according to the network information of the first receiving terminal and the second receiving terminal, so that after receiving the data packet of the sending terminal, the switch may copy the packet according to the members of the multicast member table and forward the packet to all the members of the multicast member table, thereby implementing one-to-many data transmission.
In an alternative embodiment, the multicast member table includes an exact table and a linear table; the specific manner in which the switch creates the multicast member table according to the terminal network information includes:
The switch creates the exact table according to the first terminal network information of the first receiving terminal, wherein the exact table is used to determine whether the target terminal network information in the data message matches the first terminal network information of the first receiving terminal;
The switch creates the linear table according to the first terminal network information of the first receiving terminal and the second terminal network information of all the second receiving terminals, wherein the linear table is used for copying and editing the data message and is linked to the exact table.
It can be seen that this optional embodiment gives a specific way for the switch to create the multicast member table according to the first terminal network information and the second terminal network information, so that the created multicast member table can be used for copying, editing and forwarding the data message.
In an alternative embodiment, the switch is further configured to replicate the data message by traversing the linear table to obtain a plurality of replicated messages, the number of which equals the sum of the numbers of the first receiving terminal and the second receiving terminals;
The switch is further configured to modify the target terminal network information in the plurality of replicated messages into the first terminal network information of the first receiving terminal and the second terminal network information corresponding to each of the second receiving terminals, respectively, so as to obtain a plurality of modified replicated messages;
The switch is further configured to forward the plurality of modified replicated messages to the first receiving terminal and the second receiving terminals, respectively.
It can be seen that this alternative embodiment gives a specific way for the switch to forward data messages to the first receiving terminal and the second receiving terminal, thus enabling a one-to-many RDMA data transfer.
In an alternative embodiment, the first terminal network information includes at least one of the IP address, QP information, RDMA transmission identification information, and device identification information of the first receiving terminal;
And/or the second terminal network information includes at least one of the IP address, QP information, RDMA transmission identification information, and device identification information of the second receiving terminal;
And/or the target terminal network information includes at least one of the IP address, QP information, RDMA transmission identification information, and device identification information of the target transmission terminal of the data message.
It can be seen that by implementing this alternative embodiment, the first terminal network information, the second terminal network information, and the target terminal network information may each include one or more addresses or other items of information that distinguish the corresponding terminal from other terminals, so as to make the data transmission process more reliable.
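The "at least one of" rule for terminal network information, together with the preferred form given in the method embodiment (QP information plus at least one other identifier), can be sketched as a simple check. The class `TerminalNetworkInfo`, its field names and its field types are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TerminalNetworkInfo:
    ip_address: Optional[str] = None
    qp_info: Optional[int] = None
    rdma_transmission_id: Optional[str] = None
    device_id: Optional[str] = None

    def is_valid(self) -> bool:
        """Optional form: at least one of the four items is present."""
        items = (self.ip_address, self.qp_info,
                 self.rdma_transmission_id, self.device_id)
        return any(item is not None for item in items)

    def is_preferred(self) -> bool:
        """Preferred form: QP information plus at least one other item."""
        others = (self.ip_address, self.rdma_transmission_id, self.device_id)
        return self.qp_info is not None and any(o is not None for o in others)


ip_only = TerminalNetworkInfo(ip_address="10.0.0.2")
qp_and_ip = TerminalNetworkInfo(ip_address="10.0.0.2", qp_info=17)
empty = TerminalNetworkInfo()
```

Here `ip_only` satisfies only the optional form, `qp_and_ip` satisfies both forms, and `empty` satisfies neither.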
In an alternative embodiment, the first receiving terminal and all the second receiving terminals are configured to send respective ACK messages corresponding to the data messages to the switch;
The switch is further configured to determine, according to the pre-stored first terminal network information, whether source terminal network information in any ACK packet is matched with terminal network information of a corresponding receiving terminal, to obtain a second determination result, and if the second determination result is yes, trigger an ACK count value to be increased by 1;
The switch is further configured to determine one of the plurality of ACK messages as a corrected ACK message when the ACK count value is determined to be equal to a sum of the numbers of the first receiving terminal and all the second receiving terminals, change source terminal network information of the corrected ACK message to first terminal network information of the first receiving terminal, and send the corrected ACK message to the transmitting terminal.
Therefore, by implementing the alternative embodiment, after the receiving terminal receives the data message sent by the sending terminal, the receiving terminal can return the confirmation message to the sending terminal, which indicates that the receiving terminal confirms the receiving of the data message, thereby improving the reliability of data transmission.
In an alternative embodiment, the source terminal network information includes at least one of the IP address, QP information, RDMA transmission identification information, and device identification information of the source terminal of the ACK message.
It can be seen that by implementing this alternative embodiment, the source terminal network information may include one or more addresses or other items of information that distinguish the source terminal from other terminals, so as to make the data transmission process more reliable.
In an alternative embodiment, the sending terminal is further configured to trigger a timeout retransmission if a timeout retransmission condition is met;
Wherein the timeout retransmission condition includes: within a preset time period after the switch forwards the data message to the first receiving terminal and the second receiving terminal, the sending terminal does not receive the ACK message corresponding to the data message.
It can be seen that by implementing this alternative embodiment, when packet loss occurs, lossless RDMA data transmission can be guaranteed by triggering retransmission.
The apparatus embodiments described above are merely illustrative, wherein the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, i.e., they may be located in one place or distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without undue burden.
In this specification, the embodiments are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.
From the above detailed description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware. Based on such understanding, the foregoing technical solutions may be embodied, in essence or in part, in the form of a software product that may be stored in a computer-readable storage medium, including read-only memory (ROM), random access memory (RAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc memory, magnetic disc memory, tape memory, or any other computer-readable medium that can be used to carry or store data.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
Finally, it should be noted that the RDMA multicast-based data transmission method and system disclosed in the embodiments of the present invention are merely preferred embodiments of the invention, used only to illustrate the technical solutions of the invention and not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions recorded in the various embodiments can still be modified, or some of their technical features can be replaced by equivalents, and that such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (9)

1. A data transmission method based on RDMA multicast, characterized in that the method is applied to an RDMA multicast system, and the RDMA multicast system comprises a switch, a sending terminal, a first receiving terminal and a second receiving terminal; the sending terminal establishes a bidirectional connection with the first receiving terminal, and the sending terminal establishes a unidirectional connection with the second receiving terminal; the method comprises the following steps:
the sending terminal sends a data message to the switch;
The switch judges whether target terminal network information in the data message is matched with the first terminal network information according to the prestored first terminal network information of the first receiving terminal, and a first judging result is obtained;
If the first judgment result is yes, the switch forwards the data message to the first receiving terminal and the second receiving terminal;
and before the sending terminal sends the data message to the switch, the method further comprises:
the sending terminal sends its QP information to the first receiving terminal and the second receiving terminal;
each second receiving terminal binds its corresponding QP with the QP information of the sending terminal so as to establish a unidirectional connection with the sending terminal;
the first receiving terminal binds its QP with the QP information of the sending terminal, and sends the QP information of the first receiving terminal to the sending terminal;
the sending terminal receives the QP information of the first receiving terminal, and binds the QP of the sending terminal with the QP information of the first receiving terminal so as to establish a bidirectional connection between the sending terminal and the first receiving terminal.
2. The method of claim 1, wherein after the sending terminal receives the QP information for the first receiving terminal and binds the QP for the sending terminal with the QP information for the first receiving terminal to establish a bi-directional connection between the sending terminal and the first receiving terminal and before the sending terminal sends a data message to the switch, the method further comprises:
the first receiving terminal and the second receiving terminal send terminal network information to the switch; the terminal network information comprises second terminal network information of the second receiving terminal and first terminal network information of the first receiving terminal;
And the switch creates a multicast member table according to the terminal network information.
3. The method of claim 2, wherein the multicast member table comprises an exact table and a linear table; the switch creating a multicast member table according to the terminal network information comprises:
The switch creates the exact table according to the first terminal network information of the first receiving terminal, wherein the exact table is used to determine whether the target terminal network information in the data message matches the first terminal network information of the first receiving terminal;
The switch creates the linear table according to the first terminal network information of the first receiving terminal and the second terminal network information of all the second receiving terminals, wherein the linear table is used for copying and editing data messages and is linked to the exact table.
4. A method according to claim 3, wherein the switch forwarding the data message to the first receiving terminal and the second receiving terminal comprises:
the switch replicates the data message by traversing the linear table to obtain a plurality of replicated messages, the number of which equals the sum of the numbers of the first receiving terminal and the second receiving terminals;
the switch modifies the target terminal network information in the plurality of replicated messages into the first terminal network information of the first receiving terminal and the second terminal network information corresponding to each of the second receiving terminals, respectively, to obtain a plurality of modified replicated messages;
the switch forwards the plurality of modified replicated messages to the first receiving terminal and the second receiving terminals, respectively.
5. The method according to any one of claims 2 to 4, wherein the first terminal network information includes at least one of the IP address, QP information, RDMA transmission identification information, and device identification information of the first receiving terminal;
And/or the second terminal network information includes at least one of the IP address, QP information, RDMA transmission identification information, and device identification information of the second receiving terminal;
And/or the target terminal network information includes at least one of the IP address, QP information, RDMA transmission identification information, and device identification information of the target transmission terminal of the data message.
6. The method of claim 1, wherein after the switch forwards the data message to the first receiving terminal and the second receiving terminal, the method further comprises:
The first receiving terminal and all the second receiving terminals send respective ACK messages corresponding to the data messages to the switch;
For any one of the ACK messages, the switch judges whether the source terminal network information in the ACK message is matched with the terminal network information of the corresponding receiving terminal according to the pre-stored first terminal network information to obtain a second judging result;
if the second judgment result is yes, adding 1 to the ACK count value of the switch;
When the ACK count value is judged to be equal to the sum of the numbers of the first receiving terminal and all the second receiving terminals, the switch determines one of a plurality of ACK messages as a corrected ACK message, changes the source terminal network information of the corrected ACK message into first terminal network information of the first receiving terminal, and sends the corrected ACK message to the sending terminal.
7. The method of claim 6, wherein the source terminal network information comprises at least one of the IP address, QP information, RDMA transmission identification information, and device identification information of the source terminal of the ACK message.
8. The method of claim 1, wherein after the switch forwards the data message to the first receiving terminal and the second receiving terminal, the method further comprises:
the sending terminal triggers a timeout retransmission when a timeout retransmission condition is met;
Wherein the timeout retransmission condition includes: within a preset time period after the switch forwards the data message to the first receiving terminal and the second receiving terminal, the sending terminal does not receive the ACK message corresponding to the data message.
9. A data transmission system based on RDMA multicast, characterized in that the system comprises a switch, a sending terminal, a first receiving terminal and a second receiving terminal; the sending terminal establishes a bidirectional connection with the first receiving terminal, and the sending terminal establishes a unidirectional connection with the second receiving terminal; wherein:
The sending terminal is used for sending a data message to the switch;
The switch is used for judging whether target terminal network information in the data message is matched with the first terminal network information according to the prestored first terminal network information of the first receiving terminal, so as to obtain a first judgment result;
The switch is further configured to forward the data packet to the first receiving terminal and the second receiving terminal when the first determination result is yes;
the sending terminal is further configured to send QP information thereof to the first receiving terminal and the second receiving terminal;
The second receiving terminal is used for binding QPs corresponding to the second receiving terminal with QP information of the sending terminal so as to establish unidirectional connection with the sending terminal;
The first receiving terminal is used for binding QP information of the first receiving terminal with QP information of the sending terminal and sending the QP information of the first receiving terminal to the sending terminal;
The sending terminal is further configured to receive QP information of the first receiving terminal, and bind the QP of the sending terminal with the QP information of the first receiving terminal, so as to establish bidirectional connection between the sending terminal and the first receiving terminal.
CN202210414928.9A 2022-04-20 2022-04-20 RDMA multicast-based data transmission method and system Active CN114944966B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210414928.9A CN114944966B (en) 2022-04-20 2022-04-20 RDMA multicast-based data transmission method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210414928.9A CN114944966B (en) 2022-04-20 2022-04-20 RDMA multicast-based data transmission method and system

Publications (2)

Publication Number Publication Date
CN114944966A CN114944966A (en) 2022-08-26
CN114944966B true CN114944966B (en) 2024-04-19

Family

ID=82906561

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210414928.9A Active CN114944966B (en) 2022-04-20 2022-04-20 RDMA multicast-based data transmission method and system

Country Status (1)

Country Link
CN (1) CN114944966B (en)


Similar Documents

Publication Publication Date Title
US8243643B2 (en) Active multicast information protocol
US6999465B2 (en) Methods for reliably sending IP multicast packets to multiple endpoints of a local area network
KR100935933B1 (en) Reliable multicast data retransmission method by grouping wireless terminal in wireless communication, and apparatus thereof
KR100683813B1 (en) Multicast data transfer
DE602004010851T2 (en) METHOD AND DEVICES FOR DUPLICATE PACKET IDENTIFICATION DURING A HANDOVER
ZA200608906B (en) Data repair enhancements for multicast/broadcast data distribution
KR20040098553A (en) Reliable delivery of multi-cast conferencing data
US10505677B2 (en) Fast detection and retransmission of dropped last packet in a flow
KR100883576B1 (en) Data repair enhancements for multicast/broadcast data distribution
WO2017185212A1 (en) Multicast delay diagnosis method and apparatus
Sabata et al. Transport protocol for reliable multicast: TRM
CN112448826B (en) Multicast message communication method and device, readable medium and electronic equipment
CN114553799B (en) Multicast forwarding method, device, equipment and medium based on programmable data plane
JP2006074132A (en) Multicast communication method and gateway device
KR19990053163A (en) Packet Error Controller for Multicast Communication and Packet Error Control Method Using the Same
CN114944966B (en) RDMA multicast-based data transmission method and system
KR100382360B1 (en) Method and apparatus for transmitting explict multicast data packet over ethernet
JP2009212796A (en) Transmitter, data transfer system, data transfer method, and data transfer program
KR100281643B1 (en) Session Multicast Method for Multi-Party Information Transmission
Lee et al. Realization of a Scalable and Reliable Multicast Transport Protocol for Many‐to‐Many Sessions
CN115442318A (en) Method, device, storage medium and equipment for realizing reliable multicast of RDMA (remote direct memory Access) network
Krishnamurthy Distributed reliable multicast protocol for the SOME-Bus network
Yoon et al. Throughput analysis of tree-based protocols for many-to-many reliable multicast
Wilson Reliable Multicast A Technical Paper Review (CPSC 609.31)
KR20080055202A (en) System for transfering a large-sized digital content to multi-point using ip-multicast and method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: 518053 501, 502, 601 and 602, building D, wisdom Plaza, Qiaoxiang Road, Gaofa community, Shahe street, Nanshan District, Shenzhen, Guangdong

Applicant after: China Southern Power Grid Digital Platform Technology (Guangdong) Co.,Ltd.

Address before: 518053 501, 502, 601 and 602, building D, wisdom Plaza, Qiaoxiang Road, Gaofa community, Shahe street, Nanshan District, Shenzhen, Guangdong

Applicant before: China Southern Power Grid Shenzhen Digital Power Grid Research Institute Co.,Ltd.

Country or region before: China

GR01 Patent grant