CN114201313A - Message transmission system and message transmission method - Google Patents


Info

Publication number
CN114201313A
CN114201313A (application CN202111485397.4A)
Authority
CN
China
Prior art keywords
rdma
host
information
cache
cache region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111485397.4A
Other languages
Chinese (zh)
Inventor
蒋汶达
杨从毅
蒋国强
刘志军
魏耀武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Era Yitong Software Ltd By Share Ltd
Original Assignee
Hangzhou Era Yitong Software Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Era Yitong Software Ltd By Share Ltd
Priority to CN202111485397.4A
Publication of CN114201313A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/54 Interprogram communication
    • G06F 9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 Handling requests for interconnection or transfer
    • G06F 13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F 13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal


Abstract

The invention discloses a message transmission system comprising a first cache region, a first RDMA process, and a first scheduling center located in a first host, and a second cache region, a second RDMA process, and a second scheduling center located in a second host. The first RDMA process is communicatively connected with the second RDMA process through RDMA technology, so that the first host is communicatively connected with the second host. The first scheduling center is used for acquiring subscription information of an application in the first host and sending it from the first cache region to the second cache region through the first RDMA process; the second scheduling center is used for sending actual information from the second cache region to the first cache region through the second RDMA process when an application in the second host generates actual information corresponding to the subscription information. The invention also provides a message transmission method, which has the same beneficial effects.

Description

Message transmission system and message transmission method
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a message delivery system and a message delivery method.
Background
Message queues have gradually become a core means of internal communication in enterprise IT systems. There are many mainstream message middleware products on the market today, such as RocketMQ, Kafka, and NATS. These technologies are mature and provide functions such as reliable delivery, broadcast, flow control, low coupling, and eventual consistency. However, for network data transmission between servers, most of them use transport protocols designed on the TCP/IP (Transmission Control Protocol/Internet Protocol) architecture; constrained by the era in which they were designed, these systems suffer from large network transmission delay, multiple copies between data processes, interrupt-handling overhead, and the like.
Compared with traditional network technology, RDMA (Remote Direct Memory Access) has the advantages of high bandwidth, low latency, and low consumption of system resources. Originating as a high-speed network technology of InfiniBand for high-performance computing, RDMA is currently supported by three network protocols: IB (InfiniBand), RoCE (RDMA over Converged Ethernet), and iWARP (Internet Wide Area RDMA Protocol). RoCE is a network protocol that allows RDMA over Ethernet, with an Ethernet header as the lower network header; this allows RDMA to be used on a standard Ethernet infrastructure (ordinary switches), with only the network card being special. RDMA has numerous applications in areas such as high-performance computing and distributed storage. For example, IBM's Spark over RDMA scheme modifies the underlying network framework of Spark and fully utilizes the reliable broadcast function of Mellanox 100G RoCE, greatly improving system performance. Using RDMA to improve network transmission efficiency has become a trend.
Message middleware systems designed on the TCP/IP architecture are limited by the protocol itself and by device hardware, and it is difficult to further improve performance indexes such as latency and throughput. RDMA can greatly improve data transmission efficiency at the network layer, thereby alleviating these performance problems. But programming on the existing RDMA basis is not easy: RDMA communication primitives differ considerably from the socket primitives commonly used by applications, since RDMA is message-based while sockets are byte-stream-based. Therefore, how to make full use of the efficient transmission characteristics of RDMA to improve message communication systems is an urgent problem for those skilled in the art.
Disclosure of Invention
The invention aims to provide a message transmission system which can realize the communication between hosts based on RDMA; another object of the present invention is to provide a message passing method, which can implement RDMA-based communication between hosts.
In order to solve the technical problem, the invention provides a message transmission system, which comprises a first cache region, a first RDMA process and a first scheduling center, wherein the first cache region, the first RDMA process and the first scheduling center are positioned in a first host; and a second cache, a second RDMA process, and a second dispatch center located within a second host;
the first RDMA process communicatively coupled with the second RDMA process via RDMA techniques to communicatively couple the first host with the second host;
the first scheduling center is used for acquiring subscription information of an application in the first host and sending the subscription information to the second cache region from the first cache region through the first RDMA process;
the second scheduling center is used for sending the actual information to the first cache region from the second cache region through the second RDMA process when the application in the second host generates the actual information corresponding to the subscription information.
Optionally, a first shared memory including the first cache region is disposed in the first host, and a second shared memory including the second cache region is disposed in the second host;
the first scheduling center is used for acquiring the subscription information through the first shared memory;
the second scheduling center is configured to obtain the actual information through the second shared memory.
Optionally, the first shared memory and the second shared memory are both shared memories based on a lock-free ring queue.
Optionally, the first buffer area and the second buffer area are both multi-channel buffer areas.
Optionally, the first cache region has a plurality of first channels, the plurality of first channels correspond to information of a plurality of lengths, and the first cache region processes the information of the corresponding length through the first channels.
Optionally, the second cache area has a plurality of second channels, and the plurality of second channels correspond to information of a plurality of lengths; and the second cache region processes the information with the corresponding length through the second channel.
Optionally, the first RDMA process is reliably connected with the second RDMA process.
Optionally, the first RDMA process is configured to release a memory occupied by the sent information every time a first preset number of pieces of information are sent;
and the second RDMA process is used for releasing the memory occupied by the sent information every time a second preset number of pieces of information are sent.
The invention also provides a message transmission method, which is applied to the first dispatching center and comprises the following steps:
acquiring subscription information of an application in a first host; a first cache region, a first RDMA process and a first scheduling center are arranged in the first host; a second cache region, a second RDMA process and a second scheduling center are arranged in the second host; the first RDMA process communicatively coupled with the second RDMA process via RDMA techniques to communicatively couple the first host with the second host;
and sending the subscription information to the second cache region from the first cache region through the first RDMA process, so that when the application in the second host generates actual information corresponding to the subscription information, the second scheduling center sends the actual information to the first cache region from the second cache region through the second RDMA process.
The invention also provides a message transmission method, which is applied to a second dispatching center and comprises the following steps:
acquiring actual information generated by an application in the second host corresponding to subscription information; a first cache region, a first RDMA process and a first scheduling center are arranged in the first host; a second cache region, a second RDMA process and a second scheduling center are arranged in the second host; the first RDMA process communicatively coupled with the second RDMA process via RDMA technology to communicatively couple the first host with the second host; the subscription information is generated by the application in the first host, acquired by the first scheduling center, and sent from the first cache region to the second cache region;
sending the actual information from the second cache to the first cache via the second RDMA process.
The invention provides a message transmission system, which comprises a first cache region, a first RDMA process and a first scheduling center, wherein the first cache region, the first RDMA process and the first scheduling center are positioned in a first host; and a second cache, a second RDMA process, and a second dispatch center located within a second host; the first RDMA process is communicatively connected with the second RDMA process through RDMA technology so as to enable the first host to be communicatively connected with the second host; the first scheduling center is used for acquiring subscription information of an application in the first host and sending the subscription information to the second cache region from the first cache region through the first RDMA process; and the second scheduling center is used for sending the actual information to the first cache region from the second cache region through a second RDMA process when the application in the second host generates the actual information corresponding to the subscription information.
By correspondingly arranging the first RDMA process and the second RDMA process on the two hosts, the two hosts can be communicatively connected through RDMA technology, achieving efficient data transmission. By dedicating the first cache region to sending and receiving the information transmitted by the first RDMA process, and the second cache region to sending and receiving the information transmitted by the second RDMA process, acknowledgment of data during RDMA-based communication between hosts is made easier, facilitating the replacement of traditional TCP/IP by RDMA.
The invention also provides a message transmission method, which also has the beneficial effects and is not repeated herein.
Drawings
In order to more clearly illustrate the embodiments or technical solutions of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.
Fig. 1 is a schematic structural diagram of a messaging system according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a specific messaging system according to an embodiment of the present invention;
fig. 3 is a flowchart of a message delivery method according to an embodiment of the present invention;
fig. 4 is a flowchart of another message delivery method according to an embodiment of the present invention.
In the figures: 11. first cache region; 12. second cache region; 21. first RDMA process; 22. second RDMA process; 31. first dispatch center; 32. second dispatch center; 41. first shared memory; 42. second shared memory.
Detailed Description
The core of the invention is to provide a message transmission system. In the prior art, message middleware systems designed on the TCP/IP architecture are limited by the protocol and by device hardware, making it difficult to further improve performance indexes such as latency and throughput. RDMA can greatly improve data transmission efficiency at the network layer, thereby alleviating these performance problems. But programming on the existing RDMA basis is not easy: RDMA communication primitives differ considerably from the socket primitives commonly used by applications, since RDMA is message-based while sockets are byte-stream-based, so one cannot directly replace the other.
The message transmission system provided by the invention comprises a first cache region, a first RDMA process and a first scheduling center, wherein the first cache region, the first RDMA process and the first scheduling center are positioned in a first host; and a second cache, a second RDMA process, and a second dispatch center located within a second host; the first RDMA process is communicatively connected with the second RDMA process through RDMA technology so as to enable the first host to be communicatively connected with the second host; the first scheduling center is used for acquiring subscription information of an application in the first host and sending the subscription information to the second cache region from the first cache region through the first RDMA process; and the second scheduling center is used for sending the actual information to the first cache region from the second cache region through a second RDMA process when the application in the second host generates the actual information corresponding to the subscription information.
By correspondingly arranging the first RDMA process and the second RDMA process on the two hosts, the two hosts can be communicatively connected through RDMA technology, achieving efficient data transmission. By dedicating the first cache region to sending and receiving the information transmitted by the first RDMA process, and the second cache region to sending and receiving the information transmitted by the second RDMA process, acknowledgment of data during RDMA-based communication between hosts is made easier, facilitating the replacement of traditional TCP/IP by RDMA.
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a message delivery system according to an embodiment of the present invention.
Referring to fig. 1, in an embodiment of the present invention, a messaging system includes a first cache 11 located in a first host, a first RDMA process 21, and a first dispatch center 31; and a second cache 12, a second RDMA process 22, and a second dispatch center 32 located within a second host; the first RDMA process 21 and the second RDMA process 22 are communicatively connected by RDMA techniques to communicatively connect the first host and the second host; the first dispatch center 31 is configured to obtain subscription information of an application in the first host, and send the subscription information from the first cache 11 to the second cache 12 through the first RDMA process 21; the second dispatch center 32 is configured to send actual information corresponding to the subscription information from the second cache 12 to the first cache 11 via the second RDMA process 22 when the application in the second host generates the actual information.
The first host and the second host, i.e. the hosts that need to communicate with each other, are similar in arrangement to the two hosts, wherein the first host is provided with a first buffer 11, a first RDMA process 21 and a first dispatch center 31, and the second host is provided with a second buffer 12, a second RDMA process 22 and a second dispatch center 32.
The first RDMA process 21 described above is a process in the first host for implementing data transceiving by RDMA technology, and the corresponding second RDMA process 22 is a process in the second host for implementing data transceiving by RDMA technology. In an embodiment of the invention, the first RDMA process 21 is communicatively connected to the second RDMA process 22 by RDMA technology, so that the first host is communicatively connected to the second host by RDMA technology.
In the embodiment of the invention, a network transmission model is established using RDMA based on the RoCEv2 protocol, addressing the latency of host data processing in network applications. By solidifying the protocol into the hardware of an intelligent network card, RDMA supports zero-copy networking and kernel bypass, which fully optimizes the software architecture, frees memory bandwidth and CPU cycles, and greatly improves system performance.
The first buffer 11 is used for sending and receiving information transmitted by the first RDMA process 21: the first RDMA process 21 selects information from the first buffer 11 to send to other hosts, and upon receiving information it first stores that information in the first buffer 11. Correspondingly, the second buffer 12 is used for sending and receiving information transmitted by the second RDMA process 22: the second RDMA process 22 selects information from the second buffer 12 to send to other hosts, and upon receiving information it first stores that information in the second buffer 12.
RDMA itself has no data caching capability and requires the upper layer applications to explicitly register and manage the send and receive buffers, which can become troublesome in high concurrency scenarios. Aiming at the problems, the embodiment of the invention establishes an RDMA data caching mechanism on the basis of RDMA bilateral operation according to service requirements, and transmits and receives data transmitted by an RDMA process by detecting a special cache region. Details of the information transmission and reception will be described in detail in the following embodiments of the invention, and will not be described herein again.
It should be noted that the first buffer 11 and the second buffer 12 can be divided into a sending buffer and a receiving buffer, where the sending buffer is used to store data to be sent, and the receiving buffer is used to store data just received.
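As a rough sketch (a hypothetical Python model, not the patented implementation), the split of a cache region into a send buffer and a receive buffer can be pictured as two queues:

```python
from collections import deque

class CacheRegion:
    """Toy model of a host cache region: a send buffer holds data waiting
    to go out over the RDMA link; a receive buffer holds data just
    delivered by the peer's RDMA process."""

    def __init__(self):
        self.send_buffer = deque()     # data to be sent
        self.receive_buffer = deque()  # data just received

    def stage_for_send(self, message):
        self.send_buffer.append(message)

    def pop_for_send(self):
        # The RDMA process selects the next message to transmit.
        return self.send_buffer.popleft() if self.send_buffer else None

    def store_received(self, message):
        # The RDMA process first stores incoming data here.
        self.receive_buffer.append(message)

cache = CacheRegion()
cache.stage_for_send("subscribe:topic-A")
msg = cache.pop_for_send()  # handed to the RDMA process for transmission
```

In a real system both queues would sit in NIC-registered memory; the queue objects here only illustrate the two roles.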
The first dispatch center 31, i.e. the dispatch center in the first host, handles the transfer of information such as subscriptions and publications between hosts, mainly by means of the first RDMA process 21. Correspondingly, the second dispatch center 32, i.e. the dispatch center in the second host, handles such transfers mainly by means of the second RDMA process 22.
Specifically, in this embodiment of the present invention, the first dispatch center 31 is configured to obtain subscription information of an application in the first host, and send the subscription information from the first cache 11 to the second cache 12 through the first RDMA process 21. That is, when the first host needs to subscribe to information generated in the second host, the application in the first host first generates subscription information, and the first dispatch center 31 stores it and notifies the first RDMA process 21. The first RDMA process 21 then sends the subscription information from the first cache 11 to the second RDMA process 22, thereby delivering it to the second cache 12 of the second host. At this time, the second dispatch center 32 stores the subscription information so that, when the second host later generates actual information corresponding to it, that information can be sent back.
Accordingly, in this embodiment of the present invention, the second dispatch center 32 is configured to send the actual information corresponding to the subscription information from the second cache 12 to the first cache 11 through the second RDMA process 22 when the application in the second host generates it. That is, when the second host generates the actual information, it generally enters the second cache 12, and the second dispatch center 32 is consulted to determine to which host the subscription corresponding to this actual information was published. When the target is determined to be the first host, the second RDMA process 22 sends the actual information to the first host, where it is first stored in the first cache 11, completing the transfer.
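The subscribe/publish flow above can be sketched with a toy Python model (all names here, such as `Host`, `rdma_send`, and the tuple message format, are illustrative assumptions rather than part of the patent):

```python
class Host:
    def __init__(self, name):
        self.name = name
        self.cache = []          # stands in for the RDMA cache region
        self.subscriptions = {}  # topic -> subscriber Host (dispatch-center state)

def rdma_send(src_host, dst_host, message):
    # Stand-in for the RDMA processes: move a message from the sender's
    # cache region into the receiver's cache region.
    dst_host.cache.append(message)

def subscribe(subscriber, publisher, topic):
    # First dispatch center: record the subscription and forward it.
    msg = ("subscribe", topic, subscriber.name)
    rdma_send(subscriber, publisher, msg)
    publisher.subscriptions[topic] = subscriber

def publish(publisher, topic, payload):
    # Second dispatch center: when actual information appears, look up
    # the subscriber and send the data back over the RDMA link.
    target = publisher.subscriptions.get(topic)
    if target is not None:
        rdma_send(publisher, target, (topic, payload))

host1, host2 = Host("host1"), Host("host2")
subscribe(host1, host2, "sensor/temp")
publish(host2, "sensor/temp", 23.5)
# host1.cache now ends with ("sensor/temp", 23.5)
```

The two `rdma_send` calls correspond to the two directions of transfer in the claims: subscription forward, actual information back.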
The message transmission system provided by the embodiment of the invention comprises a first cache region 11, a first RDMA process 21 and a first scheduling center 31, wherein the first cache region 11, the first RDMA process and the first scheduling center are positioned in a first host; and a second cache 12, a second RDMA process 22, and a second dispatch center 32 located within a second host; the first RDMA process 21 is communicatively connected with the second RDMA process 22 by RDMA technology to communicatively connect the first host with the second host; the first scheduling center 31 is configured to obtain subscription information of an application in the first host, and send the subscription information from the first cache 11 to the second cache 12 through the first RDMA process 21; the second dispatch center 32 is configured to send the actual information from the second cache 12 to the first cache 11 via the second RDMA process 22 when the actual information corresponding to the subscription information is generated by the application in the second host.
By correspondingly arranging the first RDMA process 21 and the second RDMA process 22 on the two hosts, the two hosts can be communicatively connected through RDMA technology, achieving efficient data transmission. By dedicating the first buffer 11 to sending and receiving the information transmitted by the first RDMA process 21, and the second buffer 12 to sending and receiving the information transmitted by the second RDMA process 22, acknowledgment of data during RDMA-based communication between hosts is made easier, facilitating the replacement of traditional TCP/IP by RDMA.
The details of a messaging system provided by the present invention will be described in more detail in the following embodiments of the invention.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a specific messaging system according to an embodiment of the present invention.
Different from the above embodiment, this embodiment further specifies the structure of the messaging system on the basis of the above embodiment. The rest of the contents have already been described in detail above and are not repeated here.
Referring to fig. 2, in the embodiment of the present invention, a first shared memory 41 including the first cache region 11 is disposed in the first host, and a second shared memory 42 including the second cache region 12 is disposed in the second host; the first scheduling center 31 is configured to obtain the subscription information through the first shared memory 41; the second scheduling center 32 is configured to obtain the actual information through the second shared memory 42.
That is, in the embodiment of the present invention, the shared memory in the first host and the first cache region 11 may be integrated to be the first shared memory 41. At this time, the first shared memory 41 in the first host at least needs to include the first buffer 11, and usually the first shared memory 41 is identical to the first buffer 11. In the embodiment of the present invention, the first shared memory 41 is further configured to implement data transmission between processes in the first host, that is, in the embodiment of the present invention, data transmission in the first host and data transmission between different hosts are implemented by the same first shared memory 41.
Accordingly, in the embodiment of the present invention, the shared memory in the second host and the second cache region 12 may be integrated to be the second shared memory 42. At this time, the second shared memory 42 in the second host at least needs to include the second buffer 12, and usually the second shared memory 42 is identical to the second buffer 12. In the embodiment of the present invention, the second shared memory 42 is further configured to implement data transmission between processes in the second host, that is, in the embodiment of the present invention, data transmission in the second host and data transmission between different hosts are implemented by using the same second shared memory 42.
In addition to network communication between hosts, process communication within a host is also important. If inter-process communication within a host were also routed through the RDMA network card, its delay and throughput would be much worse than host-internal shared memory, limited by the PCIe bus. In view of this, the embodiment of the present invention integrates inter-process communication and network communication over one shared memory, i.e. the first shared memory 41 and the second shared memory 42. The upper-layer application directly operates on the RDMA transceiving cache region for both inter-process and network communication, which improves transmission efficiency. The memory is registered with the RDMA network card only once, during initialization, and serves as the buffer for RDMA network transmission and reception. Only one copy of the data then exists in the whole transmission process, guaranteeing the zero-copy characteristic. Therefore, in the embodiment of the present invention, the first scheduling center 31 may obtain the subscription information through the first shared memory 41, and correspondingly the second dispatch center 32 may obtain the actual information through the second shared memory 42.
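A minimal sketch of the register-once, single-copy idea (hypothetical Python; a real system would register the region via the verbs API, e.g. `ibv_reg_mr`, rather than set a flag):

```python
class SharedRegion:
    """Toy model: one memory region serves both intra-host IPC and the
    RDMA send/receive buffer, and is registered with the NIC exactly
    once at initialization."""

    def __init__(self, size):
        self.mem = bytearray(size)  # the single shared buffer
        self.registered = False

    def register_with_nic(self):
        # Stand-in for one-time NIC registration; repeated registration
        # is disallowed, mirroring "register once, no re-registration".
        if self.registered:
            raise RuntimeError("region must be registered only once")
        self.registered = True

region = SharedRegion(1 << 20)  # 1 MiB region
region.register_with_nic()
# Applications write directly into region.mem; because the NIC transmits
# the same bytes, no intermediate copy of the data is ever made.
```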
In general, the first scheduling center 31 and the second scheduling center 32 may be divided into a first scheduling sub-center and a second scheduling sub-center. The first scheduling sub-center is mainly used for processing information communicated between processes in each host, and the second scheduling sub-center is mainly used for processing information communicated between hosts.
Further, in the embodiment of the present invention, the first shared memory 41 and the second shared memory 42 are both shared memories based on a lock-free ring queue. Compared with a locked queue, the lock-free algorithm reduces the cost of lock contention among threads and avoids the cost of frequent scheduling among them, giving better performance than a lock-based approach. Therefore, in the embodiment of the present invention, using shared memory based on a lock-free ring queue as the first shared memory 41 and the second shared memory 42 can effectively improve the performance of the entire system.
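The lock-free ring queue idea can be illustrated with a single-producer/single-consumer sketch (hypothetical Python for clarity; a real shared-memory version would use atomic loads and stores on the two indices):

```python
class SPSCRingQueue:
    """Single-producer/single-consumer ring queue. With exactly one
    producer and one consumer, `tail` is written only by the producer
    and `head` only by the consumer, so no lock is needed."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0  # next slot to read (consumer-owned index)
        self.tail = 0  # next slot to write (producer-owned index)

    def push(self, item):
        if self.tail - self.head == self.capacity:
            return False  # queue full
        self.buf[self.tail % self.capacity] = item
        self.tail += 1
        return True

    def pop(self):
        if self.head == self.tail:
            return None  # queue empty
        item = self.buf[self.head % self.capacity]
        self.head += 1
        return item
```

The monotonically increasing indices make the full/empty tests simple: full when `tail - head == capacity`, empty when `head == tail`.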
Further, in the embodiment of the present invention, the first buffer area 11 and the second buffer area 12 are both multi-channel buffer areas. That is, a plurality of channels are provided in both the first buffer area 11 and the second buffer area 12, so that data can be transmitted in parallel, and the performance of the system can be effectively improved.
Further, in this embodiment of the present invention, the first buffer 11 has a plurality of first channels, the plurality of first channels correspond to information with a plurality of lengths, and the first buffer 11 processes the information with the corresponding length through the first channels. Correspondingly, in the embodiment of the present invention, the second cache region 12 has a plurality of second channels, and the plurality of second channels correspond to information with a plurality of lengths; the second buffer 12 processes information of corresponding length through the second channel.
That is, the plurality of first channels of the first buffer 11 jointly cover information of a plurality of lengths, and in this embodiment of the present invention the first buffer 11 specifically processes information of the corresponding length through each first channel. Accordingly, the plurality of second channels of the second buffer 12 jointly cover information of a plurality of lengths, and the second buffer 12 processes information of the corresponding length through each second channel.
Because the receiving mechanism in the embodiment of the present invention does not know the length of the data before it is received, a sufficiently large memory region that has been registered with the network card must be allocated to store data arriving from the RDMA network. If each channel processes at most N messages simultaneously, each message having a maximum possible size MAX_SIZE, then at least BUFFER_SIZE = N × MAX_SIZE bytes of memory are allocated to that channel. The pointer positions of the N blocks of memory are stored in the ring queue, thereby constructing the buffer area. When the RDMA application is started, the cache region is registered with the RDMA network card once, without repeated registration.
Under this receiving mechanism, allocating a maximum-length memory block for every arriving piece of data inevitably inflates memory consumption. To address this problem, a multi-channel mechanism is adopted for data of different sizes, with data of each length range handled by its corresponding channel, for example, data of 0KB to 1KB, 1KB to 10KB, and 10KB to 1MB processed separately, which can effectively improve memory utilization.
Specifically, in the embodiment of the present invention, the first RDMA process 21 is reliably connected to the second RDMA process 22. Connection-oriented RDMA transport supports two types: Reliable Connection (RC) and Unreliable Connection (UC). In the embodiment of the present invention, the reliable-connection transport type is specifically selected to transmit data between the first host and the second host.
Specifically, in the data sending mechanism in the embodiment of the present invention, the first RDMA process 21 is configured to release the memory occupied by the sent information every time a first preset number of pieces of information are sent; the second RDMA process 22 is configured to release the memory occupied by the sent information every time a second preset number of pieces of information are sent.
In practice, a successful return from the `ibv_post_send` function at the RDMA sending end only means that a send request has been successfully submitted to the network card; it does not guarantee that the data has actually been sent out. Meanwhile, the application program at the RDMA receiving end must declare the message to be received in advance and designate a receiving memory area. Because the successful return of `ibv_post_send` only indicates submission to the network card, the memory area where the data resides cannot be modified immediately.
Therefore, in the buffer area constructed in the embodiment of the invention, when N pieces of data are sent, the Nth piece can be marked, and the system waits synchronously for that piece to be genuinely sent out. Because the RDMA reliable-connection mode adopted by the system guarantees data ordering, confirmation of the Nth piece of data indicates that the preceding N-1 pieces have reached the opposite end, and the memory holding the sent data can then be released. This ensures the safety of the memory area where the data resides while exploiting the efficient transceiving capability of RDMA to the greatest extent. Correspondingly, in the embodiment of the present invention, the first RDMA process 21 releases the memory occupied by sent information each time a first preset number of pieces of information have been sent; and the second RDMA process 22 releases the memory occupied by sent information each time a second preset number of pieces of information have been sent.
It should be noted that RDMA provides two types of communication primitives. The first type is the two-sided operation, similar to a socket: the sending end calls send and the receiving end calls recv. The other type is the one-sided operation, which provides a shared-memory-style primitive, i.e. directly reading and writing remote memory, or performing atomic operations on remote memory. In the embodiment of the present invention, the two-sided primitives are generally used as the specific data transfer mechanism between the first RDMA process 21 and the second RDMA process 22.
Because RDMA technology is complex, the cost for upper-layer developers to learn and master it is high. The embodiment of the invention provides a complete message middleware system based on RDMA technology, delivered to the user as an independent application process. The RDMA networking technology is one of the system's internal components and is not directly exposed to users. Problems faced when using RDMA, such as memory registration and cache management, are all solved within the system. Therefore, the user only needs to pay attention to the basic functions most commonly used in message middleware, such as message subscription and publishing, without perceiving the details of the underlying network transmission, which can greatly reduce development cost.
According to the message transmission system provided by the embodiment of the invention, the first RDMA process 21 and the second RDMA process 22 are correspondingly arranged on the two hosts, so that the two hosts can be in communication connection through the RDMA technology, and efficient data transmission is realized. By setting the first buffer 11 to transmit and receive the information transmitted by the first RDMA process 21 in a targeted manner, and setting the second buffer 12 to transmit and receive the information transmitted by the second RDMA process 22 in a targeted manner, the acknowledgement of data information during communication based on RDMA technology between hosts can be facilitated, and the replacement of traditional TCP/IP by RDMA is facilitated.
In the following, a message delivery method provided by an embodiment of the present invention is introduced, and the message delivery method described below and the message delivery system described above may be referred to correspondingly.
It should be noted that the message delivery method provided in this embodiment of the present invention is specifically applied to the end of the message delivery system that subscribes to information, which is usually the client, that is, the first host.
Referring to fig. 3, fig. 3 is a flowchart of a message delivery method according to an embodiment of the present invention. The message passing method provided by the embodiment of the present invention is applied to the first host, and is specifically applied to the first scheduling center 31.
Referring to fig. 3, in the embodiment of the present invention, a message delivery method includes:
S101: subscription information of an application in a first host is obtained.
In the embodiment of the present invention, a first cache region 11, a first RDMA process 21 and a first scheduling center 31 are provided in the first host; a second cache region 12, a second RDMA process 22 and a second scheduling center 32 are arranged in the second host; the first RDMA process 21 is communicatively coupled with the second RDMA process 22 via RDMA techniques to communicatively couple the first host with the second host. The detailed structure of the message passing system has been described in detail in the above embodiments of the invention, and will not be described herein again.
S102: and sending the subscription information to the second cache region from the first cache region through the first RDMA process, so that the second dispatching center sends the actual information to the first cache region from the second cache region through the second RDMA process when the application in the second host generates the actual information corresponding to the subscription information.
The detailed description of the communication method between the first host and the second host has been described in the above embodiments of the present invention, and will not be repeated herein.
According to the message transmission method provided by the embodiment of the invention, the first RDMA process 21 and the second RDMA process 22 are correspondingly arranged on the two hosts, so that the two hosts can be in communication connection through the RDMA technology, and efficient data transmission is realized. By setting the first buffer 11 to transmit and receive the information transmitted by the first RDMA process 21 in a targeted manner, and setting the second buffer 12 to transmit and receive the information transmitted by the second RDMA process 22 in a targeted manner, the acknowledgement of data information during communication based on RDMA technology between hosts can be facilitated, and the replacement of traditional TCP/IP by RDMA is facilitated.
In the following, a message delivery method provided by an embodiment of the present invention is introduced, and the message delivery method described below and the message delivery system described above may be referred to correspondingly.
It should be noted that another message delivery method provided by the embodiment of the present invention is specifically applied to the end sending the actual information in the message delivery system, which is usually the server end, that is, the second host.
Referring to fig. 4, fig. 4 is a flowchart of another message delivery method according to an embodiment of the present invention. Another message passing method provided in the embodiment of the present invention is applied to the second host, and is specifically applied to the second scheduling center 32.
Referring to fig. 4, in the embodiment of the present invention, a message delivery method includes:
S201: actual information generated by an application in the second host and corresponding to the subscription information is acquired.
In the embodiment of the present invention, a first cache region 11, a first RDMA process 21 and a first scheduling center 31 are provided in the first host; a second cache region 12, a second RDMA process 22 and a second scheduling center 32 are arranged in the second host; the first RDMA process 21 and the second RDMA process 22 are communicatively connected by RDMA techniques to communicatively connect the first host and the second host; the subscription information is generated by the application in the first host acquired by the first scheduling center 31 and sent from the first cache region 11 to the second cache region 12. The detailed structure of the message passing system has been described in detail in the above embodiments of the invention, and will not be described herein again.
S202: the actual information is sent from the second cache to the first cache via a second RDMA process.
The detailed description of the communication method between the first host and the second host has been described in the above embodiments of the present invention, and will not be repeated herein.
According to the message transmission method provided by the embodiment of the invention, the first RDMA process 21 and the second RDMA process 22 are correspondingly arranged on the two hosts, so that the two hosts can be in communication connection through the RDMA technology, and efficient data transmission is realized. By setting the first buffer 11 to transmit and receive the information transmitted by the first RDMA process 21 in a targeted manner, and setting the second buffer 12 to transmit and receive the information transmitted by the second RDMA process 22 in a targeted manner, the acknowledgement of data information during communication based on RDMA technology between hosts can be facilitated, and the replacement of traditional TCP/IP by RDMA is facilitated.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above provides a detailed description of a messaging system and a messaging method according to the present invention. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims (10)

1. A messaging system comprising a first cache located within a first host, a first RDMA process, and a first dispatch center; and a second cache, a second RDMA process, and a second dispatch center located within a second host;
the first RDMA process communicatively coupled with the second RDMA process via RDMA techniques to communicatively couple the first host with the second host;
the first scheduling center is used for acquiring subscription information of an application in the first host and sending the subscription information to the second cache region from the first cache region through the first RDMA process;
the second scheduling center is used for sending the actual information to the first cache region from the second cache region through the second RDMA process when the application in the second host generates the actual information corresponding to the subscription information.
2. The system according to claim 1, wherein a first shared memory including the first cache region is provided in the first host, and a second shared memory including the second cache region is provided in the second host;
the first scheduling center is used for acquiring the subscription information through the first shared memory;
the second scheduling center is configured to obtain the actual information through the second shared memory.
3. The system of claim 2, wherein the first shared memory and the second shared memory are both lock-free ring queue based shared memories.
4. The system of claim 1, wherein the first cache region and the second cache region are both multi-channel cache regions.
5. The system of claim 4, wherein the first buffer has a plurality of first channels, a plurality of the first channels correspond to information of a plurality of lengths, and the first buffer processes the information of the corresponding lengths through the first channels.
6. The system of claim 4, wherein the second buffer has a plurality of second channels, and the plurality of second channels correspond to information of a plurality of lengths; and the second cache region processes the information with the corresponding length through the second channel.
7. The system of claim 1, wherein the first RDMA process is reliably connected with the second RDMA process.
8. The system of claim 7, wherein the first RDMA process is configured to release the memory occupied by sent information each time a first preset number of pieces of information have been sent;
and the second RDMA process is configured to release the memory occupied by sent information each time a second preset number of pieces of information have been sent.
9. A message passing method applied to a first dispatch center, comprising:
acquiring subscription information of an application in a first host; a first cache region, a first RDMA process and a first scheduling center are arranged in the first host; a second cache region, a second RDMA process and a second scheduling center are arranged in the second host; the first RDMA process communicatively coupled with the second RDMA process via RDMA techniques to communicatively couple the first host with the second host;
and sending the subscription information to the second cache region from the first cache region through the first RDMA process, so that when the application in the second host generates actual information corresponding to the subscription information, the second scheduling center sends the actual information to the first cache region from the second cache region through the second RDMA process.
10. A message passing method applied to a second scheduling center, comprising:
acquiring actual information generated by an application in the second host and corresponding to subscription information; a first cache region, a first RDMA process and a first scheduling center are arranged in the first host; a second cache region, a second RDMA process and a second scheduling center are arranged in the second host; the first RDMA process communicatively coupled with the second RDMA process via RDMA techniques to communicatively couple the first host with the second host; the subscription information is generated by the application in the first host, acquired by the first scheduling center, and sent from the first cache region to the second cache region;
sending the actual information from the second cache to the first cache via the second RDMA process.
CN202111485397.4A 2021-12-07 2021-12-07 Message transmission system and message transmission method Pending CN114201313A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111485397.4A CN114201313A (en) 2021-12-07 2021-12-07 Message transmission system and message transmission method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111485397.4A CN114201313A (en) 2021-12-07 2021-12-07 Message transmission system and message transmission method

Publications (1)

Publication Number Publication Date
CN114201313A true CN114201313A (en) 2022-03-18

Family

ID=80650993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111485397.4A Pending CN114201313A (en) 2021-12-07 2021-12-07 Message transmission system and message transmission method

Country Status (1)

Country Link
CN (1) CN114201313A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1474568A (en) * 2002-08-06 2004-02-11 华为技术有限公司 Direct internal storage access system and method of multiple path data
CN103530167A (en) * 2013-09-30 2014-01-22 华为技术有限公司 Virtual machine memory data migration method and relevant device and cluster system
CN109471816A (en) * 2018-11-06 2019-03-15 西安微电子技术研究所 A kind of PCIE bus dma controller and data transfer control method based on descriptor
CN109491809A (en) * 2018-11-12 2019-03-19 西安微电子技术研究所 A kind of communication means reducing high-speed bus delay


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
IBM official technical documentation: "Shared Memory Communications over Remote Direct Memory Access", pages 1, Retrieved from the Internet <URL:https://www.ibm.com/docs/en/zos/2.1.0?topic=bts-shared-memory-communications-over-remote-direct-memory-access> *
Microsoft official technical documentation: "Register Buffer, Buffer Descriptor V1 Structure", pages 2, Retrieved from the Internet <URL:https://learn.microsoft.com/pdf?url=https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-smbd/toc.json> *

Similar Documents

Publication Publication Date Title
CN101015187B (en) Apparatus and method for supporting connection establishment in an offload of network protocol processing
US20190335010A1 (en) Systems and methods for providing messages to multiple subscribers
CN111277616B (en) RDMA-based data transmission method and distributed shared memory system
EP3482298B1 (en) Multicast apparatuses and methods for distributing data to multiple receivers in high-performance computing and cloud-based networks
CN110602156A (en) Load balancing scheduling method and device
EP1883240B1 (en) Distributed multi-media server system, multi-media information distribution method, program thereof, and recording medium
CN102546612B (en) Remote procedure call implementation method based on remote direct memory access (RDMA) protocol in user mode
CN110134534B (en) System and method for optimizing message processing for big data distributed system based on NIO
CN111404931B (en) Remote data transmission method based on persistent memory
CN112291293B (en) Task processing method, related equipment and computer storage medium
CN107093138A (en) Auction Ask-Bid System and its operation method based on distributed clog-free asynchronous message tupe
CN102831018B (en) Low latency FIFO messaging system
CN109547519B (en) Reverse proxy method, apparatus and computer readable storage medium
CN110336702A (en) A kind of system and implementation method of message-oriented middleware
US10609125B2 (en) Method and system for transmitting communication data
CN110535811B (en) Remote memory management method and system, server, client and storage medium
US8135851B2 (en) Object request broker for accelerating object-oriented communications and method
Yu et al. High performance and reliable NIC-based multicast over Myrinet/GM-2
Mamidala et al. Efficient SMP-aware MPI-level broadcast over InfiniBand's hardware multicast
CN115834660B (en) Non-blocking RDMA connection establishment method and device
CN114201313A (en) Message transmission system and message transmission method
Mansley Engineering a user-level TCP for the CLAN network
CN101123567A (en) Method and system for processing network information
US8176117B2 (en) Accelerator for object-oriented communications and method
Qi et al. X-IO: A High-performance Unified I/O Interface using Lock-free Shared Memory Processing

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination