WO2023058232A1 - Communication system, intermediate device, communication method, and program - Google Patents

Communication system, intermediate device, communication method, and program

Info

Publication number
WO2023058232A1
Authority
WO
WIPO (PCT)
Prior art keywords
intermediate device
buffer
request
unit
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2021/037380
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
智也 日比
潤紀 市川
暢 間野
和也 穴澤
幸男 築島
健司 清水
秀樹 西沢
耕一 高杉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Inc
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP2023552663A (JP7671008B2)
Priority to US18/697,928 (US20240414092A1)
Priority to PCT/JP2021/037380 (WO2023058232A1)
Publication of WO2023058232A1
Anticipated expiration
Ceased

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/30 Flow control; Congestion control in combination with information about buffer occupancy at either end or at transit nodes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/28 Flow control; Congestion control in relation to timing considerations
    • H04L47/283 Flow control; Congestion control in relation to timing considerations in response to processing delays, e.g. caused by jitter or round trip time [RTT]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/90 Buffering arrangements
    • H04L49/9005 Buffering arrangements using dynamic buffer space allocation

Definitions

  • the present invention relates to a communication system, an intermediate device, a communication method, and a program.
  • a system in which a device such as a transponder installed in an optical transmission system is installed in a client system such as a server in a new network is under consideration.
  • an electric signal reaches a communication partner server or the like without being subjected to electric-optical conversion during transmission.
  • RDMA Remote Direct Memory Access
  • InfiniBand is a communication protocol that performs high-speed and highly reliable data transfer between communication terminals over long distances.
  • RDMA enables high-speed communication because direct memory access is made from the memory area of the transmitter to the memory area of the receiver.
  • RDMA has the problem that the longer the network becomes, the lower the transfer performance becomes. This is because extended lines require more transfer time, and connection-oriented protocols cannot send the next packet until a packet indicating data transfer completion is received.
  • the present invention has been made in view of the above circumstances, and an object of the present invention is to realize high-speed and highly reliable data transfer even if the transfer destination of RDMA is long distance.
  • one aspect of the present invention is a communication system comprising a first intermediate device and a second intermediate device, wherein the first intermediate device and the second intermediate device are disposed between a first device and a second device that transfer data using remote direct memory access. The first intermediate device comprises: a buffer management unit that determines a buffer size based on delay information about the delay of the network between the first intermediate device and the second intermediate device, and secures a buffer of said buffer size; a transfer unit that transfers a request including data to the second intermediate device when the data to be transferred from the first device to the second device is smaller than the credit of the second intermediate device; and a generation unit that generates a pseudo-response to the request and returns it to the first device. The second intermediate device comprises: a buffer management unit that determines a buffer size based on the delay information and secures a buffer of the buffer size; a transfer unit that transfers the request to the second device, stores the request in the buffer, and updates its own credit; and a discarding unit that discards the response to the request received from the second device, deletes the request stored in the buffer, and updates the credit.
  • One aspect of the present invention is an intermediate device disposed between a first device and a second device that transfer data using remote direct memory access, the intermediate device comprising: a buffer management unit that determines a buffer size based on delay information about the delay of the network between the intermediate device and the transfer destination intermediate device to which data is transferred, and secures a buffer of the buffer size; a transfer unit that transfers a request including the data to the transfer destination intermediate device when the data to be transferred from the first device to the second device is smaller than the credit of the transfer destination intermediate device; and a generation unit that generates a pseudo-response to the request and returns it to the first device.
  • One aspect of the present invention is a communication method performed by a communication system comprising a first intermediate device and a second intermediate device, wherein the first intermediate device and the second intermediate device are placed between a first device and a second device that transfer data using remote direct memory access. The first intermediate device performs the steps of: determining a buffer size based on delay information about the delay of the network between the first intermediate device and the second intermediate device, and reserving a buffer of said buffer size; if data to be transferred from said first device to said second device is less than the credit of said second intermediate device, transferring a request containing said data to the second intermediate device; and generating a pseudo-response to the request and returning it to the first device. The second intermediate device performs the steps of: determining a buffer size based on the delay information and reserving a buffer of said buffer size; forwarding said request to said second device, storing said request in said buffer, and updating its own credit; and discarding the response to the request received from the second device, deleting the request stored in the buffer, and updating the credit.
  • One aspect of the present invention is a program that causes a computer to function as the intermediate device.
  • FIG. 1 is a diagram for explaining an RDMA communication model.
  • FIG. 2 is a diagram for explaining RDMA SEND.
  • FIG. 3 is a diagram showing an example of the configuration of the communication system according to the first embodiment.
  • FIG. 4 is a diagram illustrating a configuration example of the first intermediate device.
  • FIG. 5 is a diagram showing a configuration example of the second intermediate device.
  • FIG. 6 is a diagram showing a local configuration example.
  • FIG. 7 is a sequence diagram showing an example of the processing flow of the communication system shown in FIG.
  • FIG. 8 is a diagram for explaining an example of a method of creating a table and resolving the destination QPN of the response.
  • FIG. 9 is a diagram for explaining an example of a method of notifying the source QPN and resolving the destination QPN of the response.
  • FIG. 10 is a diagram showing an example of the configuration of a communication system according to the second embodiment.
  • FIG. 11 is a diagram illustrating a configuration example of an orchestrator.
  • FIG. 12 is a diagram illustrating a configuration example of a transmission device.
  • FIG. 13 is a sequence diagram showing an example of the flow of processing of the communication system shown in FIG. 10.
  • FIG. 14 is a diagram illustrating an example of the configuration of a communication system according to modification 1 of the second embodiment.
  • FIG. 15 is a sequence diagram showing an example of the processing flow of the communication system according to Modification 2 of the second embodiment.
  • FIG. 16 is a diagram illustrating an example of the configuration of a communication system according to modification 3 of the second embodiment.
  • FIG. 17 is a diagram illustrating an example of the configuration of a communication system according to the third embodiment.
  • FIG. 18 is a diagram showing an example of the configuration of a communication system that does not pass through an intermediate device according to the third embodiment.
  • FIG. 19 is a diagram showing an example of the configuration of another communication system according to the third embodiment. It is a hardware configuration example.
  • RDMA is a communication protocol that provides direct memory access from the memory area of the transmitting device to the memory area of the receiving device. In addition to having a credit-based flow control function, RDMA performs Completion control to confirm the completion of data transfer and proceed with processing, so highly reliable communication is possible. RDMA is also used as a transport method for host-to-device and device-to-device data communication between SSDs (Solid State Drives) and GPUs (Graphics Processing Units).
  • RDMA is a communication model in which a QP (Queue Pair) is configured between a local device and a remote device and data is transferred using the QP.
  • QP is a set of SQ (Send Queue) and RQ (Receive Queue).
  • the communication unit of RDMA is a communication request called WR (Work Request), which is loaded on SQ/RQ in units of WQE (Work Queue Element).
  • WR includes Send WR, which is a request to send, and Receive WR, which is a request to receive.
  • In a Send WR, the memory area of the data to be sent is specified in a WQE, which is loaded into the SQ.
  • In a Receive WR, the memory area in which the data is to be received is specified in a WQE, which is loaded into the RQ.
  • WQEs can be accumulated in the SQ/RQ in FIFO (First-In-First-Out) order, up to the queue size of the SQ/RQ.
  • CQE: Completion Queue Entry
  • CQ: Completion Queue
  • RDMA service types are roughly divided into four types: RC (Reliable Connection), RD (Reliable Datagram), UC (Unreliable Connection), and UD (Unreliable Datagram), according to Reliable/Unreliable and Connection/Datagram.
  • RC and UD are commonly used.
  • RC guarantees the order and reachability of messages by means of acknowledgment of communication success/abnormality and retransmission by ACK/NAK.
  • RC is also connection-oriented, providing one-to-one communication between local-remote QPs.
  • UD does not have a mechanism for acknowledgment and retransmission, but many-to-many communication, such as transmission to multiple QPs and reception from multiple QPs, is possible by specifying a destination for each communication.
  • RDMA WRITE with Immediate
  • RDMA READ
  • ATOMIC operations
  • All of these are available in RC. Only SEND can be used in UD.
  • Retransmission control in RDMA is classified into three patterns: when no ACK/NAK is returned, when an RNR (Receiver-Not-Ready) NAK is returned, and when an Out-Of-Sequence NAK is returned. If neither ACK nor NAK is returned from the remote side within a certain period of time, the local side times out and retransmits. The remote side returns an RNR NAK when no WQE is ready in the RQ. If an RNR NAK is returned from the remote side, the local side resends after a certain period of time. The remote side returns an Out-Of-Sequence NAK when the PSN (Packet Sequence Number) of the received packet is out of order. If an Out-Of-Sequence NAK is returned from the remote side, the local side resends without waiting.
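  • The three retransmission patterns above can be sketched as a small decision function. This is an illustrative sketch only: the names `Reply` and `retransmit_delay` and the default wait value are assumptions made for this example and do not come from the specification or from any RDMA library.

```python
from enum import Enum, auto
from typing import Optional


class Reply(Enum):
    """What the local side observed for an outstanding request (hypothetical names)."""
    ACK = auto()
    RNR_NAK = auto()              # remote had no WQE ready in its RQ
    OUT_OF_SEQUENCE_NAK = auto()  # remote saw an unexpected PSN
    NONE = auto()                 # neither ACK nor NAK arrived before the timeout


def retransmit_delay(reply: Reply, rnr_wait_s: float = 0.01) -> Optional[float]:
    """Return None when no retransmission is needed, otherwise the number of
    seconds to wait before resending. rnr_wait_s is an assumed placeholder."""
    if reply is Reply.ACK:
        return None
    if reply is Reply.RNR_NAK:
        # Resend only after a certain period, giving the remote time to post a WQE.
        return rnr_wait_s
    # Timeout already elapsed, or Out-Of-Sequence NAK: resend without waiting.
    return 0.0
```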
  • FIG. 2 is an explanatory diagram for explaining RDMA SEND.
  • SEND is the basic send/receive model of RDMA, sending data from local to remote.
  • When communication is ready, the local sends data with SEND. When the remote successfully receives the data, it loads a CQE into its CQ, releases the WQE in the RQ, and returns an ACK to the local. When the local receives the ACK, it loads a CQE into its CQ and releases the WQE in the SQ.
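  • As a rough illustration of this SEND exchange, the following toy model represents the SQ, RQ, and CQ as in-memory queues. It is a sketch under simplifying assumptions (no network, no packets, one message per WQE); `Endpoint` and `send` are hypothetical names and not part of any RDMA API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Toy endpoint holding the three queues named in the text."""
    sq: deque = field(default_factory=deque)  # Send WQEs: payloads to send
    rq: deque = field(default_factory=deque)  # Receive WQEs: bytearray buffers
    cq: deque = field(default_factory=deque)  # CQEs


def send(local: Endpoint, remote: Endpoint) -> bool:
    """Perform one SEND. Returns True if the remote ACKed, False if it had no
    Receive WQE ready (the RNR NAK case)."""
    if not local.sq:
        raise ValueError("no Send WQE has been posted")
    if not remote.rq:
        return False  # remote would answer with RNR NAK; Send WQE stays queued

    payload = local.sq[0]
    # Remote side: receive the data, load a CQE, release the WQE in the RQ.
    remote.rq[0].extend(payload)
    remote.cq.append("recv-complete")
    remote.rq.popleft()
    # Local side, on receiving the ACK: load a CQE, release the WQE in the SQ.
    local.cq.append("send-complete")
    local.sq.popleft()
    return True
```

  • In this model the Send WQE stays in the SQ until the ACK arrives, which is the dependency on round-trip time that the pseudo-response described later is meant to remove.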
  • SEND has a special operation, SEND w/Imm (SEND with Immediate).
  • With SEND w/Imm, a special field (imm_data) can be set in the WQE of the local SQ, and imm_data can be sent simultaneously when data is sent from the local to the remote.
  • When the remote successfully receives the data, it loads the CQ with a CQE containing imm_data.
  • the contents of imm_data can be known remotely by referring to the CQE.
  • Intermediate devices 10A and 10B are placed between local 30 and remote 50 that transfer data using RDMA. More specifically, the intermediate device 10A is arranged in front of the long-distance network 9 (network) on the local 30 side, and the intermediate device 10B is arranged in front of the network 9 on the remote 50 side.
  • the intermediate device 10A receives a request (SEND, etc.) from the local 30 and returns a pseudo-response to the request to the local 30.
  • the intermediate device 10B transfers the request transmitted from the intermediate device 10A to the remote 50 and discards the response (ACK or the like) from the remote 50 .
  • the intermediate devices 10A and 10B implement credit-based flow control and periodically transmit their own credits (free buffer capacity) to the counterpart device at a predetermined timing.
  • the communication system may also include local 30 and remote 50 that transfer data using RDMA.
  • a local 30 (first device) is a data transfer source device.
  • the remote 50 (second device) is a data transfer destination device.
  • FIG. 4 shows a configuration example of the intermediate device 10A (first intermediate device).
  • the intermediate device 10A includes a transfer unit 11, a generation unit 12A, a network state measurement unit 13, a buffer 14 (temporary data storage unit), a buffer management unit 15, a credit management unit 16, and a communication unit 17.
  • the transfer unit 11, the generation unit 12A, and the network state measurement unit 13 are implemented in software (CPU, memory, storage, etc.), but they may be implemented on the NIC.
  • the buffer 14, the buffer management unit 15, the credit management unit 16, and the communication unit 17 are implemented by the NIC, but part of them may be implemented in software.
  • the intermediate device 10A may be implemented as a virtual machine or a container.
  • the transfer unit 11 receives requests from the local 30 and transfers them to the remote 50 .
  • This request is, for example, the aforementioned SEND, SEND w/Imm, RDMA WRITE, RDMA WRITE w/Imm, or ATOMIC Command.
  • a request includes data or an operation on data to be sent from the local 30 to the remote 50 .
  • Transfer unit 11 transmits the request via communication unit 17 .
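  • when the data to be transferred from the local 30 to the remote 50 is smaller than the credit of the intermediate device 10B, the transfer unit 11 of this embodiment transfers the request including the data to the intermediate device 10B.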
  • the credit management unit 16 receives credits periodically transmitted from the intermediate device 10B.
  • the transfer unit 11 transmits data smaller than the credit of the intermediate device 10B. However, as long as the size is within the estimated credit range, data exceeding the credit notified from the intermediate device 10B may be transmitted. Further, when the retransmission cost is estimated to be small, or when the buffer of the intermediate device 10B is estimated to be empty by the time the communication data reaches the intermediate device 10B, the transfer unit 11 may send data whose size exceeds the credit notified from the intermediate device 10B. This makes it possible to improve the efficiency of data communication in credit-based flow control.
  • if the data size is greater than or equal to the credit of the intermediate device 10B, the transfer unit 11 queues (stores) the request in the buffer 14 and waits until the credit of the intermediate device 10B increases. In this case, the transfer unit 11 updates the credit of its own buffer 14.
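  • A minimal sketch of this credit check follows. It models only the basic rule (forward when the request is smaller than the peer's credit, otherwise queue and wait) and leaves out the optional over-credit transmission. The class name `CreditGate` and the choice to reduce the local credit estimate after each forward are assumptions of this example; the specification only refers to a predetermined credit update method.

```python
from collections import deque
from typing import List


class CreditGate:
    """Hypothetical model of the forwarding decision in the transfer unit of
    the sending-side intermediate device."""

    def __init__(self, peer_credit: int) -> None:
        self.peer_credit = peer_credit   # last known free buffer of the peer, in bytes
        self.pending: deque = deque()    # sizes of requests waiting in the local buffer

    def submit(self, size: int) -> List[int]:
        """Accept a request of `size` bytes; return sizes forwarded right now."""
        self.pending.append(size)
        return self._drain()

    def on_credit(self, credit: int) -> List[int]:
        """Handle a periodic credit signal; return sizes that can now be forwarded."""
        self.peer_credit = credit
        return self._drain()

    def _drain(self) -> List[int]:
        forwarded = []
        # Forward in FIFO order while the head request is strictly smaller than the credit.
        while self.pending and self.pending[0] < self.peer_credit:
            size = self.pending.popleft()
            self.peer_credit -= size     # assumed local estimate until the next signal
            forwarded.append(size)
        return forwarded
```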
  • Upon transfer of the request to the intermediate device 10B, the generation unit 12A generates a pseudo-response for the request and returns it to the local 30. Specifically, the generation unit 12A picks up a request sent from the local 30 and flagged as Only or Last, and generates a pseudo-response using the PSN included in the request. The generation unit 12A returns the generated pseudo-response to the local 30.
  • When the local 30 receives the pseudo-response, it recognizes it as a response from the remote 50, adds a CQE to the CQ, and completes normally. This allows the WQE of the SQ of the local 30 to be forcibly released.
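  • The pseudo-response rule can be sketched as follows. The data classes and the `position` field are hypothetical simplifications made for this example; a real RDMA packet encodes First/Middle/Last/Only in its opcode, and a real ACK carries more fields than the PSN.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    """Simplified request packet (assumed layout, for illustration only)."""
    psn: int
    position: str          # "first", "middle", "last" or "only"
    payload: bytes = b""


@dataclass(frozen=True)
class PseudoAck:
    """Simplified pseudo-response carrying the PSN of the request it answers."""
    psn: int


def make_pseudo_ack(request: Request) -> Optional[PseudoAck]:
    """Generate a pseudo-ACK only for requests flagged Only or Last, reusing
    the PSN contained in the request; other packets get no response."""
    if request.position in ("only", "last"):
        return PseudoAck(psn=request.psn)
    return None
```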
  • the network state measuring unit 13 measures the state of the network 9 between the intermediate devices 10A and 10B. Specifically, the network state measuring unit 13 measures delay information regarding the delay of the network 9 by transmitting packets or the like. The delay information includes the transmission delay and transmission capacity of the network 9 (path between intermediate devices 10A and 10B).
  • the buffer management unit 15 acquires delay information of the network 9 between the local 30 and the remote 50, determines the buffer size of the buffer 14 based on the delay information, and secures the buffer 14 of the determined size. For example, the buffer management unit 15 may calculate a value equal to or greater than two times the product of the transmission delay of the delay information and the transmission capacity (bandwidth) as the buffer size. Data is temporarily stored in the buffer 14 .
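  • As a worked example of this sizing rule, the sketch below computes a buffer size of at least twice the product of transmission delay and bandwidth. The function name, the units (microseconds, bits per second, bytes), and the use of integer ceiling division are choices made for this example.

```python
def buffer_size_bytes(delay_us: int, bandwidth_bps: int, factor: int = 2) -> int:
    """Return a buffer size in bytes equal to factor x delay x bandwidth,
    rounded up. The text asks for a factor of two or more."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    if delay_us < 0 or bandwidth_bps < 0:
        raise ValueError("delay and bandwidth must be non-negative")
    bit_microseconds = factor * delay_us * bandwidth_bps
    # Divide by 1e6 (microseconds to seconds) and by 8 (bits to bytes), rounding up.
    return -(-bit_microseconds // 8_000_000)


# Example: 10 ms of transmission delay on a 100 Gbit/s path.
print(buffer_size_bytes(delay_us=10_000, bandwidth_bps=100_000_000_000))  # 250000000
```

  • With these example figures the buffer comes to 250 MB, which suggests why the buffer is sized from measured delay information rather than fixed in advance.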
  • the credit management unit 16 implements credit-based flow control.
  • the credit management unit 16 manages data related to flow control such as credit.
  • the credit management unit 16 periodically receives a credit signal from the intermediate device 10B via the communication unit 17, and updates the credit stored in the credit management unit 16 using the credit signal and a predetermined credit update method.
  • the credit signal received from the intermediate device 10B includes a credit indicating the free buffer size that can be received by the intermediate device 10B (Kung & Morris, Credit-based flow control for ATM networks, IEEE Network, 9(2), 40-48 (1995)).
  • the credit management unit 16 may periodically transmit the credit (empty buffer size) of its own buffer 14 to the intermediate device 10B via the communication unit 17 .
  • the communication unit 17 is a network interface for communicating with other devices such as the local device 30 and the intermediate device 10B.
  • FIG. 5 shows the configuration of the intermediate device 10B (second intermediate device).
  • the intermediate device 10B includes a transfer unit 11, a discarding unit 12B, a network state measurement unit 13, a buffer 14, a buffer management unit 15, a credit management unit 16, and a communication unit 17.
  • the transfer unit 11 transfers the request sent by the local 30 to the remote 50 via the intermediate device 10A.
  • the transfer unit 11 of this embodiment transfers the request to the remote 50, queues the request in the buffer 14, and updates its own credit.
  • the discarding unit 12B discards the response to the request received from the remote 50. This prevents duplicate reception of responses at the local 30 . Furthermore, if RNR or out-of-sequence NAK transmitted from the remote 50 arrives at the local 30, it may cause a malfunction, so the discarding unit 12B also discards these NAKs.
  • the discarding unit 12B of this embodiment discards the response to the request received from the remote 50, deletes the request queued in the buffer 14, and updates the credit.
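  • The bookkeeping of the intermediate device 10B described above might look like the following sketch. The class and method names are hypothetical, requests are keyed by PSN for simplicity, and the credit is modeled as the free space of the buffer in bytes.

```python
from typing import Callable, Dict


class EgressIntermediate:
    """Hypothetical model of the receiving-side intermediate device: it forwards
    requests, keeps a copy until the remote acknowledges, and drops responses."""

    def __init__(self, buffer_size: int) -> None:
        self.buffer_size = buffer_size
        self.held: Dict[int, bytes] = {}   # PSN -> request payload kept for resend

    @property
    def credit(self) -> int:
        """Free buffer space, which is what gets advertised to the peer device."""
        return self.buffer_size - sum(len(p) for p in self.held.values())

    def forward(self, psn: int, payload: bytes,
                send_to_remote: Callable[[int, bytes], None]) -> None:
        """Forward the request to the remote and queue a copy (credit shrinks)."""
        send_to_remote(psn, payload)
        self.held[psn] = payload

    def on_remote_ack(self, psn: int) -> None:
        """Discard the ACK (nothing is passed back) and delete the queued request,
        which restores the credit."""
        self.held.pop(psn, None)

    def on_remote_nak(self, psn: int,
                      send_to_remote: Callable[[int, bytes], None]) -> None:
        """Discard the NAK as well, and resend the queued request to the remote."""
        send_to_remote(psn, self.held[psn])
```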
  • the network state measurement unit 13, buffer 14, buffer management unit 15, credit management unit 16, and communication unit 17 of the intermediate device 10B are the same as the network state measurement unit 13, buffer 14, buffer management unit 15, credit management unit 16, and communication unit 17 of the intermediate device 10A.
  • the configuration of the local 30 is shown in FIG.
  • the local 30 includes an application unit 31, a queue management unit 32, a network state management unit 33, a temporary data storage unit 34, a determination unit 35, a packet allocation unit 36, and a communication unit 37.
  • the application unit 31 transmits and receives requests including transfer data to and from the remote 50 using RDMA communication.
  • the queue management unit 32 manages each queue (CQ, SQ) of the temporary data storage unit 34.
  • the queue management unit 32 may calculate the required queue size based on the delay information of the network 9 and determine the queue depth (QueueDepth) of the temporary data storage unit 34 .
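  • The text does not give a formula for the required queue size. One plausible reading, shown here purely as an assumption, is to keep enough requests outstanding to fill the path for one round trip, i.e. the bandwidth-delay product divided by the message size. The function name and units are hypothetical.

```python
def queue_depth(rtt_us: int, bandwidth_bps: int, message_bytes: int) -> int:
    """Assumed sizing rule: number of messages needed in flight to cover one
    round trip, rounded up, with a minimum depth of 1."""
    if message_bytes <= 0:
        raise ValueError("message_bytes must be positive")
    bit_microseconds = rtt_us * bandwidth_bps
    # Convert bit-microseconds to bytes (divide by 8e6), then to messages, rounding up.
    depth = -(-bit_microseconds // (8_000_000 * message_bytes))
    return max(1, depth)
```

  • For example, under this assumption a 20 ms round trip at 100 Gbit/s with 1 MB messages would call for a queue depth of 250.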
  • the temporary data storage unit 34 of the local 30 has CQ and SQ as queues.
  • the network status management unit 33 manages the status of the network 9. Specifically, the network status management unit 33 may acquire delay information about the delay of the network 9 from another device, or may measure the delay information by transmitting packets or the like.
  • the delay information includes transmission delay, transmission capacity, and the like.
  • the distribution unit 36 distributes the data (packets) received via the communication unit 37 to the corresponding QPs of the temporary data storage unit 34 . If the received data is delay information, it is distributed to the network state management section 33 or the queue management section 32 .
  • the determination unit 35 may determine whether to transmit the request via the intermediate devices 10A and 10B based on the delay information.
  • the queue management unit 32 may determine the depth of the queue in the temporary data storage unit 34 based on the delay information when determining not to pass through the intermediate devices 10A and 10B.
  • the local 30 of this embodiment does not have to include the network state management section 33 and the determination section 35 .
  • the communication unit 37 is a network interface for communicating with other devices such as the intermediate device 10A.
  • the queue management unit 32 and the network status management unit 33 may be implemented as NIC functions.
  • the application unit 31 is not limited to being implemented in the CPU and memory of the local 30, and may be implemented in a hardware accelerator such as a GPU, FPGA, or NIC connected via another internal bus.
  • the configuration of the remote 50 is similar to that of the local 30 shown in FIG. However, the temporary data storage unit 34 of the remote 50 has CQ and RQ.
  • In step S10, each of the intermediate devices 10A and 10B measures the delay information (transmission delay, transmission capacity) of the network 9 and reserves the buffer 14 based on the measurement result. Specifically, the intermediate devices 10A and 10B measure the state of the network 9 by sending packets or the like. The intermediate devices 10A and 10B calculate the required buffer size based on the delay information, and secure the buffer 14 of the buffer size. For example, the intermediate devices 10A and 10B may calculate a buffer size that is twice or more the product of the transmission delay time and the transmission capacity (bandwidth).
  • the intermediate devices 10A and 10B may exchange communication modes that can be used by themselves with the counterpart device based on the measurement results in step S10, and determine the communication mode to be used.
  • In step S20, the local 30 and the remote 50 transmit and receive data via the intermediate devices 10A and 10B. Specifically, the local 30 stores a WQE in the SQ and transmits a request to the remote 50 (step S21).
  • a case where a SEND request is transmitted will be described as an example.
  • the intermediate device 10A compares the data size of the request received from the local 30 and the credit received from the intermediate device 10B. If the data size of the request is smaller than the credit of the intermediate device 10B, that is, if the credit remains, the intermediate device 10A transmits the request to the intermediate device 10B (step S22). At this time, the intermediate device 10A may queue the request received from the local 30 in a buffer and update its own credit.
  • the intermediate device 10A generates a pseudo-response (pseudo-ACK) using the PSN included in the request, and returns the pseudo-response to the local 30 (step S23).
  • When the local 30 receives the pseudo-response, it loads the CQ with the CQE and releases the WQE of the SQ.
  • the intermediate device 10B transmits the request transmitted from the intermediate device 10A to the remote 50 (step S24), queues the request in the buffer 14, and updates the credit (step S25).
  • If packet loss has occurred in the request received from the intermediate device 10A, the intermediate device 10B transmits a NACK to the intermediate device 10A (step S26) and receives the request resent from the intermediate device 10A (step S27). If no packet loss has occurred in the resent request, the intermediate device 10B proceeds to S24.
  • If the data size of the request in step S21 is greater than or equal to the credit of the intermediate device 10B, that is, if the credit of the intermediate device 10B is insufficient, the intermediate device 10A queues the request in the buffer 14 and waits until the credit of the intermediate device 10B increases (step S28). If the credit of the intermediate device 10B becomes larger than the data size of the request, the intermediate device 10A proceeds to S22.
  • When the remote 50 successfully receives the request in step S24, it sends an ACK response to the intermediate device 10B (step S29).
  • the intermediate device 10B discards the response received from the remote 50, releases the request queued in the buffer 14 in step S25, and updates the credit (step S30).
  • On the other hand, if the request in step S24 cannot be received normally, the remote 50 transmits a NACK response to the intermediate device 10B (step S31), and receives the request resent from the intermediate device 10B (step S32). If the resent request is normally received, the remote 50 proceeds to step S29, and the intermediate device 10B performs step S30.
  • the intermediate device 10B periodically transmits its own credits to the intermediate device 10A at a predetermined timing using a timer (not shown) (step S30).
  • the intermediate device 10A acquires the credit of the intermediate device 10B and updates the credit of the intermediate device 10B stored in the credit management unit 16.
  • A QP has a different QPN (Queue Pair Number) at each endpoint.
  • the SQ/RQ recognizes the QPN of the opposite side, and includes the destination QPN in the header when generating the RDMA packet.
  • the QPN of the source is not included in the header.
  • When the intermediate device 10A generates a pseudo-response, the destination of the pseudo-response is unknown because the received request does not contain information indicating the QPN of the transmission source. Therefore, in this embodiment, the destination of the pseudo-response is specified by one of the following two methods.
  • the first method is to inspect the exchange of the original RDMA request and response and store the QPN combination in a table.
  • the same PSN is used for RDMA packet Only or Last requests and ACKs. Therefore, the intermediate device 10A inspects the passing requests and responses, and adds the destination QPN of each header of the Only or Last request and ACK having the same PSN to the table as a combination.
  • the destination QPNs of the request and response headers with the same PSN are 0x000020 and 0x000010, respectively, so add the combination of 0x000010 and 0x000020 to the table.
  • the local 30 constitutes a QP between each of the remote 50A and the remote 50B.
  • When the intermediate device 10A generates a pseudo-response, it acquires a combination of QPNs including the destination QPN of the request from the table, and sets the other QPN of the combination as the destination QPN of the pseudo-response. For example, when receiving a request with a destination QPN of 0x000020, the intermediate device 10A acquires the combination of 0x000010 and 0x000020, which includes 0x000020, from the table, and sets the destination QPN of the pseudo-response to 0x000010.
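  • The first method can be sketched as a small lookup table that pairs the destination QPNs of an Only/Last request and the ACK carrying the same PSN. The class `QpnTable` and its methods are hypothetical names. Matching on the PSN alone is a simplification of this example; with several QPs in flight a real device would need additional keys, such as addresses, to avoid collisions.

```python
from typing import Dict, Optional


class QpnTable:
    """Hypothetical table of QPN combinations learned by inspecting traffic."""

    def __init__(self) -> None:
        self._pending: Dict[int, int] = {}  # PSN -> destination QPN of the request
        self._peer: Dict[int, int] = {}     # QPN -> the other QPN of the combination

    def observe_request(self, psn: int, dest_qpn: int) -> None:
        """Record the destination QPN of a passing Only or Last request."""
        self._pending[psn] = dest_qpn

    def observe_ack(self, psn: int, dest_qpn: int) -> None:
        """On an ACK with a known PSN, store both destination QPNs as a combination."""
        request_qpn = self._pending.pop(psn, None)
        if request_qpn is not None:
            self._peer[request_qpn] = dest_qpn
            self._peer[dest_qpn] = request_qpn

    def pseudo_response_dest(self, request_dest_qpn: int) -> Optional[int]:
        """Return the other QPN of the combination, or None if not yet learned."""
        return self._peer.get(request_dest_qpn)
```

  • Using the values from the text, observing a request to 0x000020 and an ACK to 0x000010 with the same PSN makes later requests to 0x000020 resolve to a pseudo-response destination of 0x000010.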
  • the second method is to put the Source QPN on the RDMA packet.
  • The WQE has a 32-bit immDt (Immediate Data) field, and any 32-bit information can be written in the immDt field, but only for SEND with Immediate or RDMA WRITE with Immediate.
  • the local 30 has an insertion unit 38, and the insertion unit 38 writes the QPN of the local 30 side SQ into the immDt field of the WQE of the local 30 side SQ.
  • When the intermediate device 10A generates a pseudo-response, it sets the QPN written in the immDt field of the received request as the destination QPN of the pseudo-response.
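  • The second method can be illustrated with a sketch in which the WQE and the request are plain dictionaries. The dictionary layout and both function names are assumptions of this example; the only facts relied on are that immDt is 32 bits wide and that an InfiniBand QPN fits in 24 bits.

```python
QPN_MASK = 0xFFFFFF  # a QPN is 24 bits wide, so it fits in the 32-bit immDt field


def insert_source_qpn(wqe: dict, sq_qpn: int) -> dict:
    """Local side (insertion unit): return a copy of the WQE with the QPN of
    the local SQ written into its immDt field."""
    if not 0 <= sq_qpn <= QPN_MASK:
        raise ValueError("QPN must fit in 24 bits")
    return {**wqe, "imm_dt": sq_qpn}


def pseudo_response_dest(request: dict) -> int:
    """Intermediate device: read the source QPN back from the immDt field and
    use it as the destination QPN of the pseudo-response."""
    return request["imm_dt"] & QPN_MASK
```

  • A trade-off of this approach, as the text implies, is that it applies only to SEND with Immediate and RDMA WRITE with Immediate, and the immDt field is no longer free for application use.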
  • the embodiment described above is a communication system comprising the intermediate device 10A and the intermediate device 10B.
  • the intermediate device 10A comprises: a buffer management unit 15 that determines the buffer size based on the delay information about the delay of the network 9 between the intermediate device 10A and the intermediate device 10B, and secures the buffer 14 of the buffer size; a transfer unit 11 that transfers a request including the data to the intermediate device 10B when the data to be transferred from the local 30 to the remote 50 is smaller than the credit of the intermediate device 10B; and a generation unit 12A that generates a pseudo-response to the request and returns it to the local 30.
  • the intermediate device 10B comprises: a buffer management unit 15 that determines the buffer size based on the delay information and secures the buffer 14 of the buffer size; a transfer unit 11 that transfers the request to the remote 50, stores the request in the buffer 14, and updates its own credit; and a discarding unit 12B that discards the response to the request received from the remote 50, deletes the request stored in the buffer 14, and updates the credit.
  • the local 30 releases the WQE of SQ in response to a pseudo-response from the intermediate device 10A, so even if the RTT (Round Trip Time) between the local 30 and remote 50 is large, the High-bandwidth data transfer can be realized without waiting for
  • intermediate device 10A transmits data in consideration of the amount of data that intermediate device 10B can receive. Therefore, it is possible to prevent transmission of data exceeding the capacity of the intermediate device 10B and avoid data loss on the remote side. Therefore, in this embodiment, even if the RDMA transfer destination is long distance, high-speed and highly reliable data transfer can be realized, and communication between the intermediate devices 10A and 10B can be guaranteed.
  • data loss can be dealt with by providing the intermediate devices 10A and 10B with a retransmission function for NACK.
  • the orchestrator 70 calculates or measures the delay when setting the optical path (transmission line) of the network 9 and notifies the intermediate devices 10A and 10B.
  • FIG. 10 is a diagram showing an example of the configuration of the communication system of this embodiment.
  • the communication system of this embodiment includes intermediate devices 10A and 10B and an orchestrator 70 .
  • The intermediate devices 10A and 10B of the present embodiment are the same as the intermediate devices 10A and 10B (FIGS. 4 and 5) of the first embodiment. However, the network state measurement units 13 of the intermediate devices 10A and 10B of this embodiment acquire the delay information from the orchestrator 70 instead of measuring the state of the network 9.
  • The delay information includes the transmission delay, the transmission capacity, optical path information, and the like.
  • the orchestrator 70 monitors and manages the entire network 9, such as an optical transport network, and centrally controls it.
  • The orchestrator 70 operates between the upper-layer computers (local 30, remote 50) and the lower-layer optical transport network. Based on the communication requirements from the computers and the state of the optical transmission lines of the optical transport network, it constructs an optical path in the optical transport network by setting and controlling the optical network equipment. This makes it possible to control the optical transport network automatically and optimally, without human intervention, in response to requests from geographically distributed computers.
  • FIG. 9 is a diagram showing an example of the configuration of the orchestrator 70.
  • the orchestrator 70 of this embodiment sets an optical path in the network 9 (optical transport network) according to communication requirements from the local 30, and calculates or measures delay information of the optical path.
  • The illustrated orchestrator 70 includes a communication request reception unit 71, an ACK transmission unit 72, an ACK reception unit 73, a result output/transmission unit 74, a scheduler unit 75, a design unit 76, a topology information storage unit 77, a node information storage unit 78, and a state monitoring/management unit 79.
  • the communication request receiving unit 71 receives a communication request including communication requirements regarding communication from the local 30 (or remote 50).
  • Communication requirements include the type of communication application, required bandwidth, total amount of data, allowable delay time, task completion time, bit error rate (BER), power, and the like.
  • For example, a communication request may be to complete a task within 10 msec, to back up data, or to transfer 1 Tbyte of data from the local 30 to the remote 50.
  • After a setting completion response for the setting information has been received from each of the transmission devices 20A and 20B (optical NW devices) in the network 9, the ACK transmission unit 72 transmits an optical path setting completion notification to the local 30 and the remote 50.
  • That is, once construction of the optical path in the network 9 is complete and the requesting local 30 and the destination remote 50 are ready to communicate, the ACK transmission unit 72 promptly notifies the requesting local 30 of an ACK indicating that construction of the optical path is complete.
  • the ACK receiving unit 73 receives, from each of the transmission devices 20A and 20B, a setting completion response of setting information for the optical NW device included in the device.
  • the result output/transmission unit 74 transmits the setting information for the optical network device of each transmission device 20A, 20B to each transmission device 20A, 20B. In other words, the result output/transmission unit 74 transmits the following setting information designed and selected for optimum control of the network 9 to the optical network device group constituting the network 9 .
  • - Optical path between computers
  • - OEO (optical-electro-optical) conversion points (DSP insertion positions)
  • - (i) Modulation method, (ii) baud rate, (iii) transmission power, and (iv) FEC (Forward Error Correction) overhead to be set in the transceiver of each transmission device 20A, 20B
  • More specifically, the following setting information is assumed.
  • - (i) Modulation method, (ii) baud rate, (iii) transmission power, and (iv) FEC (overhead) settings
  • - Optical paths between computers, OEO conversion points, and (i) modulation method, (ii) baud rate, (iii) transmission power, and (iv) FEC (overhead) settings for the group of optical network devices in the transmission devices 20A and 20B, designed and selected to minimize the resource usage of the entire optical transport network
  • the optical NW devices include transceivers (transponders), optical cross connects (OXCs), optical add/drop multiplexers (ROADMs), amplifiers, and the like.
  • the scheduler unit 75 manages and schedules communication requests received from the local 30 . Since it takes a certain amount of time to complete the design and opening of an optical path, when a plurality of communication requests coexist, the scheduler unit 75 schedules communication requests with higher priority first.
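  • Priority-first scheduling of coexisting communication requests can be illustrated with a small priority queue. This is only a sketch: the class name, the numeric priority scale (larger means more urgent), and the first-come-first-served tie-breaking are assumptions, since the text does not specify how priorities are assigned.

```python
import heapq
import itertools


class RequestScheduler:
    """Hypothetical scheduler that serves higher-priority requests first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: FIFO order within one priority

    def submit(self, request_id: str, priority: int) -> None:
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), request_id))

    def next_request(self):
        # Return the next request to design an optical path for, or None if idle.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```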
  • The design unit 76 determines an optical path to be used for communication of the local 30 based on the communication requirements received from the local 30 and the state of the optical transmission lines of the network 9, and calculates the setting information to be set in the optical NW device of each transmission device 20A, 20B in order to construct that optical path.
  • The design unit 76 includes an optical path design/selection unit 761, a required bandwidth calculation unit 762, a transmission mode selection unit 763, a power calculation unit 764, an OSNR calculation unit 765, a BER calculation unit 766, a multiflow determination unit 767, a delay calculation unit 768, and a task completion time calculation unit 769.
  • the optical path design/selection unit 761 lists a set of candidate optical paths and selects the optimum optical path based on the communication requirements received from the local 30 .
  • Optimal means, for example, the case where the delay, task completion time, required bandwidth, resource usage of the entire network 9, and power consumption of the entire network 9 are optimal.
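  • One way to picture "list the candidates, then pick the optimum" is a filter followed by a minimization. The sketch below is an assumption-laden example, not the selection method of the design unit 76: the `Candidate` fields, the two requirement thresholds, and the choice of power consumption as the single objective are all hypothetical, and the text names several other possible objectives (delay, task completion time, resource usage).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one candidate optical path."""
    name: str
    delay_ms: float
    bandwidth_gbps: float
    power_w: float


def select_path(candidates, max_delay_ms, min_bandwidth_gbps):
    """Keep candidates that meet the requirements, then minimize power."""
    feasible = [
        c for c in candidates
        if c.delay_ms <= max_delay_ms and c.bandwidth_gbps >= min_bandwidth_gbps
    ]
    if not feasible:
        return None  # no candidate satisfies the communication requirements
    return min(feasible, key=lambda c: c.power_w)
```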
  • The optical path design/selection unit 761 cooperates with the units from the required bandwidth calculation unit 762 through the delay measurement unit 770, and refers to the information about the network 9 stored in the units from the topology information storage unit 77 through the state monitoring/management unit 79. Transmission capacity and distance are in a trade-off relationship; References 1 to 3, described later, can be used for methods of selecting the optimum optical transmission mode for a given distance in consideration of that relationship.
  • the required bandwidth calculation unit 762 calculates the required bandwidth based on the communication requirements received from the local 30 in communication between the local 30 and the remote 50 (request source, request destination).
  • the transmission mode selection unit 763 lists candidate transmission modes based on the bandwidth required for communication between the local 30 and the remote 50 and the type of application.
  • the transmission mode selector 763 calculates the modulation scheme, baud rate, FEC, etc. to be set for each transceiver for a candidate transmission mode on a candidate optical path between the local 30 and the remote 50 .
  • Reference 4 is WO2020/031514 A1.
  • a power calculator 764 calculates the appropriate transmit power to be set for each transceiver on a candidate optical path between the local 30 and the remote 50 .
  • the OSNR calculator 765 calculates the OSNR on a candidate optical path between the local 30 and the remote 50 .
  • The OSNR calculator 765 takes the network state of an optical path between the local 30 and the remote 50 as input and outputs the OSNR on that optical path.
  • Reference 1 can be used for the calculation method of OSNR.
  • Reference 1 is A. Ferrari, et al., "GNPy: an open source application for physical layer aware open optical networks", Journal of Optical Communications and Networking, vol. 12, no. 6, 2020, pp. C31-C40.
  • A BER calculator 766 calculates the BER based on the OSNR and the margins on a given optical path. For example, referring to Reference 2, the BER when an optical path $p$ is established at a wavelength $\lambda$, written $\mathrm{BER}_{p,\lambda}$, can be calculated by Equation (1).
  • $$\mathrm{BER}_{p,\lambda} = \Psi\left(\mathrm{OSNR}_{p,\lambda} - M_T(\tau) - M_d(\tau)\right) \tag{1}$$
  • Here, $\Psi(\cdot)$ is the BER formula for a particular modulation format on a particular optical path, $M_T(\tau)$ is the system margin based on aging deterioration, and $M_d(\tau)$ is the design margin.
  • Reference 2 is P. Soumplis and 4 others, "Multi-period planning with actual physical and traffic conditions", IEEE/OSA Journal of Optical Communications and Networking, vol. 10, no. 1, 2018, pp. A144-A153.
  • The multi-flow determination unit 767 determines, based on the communication requirements received from the local 30, whether or not multi-flow is required on a section of a candidate optical path between the local 30 and the remote 50.
  • Reference 3 can be used for technology related to multi-flow communication.
  • a delay calculator 768 calculates the transmission delay time on an optical path between the local 30 and the remote 50 .
  • The delay calculator 768 takes an optical path between the local 30 and the remote 50 and the topology information of the network 9 as input, and outputs the transmission delay on that optical path.
  • the transmission delay time is basically determined by the distance between nodes.
  • the task completion time calculation unit 769 calculates the task completion time required to complete the communication task based on the selected candidate lightpath.
  • the delay measurement unit 770 may actually measure the transmission delay time on an optical path between the local 30 and the remote 50 using each transmission device 20A, 20B.
  • the transmission delay time may be calculated by the delay calculation unit 768 or measured by the delay measurement unit 770 .
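  • Because the transmission delay is basically determined by the distance between nodes, a calculated value can be approximated from the fiber length alone. The sketch below is an assumption, not the method of the delay calculation unit 768: it models only propagation delay, uses a typical group index of about 1.468 for silica fiber as a default, and ignores delay added by equipment such as transceivers or OEO conversion.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in vacuum


def propagation_delay_s(distance_km: float, group_index: float = 1.468) -> float:
    """One-way propagation delay over a fiber of the given length."""
    return distance_km * group_index / SPEED_OF_LIGHT_KM_PER_S


def path_delay_s(span_lengths_km, group_index: float = 1.468) -> float:
    """One-way delay of an optical path given the lengths of its spans."""
    return sum(propagation_delay_s(d, group_index) for d in span_lengths_km)
```

  • Under these assumptions a 1000 km path has a one-way delay of roughly 4.9 ms, or about 4.9 microseconds per kilometer.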
  • the topology information storage unit 77 holds topology information regarding connections and distances between the transmission devices 20A and 20B.
  • the topology information is required for optical path design, multiflow determination, and the like.
  • the node information storage unit 78 holds node information regarding the type and number of transceivers present in the transmission devices 20A and 20B.
  • the node information is necessary for optical path design because reception sensitivity (required OSNR) differs depending on the type of transceiver.
  • the node information storage unit 78 also holds node information regarding the type and number of amplifiers present in each transmission device 20A, 20B.
  • the node information is necessary because the noise figure (NF) differs depending on the type of amplifier, such as Raman amplification and EDFA (Erbium Doped Fiber Amplifier).
  • the node information storage unit 78 also holds node information related to the types and configurations (number of ports, etc.) of optical nodes (OXC, ROADM) present in each of the transmission devices 20A, 20B.
  • the status monitoring/management unit 79 holds monitoring/management information regarding the usage status and status of amplifiers, transceivers, and optical nodes present in each transmission device 20A, 20B.
  • the state monitor/manager 79 also holds monitor/manager information on optical signal loss, such as wavelength usage in each link, deterioration over time, splices and connector locations in each link.
  • The monitoring and management information is necessary when selecting optical paths (wavelength paths) because nonlinear effects (especially XPM (Cross Phase Modulation) and FWM (Four-Wave Mixing)) affect loss and BER depending on the wavelength usage conditions.
  • the status monitor/manager 79 updates the stored information as the optical path is added/deleted.
  • The state monitoring/management unit 79 is connected to the optical transmission line measurement unit of each of the transmission devices 20A and 20B, periodically receives transmission line information regarding the state and margin of the transmission line, and stores it so that it can be updated sequentially.
  • FIG. 12 is a diagram showing an example of the configuration of each transmission device 20A, 20B.
  • the transmission devices 20A and 20B are nodes forming the network 9.
  • Each transmission device 20A, 20B includes a result reception unit 21, a control unit 22, an optical NW device 23, an ACK transmission unit 24, and an optical transmission line measurement unit 25.
  • the result receiving unit 21 receives setting information for the optical network device 23 of its own node from the orchestrator 70 . That is, the result receiving unit 21 receives setting information to be set in the optical network device 23 from the orchestrator 70 and passes the setting information to the control unit 22 .
  • the control unit 22 sets and controls the setting information received from the orchestrator 70 to the optical network device 23 of its own node. That is, the control unit 22 sets and controls the optical network device 23 based on the setting information received from the orchestrator 70 as follows.
  • - Set the transmission mode (modulation method, baud rate, FEC, etc.) and transmission power for each transceiver (transponder)
  • - Set the add-drop/through wavelengths in the ROADMs and OXCs
  • - Allocate the wavelengths to be amplified to the amplifiers
  • The control unit 22 notifies the ACK transmission unit 24 of the completion of the setting and control.
  • the optical NW device 23 is a device that constitutes the network 9, such as transceivers, OXCs, ROADMs, and amplifiers.
  • the ACK transmission unit 24 transmits a setting completion response to the orchestrator 70 after completing the setting of the setting information to the optical NW device 23 . That is, the ACK transmission unit 24 notifies the orchestrator 70 of ACK indicating the completion of the setting and control of the optical network device 23 as soon as the setting and control of the optical network device 23 are completed.
  • The optical transmission line measurement unit 25 measures the state and margin of the optical transmission line (optical path) of the network 9 and transmits them to the orchestrator 70.
  • The optical transmission line measurement unit 25 is a measuring device that measures the actual state and margin of the optical transmission line. For example, according to Reference 5, the optical transmission line measurement unit 25 estimates the state of the optical transmission line, periodically or upon request, from the signal received by the coherent DSP, and sends the resulting state information to the orchestrator 70.
  • the transmission line measurement unit 25 may measure delay information related to delay such as transmission delay time and transmission capacity of the optical transmission line.
  • Reference 5 is T. Sasai and 8 others, "Simultaneous Detection of Anomaly Points and Fiber types in Multi-Span Transmission Links Only by Receiver-side Digital Signal Processing", in Optical Fiber Communication Conference (pp. Th1F-1), Optical Society of America, 2020.
  • the application unit 31 of the local 30 of this embodiment transmits a communication request including communication requirements for communication to the orchestrator 70. Further, the network status management unit 33 receives from the orchestrator 70 a setting completion notification (ACK) of the optical path constructed in the network 9 and delay information of the set optical path.
  • the delay information includes transmission delay time, transmission capacity, information on optical paths, and the like.
  • Local 30 may comprise a tunable transceiver capable of dynamically switching wavelengths. Remote 50 is similar to local 30 .
  • FIG. 13 is a sequence diagram showing the operation of the communication system of this embodiment.
  • the local 30 transmits a communication request (path setting request) specifying communication requirements to the orchestrator 70 (step S51).
  • a communication request is a transfer of data or a file, transmission or distribution of video, or the like.
  • the communication requirements include the type of application that performs the communication, required bandwidth, total amount of data, allowable delay time, task completion time, BER, power, and the like.
  • the local 30 may send a communication request to the intermediate device 10A or the transmission device 20A. In that case, the intermediate device 10A or the transmission device 20A transmits a communication request to the orchestrator 70.
  • The orchestrator 70 determines an optical path to be used for communication of the local 30 based on the communication requirements from the local 30 and the state of the network 9 previously received from each transmission device arranged in the network 9, and calculates the setting information to be set in the optical NW device 23 of each transmission device 20 in order to build that optical path (step S52).
  • For example, the orchestrator 70 computes setting information for (i) the modulation scheme, (ii) the symbol rate, (iii) the transmission power, and (iv) the FEC (overhead).
  • the orchestrator 70 performs calculations so that the delay, task completion time, required bandwidth, resource usage of the entire network 9, and power consumption of the entire network 9 are optimized.
  • The orchestrator 70 may calculate the setting information so as to satisfy the above communication requirements, so as to minimize the resource usage of the entire network 9, and/or so as to minimize the power consumption of the entire network 9.
  • The orchestrator 70 transmits the setting information to the transmission device 20A and the transmission device 20B, respectively, and instructs them to set the optical path (steps S53 and S54).
  • Each of the transmission device 20A and the transmission device 20B sets and controls the above setting information in the optical NW device 23 of its own node. For example, based on the setting information, the transmission device 20A and the transmission device 20B set the transmission mode (modulation method, symbol rate, FEC, etc.) and transmission power for each transceiver, set the add-drop/through wavelengths in the ROADMs and OXCs, and allocate the wavelengths to be amplified to the amplifiers.
  • the transmission device 20A and the transmission device 20B respectively notify the orchestrator 70 of ACK indicating the completion (steps S55 and S56).
  • the orchestrator 70 transmits delay information including the set transmission delay of the optical path, the transmission capacity, the information of the optical path, etc. to the intermediate devices 10A and 10B (steps S57 and S58).
  • The intermediate devices 10A and 10B calculate the buffer size based on the delay information transmitted from the orchestrator 70 and secure a buffer 14 of that size (steps S59, S60). For example, the intermediate devices 10A and 10B calculate a buffer size that is at least twice the product of the transmission delay and the transmission capacity.
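  • The buffer sizing rule in steps S59 and S60 (at least twice the product of transmission delay and transmission capacity) amounts to a short calculation. The sketch below is illustrative: the function name and the units (delay in microseconds, capacity in bits per second, result in bytes) are assumptions, and integer arithmetic is used only to avoid floating-point rounding. The text does not state whether the transmission delay is one-way or round-trip, so the caller is assumed to pass whichever value the delay information carries.

```python
def buffer_size_bytes(delay_us: int, capacity_bps: int, factor: int = 2) -> int:
    """Smallest buffer, in bytes, of at least factor x (delay x capacity)."""
    if factor < 2:
        raise ValueError("the rule requires at least twice the delay-capacity product")
    scaled_bits = factor * delay_us * capacity_bps  # bits, scaled by 1e6 (microseconds)
    divisor = 8 * 1_000_000  # 8 bits per byte, 1e6 microseconds per second
    return (scaled_bits + divisor - 1) // divisor  # round up to whole bytes
```

  • For a transmission delay of 5 ms and a transmission capacity of 100 Gbit/s, the delay-capacity product is 62.5 MB, so the buffer would be at least 125 MB.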
  • After receiving ACKs from all the transmission devices 20, the orchestrator 70 notifies the requesting local 30 of an ACK indicating the completion of the optical path setting (step S61). After receiving the ACK from the orchestrator 70, the local 30 starts communication with the remote 50 via the optical path built in the network 9.
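  • The condition in step S61, that the local 30 is notified only after every transmission device has acknowledged its settings, can be tracked with a set of pending devices. This is a hypothetical helper for illustration; the class name and the string device identifiers are not taken from the embodiment.

```python
class PathSetupTracker:
    """Hypothetical tracker for setting-completion ACKs from transmission devices."""

    def __init__(self, device_ids):
        self.pending = set(device_ids)  # devices that have not yet sent an ACK

    def on_ack(self, device_id: str) -> bool:
        """Record one ACK; return True once all devices have acknowledged."""
        self.pending.discard(device_id)
        return not self.pending
```

  • When `on_ack` returns True, the orchestrator would send the optical path setting completion ACK to the requesting local.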
  • the embodiment described above includes the orchestrator 70 that calculates or measures the delay information and notifies the intermediate devices 10A and 10B.
  • In the first embodiment, the intermediate devices 10A and 10B actually transmit and receive packets to measure the delay information.
  • In this embodiment, the delay information is notified to the intermediate devices 10A and 10B when the path is established.
  • The intermediate devices 10A and 10B therefore do not need to measure the delay information, which shortens the time required to start communication.
  • the orchestrator 70 may read delay information calculated or measured in advance and notify the intermediate devices 10A and 10B. Also, the orchestrator 70 may notify the intermediate devices 10A and 10B of the delay information periodically, or when the network 9 is changed.
  • FIG. 14 is a diagram showing the configuration of Modification 1 of the second embodiment.
  • In the second embodiment, the intermediate devices 10A and 10B obtain the optical path delay information from the orchestrator 70; in Modification 1, the intermediate devices 10A and 10B acquire the optical path delay information from the transmission devices 20A and 20B.
  • the orchestrator 70 may notify the transmission devices 20A and 20B of the calculated delay information in steps S53 and S54 of FIG.
  • Modification 2 of Second Embodiment In Modification 1, the transmission devices 20A and 20B notify the intermediate devices 10A and 10B of the optical path delay information acquired from the orchestrator 70 . In Modified Example 2, the transmission devices 20A and 20B measure the optical path delay information set by the orchestrator 70 and notify the intermediate devices 10A and 10B.
  • the configuration of this modification is the same as the configuration of modification 1 shown in FIG.
  • FIG. 15 is a sequence diagram showing the operation of the communication system of Modification 2.
  • When the transmission device 20A receives a request from the local 30 (or the intermediate device 10A), it cooperates with the transmission device 20B to determine the transmission mode of the optical path set between the local 30 and the remote 50, and measures the delay information (step S71).
  • the transmission devices 20A and 20B transmit the measured delay information to the intermediate devices 10A and 10B (steps S72 and S73).
  • the intermediate device 10A measures the delay information up to the transmission device 20A, and the intermediate device 10B measures the delay information up to the transmission device 20B (steps S74, S75).
  • the intermediate device 10A calculates the buffer size based on the delay information of step S71 and the delay information of step S74, and secures the buffer 14 of this size (step S76).
  • the intermediate device 10B calculates the buffer size based on the delay information of step S71 and the delay information of step S75, and secures the buffer 14 of this size (step S77).
  • the intermediate devices 10A, 10B may omit S74 and S75, calculate the buffer size based on the delay information between the transmission devices 20A and 20B, and secure the buffer 14 of this size.
  • FIG. 16 is a diagram showing the configuration of Modification 3 of the second embodiment.
  • the intermediate device has the function of a transmission device.
  • The intermediate device 10C is a device in which the intermediate device 10A of the second embodiment is given the functions of the transmission device 20A.
  • The intermediate device 10D is a device in which the intermediate device 10B of the second embodiment is given the functions of the transmission device 20B.
  • The functions of the transmission devices 20A and 20B implemented in the intermediate devices 10C and 10D also measure the delay information when wavelengths are selected by the transceivers (transponders).
  • the intermediate devices 10C, 10D determine buffer sizes based on the measured delay information.
  • the local 30 may request the orchestrator 70 for an optical path that meets the communication requirements, and adjust the queue depth of the local 30 from the optical path delay information obtained from the orchestrator 70 .
  • the queue depth may be maintained without the intermediate devices 10A and 10B.
  • The local 30 and the remote 50 cannot increase the queue size without limit because of hardware and protocol restrictions. Therefore, the local 30 of this embodiment uses the communication requirements of the application and the delay information acquired from the orchestrator 70 to determine whether to communicate via the intermediate devices 10A and 10B or to adjust the depth of the queue.
  • FIG. 17 is a diagram showing an example of a communication system according to the third embodiment.
  • the system of this embodiment includes a local 30, a remote 50, intermediate devices 10A and 10B, an orchestrator 70, and network devices 40A and 40B.
  • the intermediate devices 10A, 10B and the orchestrator 70 of this embodiment are similar to the intermediate devices 10A, 10B and the orchestrator 70 of the second embodiment.
  • the local 30 and remote 50 are the same as the local 30 and remote 50 of the first embodiment.
  • The determination unit 35 of the local 30 of this embodiment determines whether or not to communicate via the intermediate devices 10A and 10B based on the delay information. For example, if a value equal to or more than twice the product of the transmission delay and the transmission capacity in the delay information can be secured as the queue size of the temporary data storage unit 34, the determination unit 35 determines that the intermediate devices 10A and 10B need not be used.
  • the queue management unit 32 of the local 30 determines the queue size and queue depth of the temporary data storage unit 34 based on the delay information. The queue depth is determined based on the queue size and packet size.
  • The determination unit 35 of the remote 50 likewise determines whether or not to pass through the intermediate devices 10A and 10B based on the delay information, and the queue management unit 32 of the remote 50 determines the queue size and queue depth of the temporary data storage unit 34 based on the delay information.
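  • The decision made by the determination unit 35 and the queue sizing made by the queue management unit 32 can be sketched together. The sketch rests on stated assumptions: a queue able to hold twice the delay-capacity product is taken to make the intermediate devices unnecessary, the queue depth is taken to be the queue size divided by the packet size (the text says only that it is based on both), and all function names and units are hypothetical.

```python
def required_queue_bytes(delay_us: int, capacity_bps: int) -> int:
    """Twice the delay-capacity product, in bytes, rounded up."""
    scaled_bits = 2 * delay_us * capacity_bps
    divisor = 8 * 1_000_000
    return (scaled_bits + divisor - 1) // divisor


def needs_intermediate(delay_us: int, capacity_bps: int, max_queue_bytes: int) -> bool:
    """True when the terminal cannot secure a large enough queue by itself."""
    return max_queue_bytes < required_queue_bytes(delay_us, capacity_bps)


def queue_depth(queue_size_bytes: int, packet_size_bytes: int) -> int:
    """Number of whole packets that fit in the queue (assumed definition)."""
    return queue_size_bytes // packet_size_bytes
```

  • With a 64 MiB limit on the queue, a 100 Gbit/s path with a 5 ms delay would go through the intermediate devices, while the same path with a 100 microsecond delay would be handled by adjusting the queue depth.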
  • the network device 40A routes requests sent by the local 30 according to instructions from the local 30. Specifically, the network device 40A determines whether the transfer destination of the request is the intermediate device 10A or the network device 40B from the destination information of the request using a routing table or the like.
  • the network device 40B determines whether the transfer destination of the response to the request is the intermediate device 10B or the network device 40A.
  • FIG. 18 is a diagram showing the configuration of a communication system that does not pass through intermediate devices 10A and 10B.
  • Local 30 and remote 50 determine not to pass through intermediate devices 10A and 10B based on the delay information obtained from orchestrator 70, and adjust the queue depth based on the delay information.
  • the local 30 then transmits the request to the remote 50 via the network device 40A (not shown).
  • the remote 50 transmits the response to the local 30 via the network device 40B (not shown).
  • the local 30 and the remote 50 may rewrite the destination of the request or response without the network devices 40A and 40B.
  • the determination units 35 of the local 30 and remote 50 may control the transfer destination of the request or response by rewriting the destination information of the header.
  • The local 30 and the remote 50 determine whether or not to communicate via the intermediate devices 10A and 10B based on the delay information. In this way, the local 30 and the remote 50 can communicate while autonomously selecting, according to the delay information, whether to go through the intermediate devices 10A and 10B or to adjust the depth of their own queues. Therefore, even if the optical path or the communication requirements change dynamically, the optimum communication scheme can be selected automatically according to the delay information of the changed optical path or communication requirements.
  • The local 30 and the remote 50 may automatically construct the intermediate devices 10A and 10B in high-performance units of the transmission devices 20A and 20B, or, if the product of the transmission delay and the transmission capacity in the delay information is smaller than a predetermined value, may adjust the depth of the queue instead.
  • The settings of the intermediate devices 10A and 10B constructed in the transmission devices 20A and 20B, or the choice of the transmission devices 20A and 20B in which the intermediate devices 10A and 10B are built, may be specified on the terminal side by the local 30 and the remote 50, or may be determined by the orchestrator 70 based on conditions such as the distance from the terminal, the route of the optical path, and the congestion of the line.
  • The local 30 and the remote 50 may learn of communication requests and optical path changes from the orchestrator 70, through monitoring by the application unit 31, or by monitoring the delay with another application such as ping, and may use the result to determine whether or not to pass through the intermediate devices 10A and 10B.
  • Hardware Configuration For each of the local 30, remote 50, intermediate devices 10A and 10B, transmission devices 20A and 20B, and orchestrator 70 described above, for example, a general-purpose computer system as shown in FIG. 20 can be used.
  • The illustrated computer system includes a CPU (Central Processing Unit, processor) 901, a memory 902, a storage 903 (HDD: Hard Disk Drive, SSD: Solid State Drive), a communication device 904, an input device 905, and an output device 906.
  • Memory 902 and storage 903 are storage devices.
  • In this computer system, the functions of the local 30, the remote 50, the intermediate devices 10A and 10B, the transmission devices 20A and 20B, and the orchestrator 70 are implemented by the CPU 901 executing a predetermined program loaded into the memory 902.
  • These devices may be implemented by one computer or by multiple computers. These devices may also be virtual machines implemented on computers. Programs for these devices can be stored on computer-readable recording media such as HDDs, SSDs, USB (Universal Serial Bus) memories, CDs (Compact Discs), and DVDs (Digital Versatile Discs), or can be distributed via networks.
  • the present invention is not limited to the above embodiments, and many modifications are possible within the scope of the gist.
  • the intermediate devices 10A, 10B and/or transmission devices 20A, 20B of the above embodiments may be implemented as local 30 or remote 50 NICs or applications.
  • the local 30 and the remote 50 may measure the distance to the intermediate devices 10A and 10B and set the queue depth of the temporary data storage unit according to the distance.
  • the present invention may combine at least two of the first to third embodiments.
  • 10A, 10B, 10C, 10D: intermediate device; 11: transfer unit; 12A: generation unit; 12B: discarding unit; 13: network state measurement unit; 14: buffer; 15: buffer management unit; 16: credit management unit; 17: communication unit; 20A, 20B: transmission device; 30: local; 31: application unit; 32: queue management unit; 33: network state management unit; 34: temporary data storage unit; 35: determination unit; 36: packet distribution unit; 37: communication unit; 40A, 40B: network device; 50: remote; 70: orchestrator

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
PCT/JP2021/037380 2021-10-08 2021-10-08 通信システム、中間装置、通信方法、および、プログラム Ceased WO2023058232A1 (ja)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2023552663A JP7671008B2 (ja) 2021-10-08 2021-10-08 通信システム、中間装置、通信方法、および、プログラム
US18/697,928 US20240414092A1 (en) 2021-10-08 2021-10-08 Communication system, intermediate apparatus, communication method, and program
PCT/JP2021/037380 WO2023058232A1 (ja) 2021-10-08 2021-10-08 通信システム、中間装置、通信方法、および、プログラム

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/037380 WO2023058232A1 (ja) 2021-10-08 2021-10-08 通信システム、中間装置、通信方法、および、プログラム

Publications (1)

Publication Number Publication Date
WO2023058232A1 true WO2023058232A1 (ja) 2023-04-13

Family

ID=85804076

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/037380 Ceased WO2023058232A1 (ja) 2021-10-08 2021-10-08 通信システム、中間装置、通信方法、および、プログラム

Country Status (3)

Country Link
US (1) US20240414092A1
JP (1) JP7671008B2
WO (1) WO2023058232A1

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116743673A (zh) * 2023-08-15 2023-09-12 中移(苏州)软件技术有限公司 Rdma工作队列参数的调整方法、装置、设备及存储介质
WO2025017785A1 (ja) * 2023-07-14 2025-01-23 日本電信電話株式会社 中間装置および通信方法
US12218855B2 (en) * 2023-02-10 2025-02-04 Meta Platforms, Inc. RDMA transmit flow scheduling and pacing scheme for congestion management in high-performance AI/ML networks

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006287293A (ja) * 2005-03-31 2006-10-19 Nec Corp データ転送効率化方法及びその方法を用いたシステム
JP2014057170A (ja) * 2012-09-11 2014-03-27 Fujitsu Ltd 転送装置、転送方法および転送プログラム
US20140207896A1 (en) * 2012-04-10 2014-07-24 Mark S. Hefty Continuous information transfer with reduced latency
JP2018182628A (ja) * 2017-04-19 2018-11-15 富士通株式会社 情報処理装置、情報処理方法および情報処理プログラム
US20200089649A1 (en) * 2018-09-13 2020-03-19 Microsoft Technology Licensing, Llc Transport Protocol and Interface for Efficient Data Transfer Over RDMA Fabric

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6052726A (en) * 1997-06-30 2000-04-18 Mci Communications Corp. Delay calculation for a frame relay network
US7190699B2 (en) * 2002-05-31 2007-03-13 International Business Machines Corporation Method and apparatus for implementing multiple credit levels over multiple queues
US7636022B2 (en) * 2005-06-10 2009-12-22 Symmetricom, Inc. Adaptive play-out buffers and clock operation in packet networks
US20240394215A1 (en) * 2021-09-27 2024-11-28 Nippon Telegraph And Telephone Corporation Intermediate apparatus, communication method, and program

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12218855B2 (en) * 2023-02-10 2025-02-04 Meta Platforms, Inc. RDMA transmit flow scheduling and pacing scheme for congestion management in high-performance AI/ML networks
WO2025017785A1 (ja) * 2023-07-14 2025-01-23 日本電信電話株式会社 中間装置および通信方法
CN116743673A (zh) * 2023-08-15 2023-09-12 中移(苏州)软件技术有限公司 Rdma工作队列参数的调整方法、装置、设备及存储介质
CN116743673B (zh) * 2023-08-15 2023-11-03 中移(苏州)软件技术有限公司 Rdma工作队列参数的调整方法、装置、设备及存储介质

Also Published As

Publication number Publication date
JP7671008B2 (ja) 2025-05-01
US20240414092A1 (en) 2024-12-12
JPWO2023058232A1 2023-04-13

Similar Documents

Publication Publication Date Title
JP7671008B2 (ja) 通信システム、中間装置、通信方法、および、プログラム
US11405265B2 (en) Methods and systems for detecting path break conditions while minimizing network overhead
KR101143172B1 (ko) 웹 서비스를 위한 신뢰성 있는 메시징 프로토콜을 이용한메시지의 효율적인 전송
CN104995881A (zh) 替换现有网络通信路径
US8838782B2 (en) Network protocol processing system and network protocol processing method
JP5039677B2 (ja) エッジノードおよび帯域制御方法
JP3893247B2 (ja) データ配信管理装置
JP5087595B2 (ja) エッジノード、ウィンドウサイズ制御方法およびプログラム
JP7640868B2 (ja) 光伝送システム、オーケストレータ、制御方法、及び、制御プログラム
JP5672385B2 (ja) 伝送システム、ルーティング制御装置および通信装置、並びにルーティング制御方法および通信方法
Argibay-Losada et al. Using stop-and-wait to improve TCP throughput in fast optical switching (FOS) networks over short physical distances
JP5662779B2 (ja) 通信システム及びノード装置
WO2024234750A1 (zh) 拥塞控制方法、装置及系统
Wang et al. NetRT: Enhancing RDMA with Retransmission Offloading in Data Center Networks
JP5216830B2 (ja) データ転送装置及び方法
Minakhmetov Cross-layer hybrid and optical packet switching
JP7506335B2 (ja) 通信装置、中継装置、通信システム、通信方法およびプログラム
WO2025017785A1 (ja) 中間装置および通信方法
JP4797033B2 (ja) Tcpフローレート制御エッジノードにおけるフローレート制御方法及びエッジノード
Sodhatar et al. Throughput based comparison of different variants of TCP in optical burst switching (OBS) network
WO2023169938A1 (fr) Procédé de gestion d'une retransmission de données échangées sur un chemin établi entre un premier équipement de communication et un deuxième équipement de communication au moyen d'une valeur d'un paramètre de performance intermédiaire déterminée par un nœud intermédiaire appartenant audit chemin
Kwak et al. Retransmission in OBS networks with fiber delay lines
Padmanabhan et al. Tcp over optical burst switching (OBS): to Split or not to Split?
MINAKHMETOV Commutation de paquets optique et hybride multicouches
EP2843967B1 (en) Method for scheduling data through an optical burst switching network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21959972

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023552663

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 18697928

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21959972

Country of ref document: EP

Kind code of ref document: A1