WO2024109174A1

WO2024109174A1 - Method for controlling data retransmission in application layer, and related product

Info

Publication number: WO2024109174A1
Application number: PCT/CN2023/112582
Authority: WO
Inventors: 孙咏哲
Original assignee: 上海寒武纪信息科技有限公司
Priority date: 2022-11-25
Filing date: 2023-08-11
Publication date: 2024-05-30
Also published as: CN118101138A

Abstract

Provided in the present disclosure are a method for controlling data retransmission in an application layer, and a related product, wherein the method can be realized in a combined processing apparatus. The method comprises: pre-sending data at a sending end, the data comprising a payload and a count value for the payload; in response to the data being in a pre-sent state, recording first sending-end information, the first sending-end information representing the first amount of pre-sent data; receiving and recording first receiving-end information, the first receiving-end information representing the second amount of data received and fed back by a receiving end, wherein the second amount is not greater than the first amount; and according to the first sending-end information and the first receiving-end information, determining whether to retransmit the data.

Description

A method for controlling data retransmission at the application layer and related products

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to a Chinese patent application filed on November 25, 2022, with application number 202211496930.1 and titled “A method and related products for controlling data retransmission at the application layer”.

Technical Field

The present disclosure relates to the field of chips, and more specifically, to the field of inter-chip communication.

Background Art

As the scale of neural networks increases, a single machine can no longer complete training tasks well on its own. For example, the memory of a single machine may be insufficient to hold the entire neural network model, or the amount of data may be too large to load. In such cases, distributed training is usually introduced: multiple training devices work together, communicating with each other to complete the collaborative task as one system.

The foundation of a distributed system is communication, and communication must be built on a communication medium. Some communication media provide reliable transmission, such as sockets and PCIe (Peripheral Component Interconnect Express): the application layer only needs to issue communication requests, and transmission on the link is reliable. Reliable transmission means that the receiving application receives complete and correct data. Other media are unreliable. For example, the IP protocol itself is an unreliable network-layer protocol: after data is sent, packets may be lost or corrupted, and responses may be lost. How to achieve reliable transmission over unreliable links is a problem that all communication must face.

The current mainstream practice is to implement a reliable communication protocol at the transport layer. A typical example is TCP (Transmission Control Protocol), which ensures transmission reliability through four main mechanisms: acknowledgment with sequence numbers, timeout retransmission, flow control, and congestion control. On other links, NVIDIA uses NVLink to ensure reliable transmission at the link layer and protocol layer. The patent application with publication number CN 110278094 discloses a timeout retransmission device based on link delay, which reduces the probability of packet loss during retransmission on aggregated links. The patent with publication number CN 112312513 B improves retransmission recovery performance in a device for link failure recovery.

The applicant found that the above existing solutions have the following technical defects:

1. Retransmission occurs at the transport layer, and more redundant information needs to be maintained at the transport layer to mark the location of the retransmission;

2. The transport layer requires additional buffers to store sent information for retransmission, which increases the consumption of storage resources;

3. When the transport layer retransmits, the application layer is not aware of this behavior and may continue to issue large volumes of communication, which the flow control module must then coordinate and handle.

Summary of the invention

The purpose of the present disclosure is to solve the defects caused by transport layer retransmission in the prior art and to provide a solution that can perform retransmission at the application layer.

According to a first aspect of the present disclosure, a method for controlling data retransmission at an application layer is provided, comprising: pre-sending data, the data comprising a payload and a count value for the payload; in response to the data being in a pre-sending state, recording first sending end information, the first sending end information indicating a first amount of the data being pre-sent; receiving and recording first receiving end information, the first receiving end information indicating a second amount of data received and fed back by a receiving end, wherein the second amount is not greater than the first amount; and determining whether to retransmit the data based on the first sending end information and the first receiving end information.

According to a second aspect of the present disclosure, a device for controlling data retransmission at an application layer is provided, comprising: a sending unit, configured to pre-send data, wherein the data comprises a payload and a count value for the payload; a sending end information updating unit, configured to record first sending end information in response to the data being in a pre-sending state, wherein the first sending end information indicates a first quantity of the data pre-sent; a receiving unit, configured to receive and record first receiving end information, wherein the first receiving end information indicates a second quantity of data received and fed back by the receiving end, and the second quantity is not greater than the first quantity; and a retransmission judgment unit, configured to determine whether to retransmit the data based on the first sending end information and the first receiving end information.

According to a third aspect of the present disclosure, a method for controlling data retransmission at an application layer in a communication system is provided, the communication system comprising: a transmitting end, a receiving end and a communication link connecting the transmitting end and the receiving end, the method comprising: at the transmitting end, pre-sending data to the receiving end, the data comprising a payload and a count value for the payload; in response to the data being in a pre-sending state, recording first transmitting end information, the first transmitting end information indicating a first quantity of the data being pre-sent; transmitting the data through the communication link; at the receiving end, if the data is received, recording the count value; generating a second quantity based on the count value, the second quantity indicating the quantity of the data received and fed back by the receiving end; feeding back first receiving end information to the transmitting end, the first receiving end information including a second quantity, and the second quantity is not greater than the first quantity; at the transmitting end, receiving and recording the first receiving end information; determining whether to retransmit the data based on the first transmitting end information and the first receiving end information.
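The exchange described in this aspect can be sketched in Python as follows. This is only an illustrative sketch: the names `Packet`, `Sender`, `Receiver` and `run`, the in-memory "link", and the rule "retransmit when the fed-back quantity is smaller than the recorded quantity" are assumptions made for the example and are not stated in the text above.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    payload: bytes
    count: int  # count value carried alongside the payload


@dataclass
class Receiver:
    received: int = 0  # second quantity: data received and fed back

    def on_packet(self, pkt: Packet) -> None:
        # Record the count value of data that actually arrived.
        self.received += pkt.count

    def feedback(self) -> int:
        # First receiving-end information returned to the sender.
        return self.received


@dataclass
class Sender:
    pre_sent: int = 0  # first quantity: data pre-sent so far
    acked: int = 0     # last recorded receiving-end information

    def pre_send(self, payload: bytes, count: int) -> Packet:
        # Record sending-end information as soon as data is pre-sent.
        self.pre_sent += count
        return Packet(payload, count)

    def on_feedback(self, second_quantity: int) -> None:
        # The fed-back quantity must not exceed what was pre-sent.
        if second_quantity > self.pre_sent:
            raise ValueError("receiver reported more than was sent")
        self.acked = second_quantity

    def needs_retransmit(self) -> bool:
        # Assumed rule: a shortfall means some data must be sent again.
        return self.acked < self.pre_sent


def run(drop_second: bool) -> bool:
    """Send three fragments, optionally losing the second one in transit."""
    sender, receiver = Sender(), Receiver()
    for i, chunk in enumerate([b"aa", b"bb", b"cc"]):
        pkt = sender.pre_send(chunk, count=1)
        if drop_second and i == 1:
            continue  # simulate loss on the unreliable link
        receiver.on_packet(pkt)
    sender.on_feedback(receiver.feedback())
    return sender.needs_retransmit()
```

Calling `run(False)` models a loss-free exchange in which both quantities match, while `run(True)` models a lost fragment that leaves the fed-back quantity behind the pre-sent quantity.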

According to a fourth aspect of the present disclosure, a communication system for controlling data retransmission at an application layer is provided, the communication system comprising: a transmitting end, a receiving end and a communication link connecting the transmitting end and the receiving end, wherein the transmitting end pre-transmits data to the receiving end, the data comprising a payload and a count value for the payload; the transmitting end records first transmitting end information in response to the data being in a pre-transmitting state, the first transmitting end information indicating a first quantity of the data being pre-transmitted; the communication link is used to transmit the data; if the receiving end receives the data, the receiving end records the count value; the receiving end generates a second quantity according to the count value, the second quantity indicating the quantity of the data received and fed back by the receiving end; the receiving end feeds back first receiving end information to the transmitting end, the first receiving end information including the second quantity, and the second quantity is not greater than the first quantity; the transmitting end receives and records the first receiving end information; and the transmitting end determines whether to retransmit the data according to the first transmitting end information and the first receiving end information.

According to a fifth aspect of the present disclosure, an electronic device is provided, comprising: one or more processors; and a memory, wherein the memory stores computer executable instructions, and when the computer executable instructions are executed by the one or more processors, the electronic device executes the method described above.

According to a sixth aspect of the present disclosure, a computer-readable storage medium is provided, comprising computer-executable instructions, and when the computer-executable instructions are executed by one or more processors, the method described above is executed.

The beneficial effects of the technical solution of the present disclosure include at least the following: performing a software-level retransmission design at the application layer can reduce the hardware design required at the transport layer, thereby speeding up development and reducing hardware customization costs.

BRIEF DESCRIPTION OF THE DRAWINGS

By reading the detailed description below with reference to the accompanying drawings, the above and other purposes, features and advantages of the exemplary embodiments of the present disclosure will become readily understood. In the accompanying drawings, several embodiments of the present disclosure are shown in an exemplary and non-limiting manner, and the same or corresponding reference numerals represent the same or corresponding parts, wherein:

FIG. 1 is a schematic diagram showing the structure of a board according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram showing a combined processing device of this embodiment;

FIG. 3 is a schematic diagram showing the internal structure of a computing device;

FIG. 4 shows the internal architecture of the processing core;

FIG. 5 shows a schematic diagram of a communication system including a transmitting end and a receiving end according to an embodiment of the present disclosure;

FIG. 6 shows a method for controlling data retransmission at an application layer in a communication system according to an embodiment of the present disclosure;

FIG. 7 is a schematic diagram showing a normal communication process according to an embodiment of the present disclosure;

FIG. 8 is a schematic diagram showing a packet loss/error retransmission process;

FIG. 9 shows a situation where the ACK fed back by the receiving end is lost;

FIG. 10 shows a flow chart of a method for controlling data retransmission at the application layer according to another aspect of the present disclosure; and

FIG. 11 shows a device for controlling data retransmission at an application layer according to one aspect of the present disclosure.

Detailed Description

The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by those skilled in the art based on the embodiments in the present disclosure without creative work fall within the scope of protection of the present disclosure.

It should be understood that the terms "first", "second", "third", "fourth", etc. in the claims, specification and drawings of the present disclosure are used to distinguish different objects rather than to describe a specific order. "First", "second", "third", "fourth", etc. do not denote only a single item and may also denote several. The terms "include" and "comprise" used in the specification and claims of the present disclosure indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.

It should also be understood that the terms used in this disclosure are only for the purpose of describing specific embodiments and are not intended to limit the disclosure. As used in this disclosure and claims, the singular forms of "a", "an", and "the" are intended to include the plural forms unless the context clearly indicates otherwise. It should also be further understood that the term "and/or" used in this disclosure and claims refers to any combination of one or more of the associated listed items and all possible combinations, including these combinations.

As used in this specification and claims, the term "if" may be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting," depending on the context. Similarly, the phrase "if it is determined" or "if [described condition or event] is detected" may be interpreted as meaning "upon determination" or "in response to determining" or "upon detection of [described condition or event]" or "in response to detecting [described condition or event]," depending on the context.

The specific implementation of the present disclosure is described in detail below with reference to the accompanying drawings.

Today's semiconductor manufacturing process starts with a complete wafer. Wafers are circular sheets made of pure silicon, generally divided into 6-inch, 8-inch, 12-inch and other specifications. Wafers are cut into small pieces, which are called dies. Each die is mounted with a chip and wired to achieve specific electrical functions. Then the die is packaged into a particle. The purpose of packaging is to place, fix, seal, protect the chip and enhance the electrical and thermal performance. At the same time, the contacts of the chip are connected to the pins of the package shell with wires, and a chip package structure is completed.

The memory is used to temporarily store the computing data required by the system on chip and the data exchanged with the external memory. In this embodiment, the memory can be a high-bandwidth memory (HBM), which is a high-performance DRAM (Dynamic Random Access Memory) made based on a 3D stacking process and is suitable for applications with high memory bandwidth requirements, such as graphics processors, online switching and forwarding equipment (such as routers, switches), etc.

System on Chip (SoC) refers to a technology that integrates a complete system on a single chip and packages all or part of the necessary electronic circuits. In this embodiment, the system on chip is assembled on a board. FIG. 1 shows a schematic diagram of the structure of a board 10 of an embodiment of the present disclosure. As shown in FIG. 1, the board 10 includes a combined processing device 101, which is an artificial intelligence computing unit that supports various deep learning and machine learning algorithms to meet intelligent processing needs in complex scenarios in fields such as computer vision, speech, natural language processing and data mining. In particular, deep learning technology is widely used in the field of cloud intelligence. A notable feature of cloud intelligence applications is the large amount of input data, which places high demands on the storage capacity and computing power of the platform. The board 10 of this embodiment is suitable for cloud intelligence applications and has large off-chip storage, on-chip storage and a large amount of computing power.

The combined processing device 101 is connected to an external device 103 via an external interface device 102. The external device 103 is, for example, a server, a computer, a camera, a display, a mouse, a keyboard, a network card or a Wi-Fi interface. The data to be processed can be transmitted from the external device 103 to the combined processing device 101 through the external interface device 102. The calculation result of the combined processing device 101 can be transmitted back to the external device 103 via the external interface device 102. Depending on the application scenario, the external interface device 102 can take different interface forms, such as a PCIe interface.

The board 10 also includes an external memory 104 for storing data, which includes one or more storage units 105. The external memory 104 is connected to the control device 106 and the combined processing device 101 through a bus and transmits data. The control device 106 in the board 10 is configured to control the state of the combined processing device 101. To this end, in an application scenario, the control device 106 may include a single chip microcomputer, also known as a micro control unit (MCU).

FIG. 2 is a schematic diagram showing the combined processing device 101 of this embodiment. As shown in FIG. 2, the combined processing device 101 includes a computing device 201, an interface device 202, a processing device 203, and a DRAM 204. In one application scenario, the computing device 201, the interface device 202, and the processing device 203 are integrated into the aforementioned system on chip. In another application scenario, the computing device 201 itself is the aforementioned system on chip.

The computing device 201 is configured to execute user-specified operations, and is mainly implemented as a single-core intelligent processor or a multi-core intelligent processor to perform deep learning or machine learning calculations. It can interact with the processing device 203 through the interface device 202 to jointly complete the user-specified operations.

The interface device 202 is used to transmit data and control instructions between the computing device 201 and the processing device 203. For example, the computing device 201 can obtain input data from the processing device 203 via the interface device 202 and write it into the storage device on the computing device 201 chip. Further, the computing device 201 can obtain control instructions from the processing device 203 via the interface device 202 and write them into the control cache on the computing device 201 chip. Alternatively or optionally, the interface device 202 can also read data in the storage device of the computing device 201 and transmit it to the processing device 203.

The processing device 203, as a general processing device, performs basic controls including but not limited to data handling and starting and/or stopping the computing device 201. Depending on the implementation, the processing device 203 can be one or more types of processors among a central processing unit, a graphics processing unit, or other general-purpose and/or special-purpose processors, which include but are not limited to digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc., and their number can be determined according to actual needs. As mentioned above, the computing device 201 of the present disclosure, considered on its own, can be regarded as having a single-core structure or a homogeneous multi-core structure. However, when the computing device 201 and the processing device 203 are integrated and considered together, the two are regarded as forming a heterogeneous multi-core structure.

DRAM 204 is the aforementioned high-bandwidth memory. Its capacity is usually 16 GB or larger, and it is used to store the data to be processed by the computing device 201 and/or the processing device 203.

FIG. 3 shows a schematic diagram of the internal structure of the computing device 201. The computing device 201 is used to process input data in areas such as computer vision, speech, natural language, and data mining. The computing device 201 in the figure adopts a multi-core hierarchical structure design, which includes an external storage controller 301, a peripheral communication module 302, an on-chip interconnect module 303, a synchronization module 304, and multiple clusters 305.

There may be multiple external storage controllers 301, and two are shown in the figure as an example, which are used to respond to access requests issued by the processor core and access external storage devices, such as DRAM 204 in Figure 2, so as to read data from outside the chip or write data. The peripheral communication module 302 is used to receive control signals from the processing device 203 through the interface device 202 to start the computing device 201 to perform tasks. The on-chip interconnect module 303 connects the external storage controller 301, the peripheral communication module 302 and multiple clusters 305 to transmit data and control signals between each module. The synchronization module 304 is a global synchronization barrier controller (Global Barrier Controller, GBC), which is used to coordinate the work progress of each cluster and ensure the synchronization of information. Multiple clusters 305 are the computing cores of the computing device 201. Four are shown in the figure as an example. With the development of hardware, the computing device 201 disclosed in the present invention can also include 8, 16, 64, or even more clusters 305. The cluster 305 is used to efficiently execute the deep learning algorithm.

Each cluster 305 includes a plurality of processor cores (IPU Cores) 306 and a storage core (MEM Core) 307.

The figure shows four processor cores 306 as an example, and the present disclosure does not limit the number of processor cores 306. The internal architecture of a processor core is shown in FIG. 4. Each processor core 306 includes three modules: a control module 41, an operation module 42, and a storage module 43.

The control module 41 is used to coordinate and control the operation of the operation module 42 and the storage module 43 to complete the deep learning task, and includes an instruction fetch unit (IFU) 411 and an instruction decode unit (IDU) 412. The instruction fetch unit 411 is used to obtain instructions from the processing device 203, and the instruction decode unit 412 decodes the obtained instructions and sends the decoding results to the operation module 42 and the storage module 43 as control information.

The operation module 42 includes a vector operation unit 421 and a matrix operation unit 422. The vector operation unit 421 is used to perform vector operations and can support complex operations such as vector multiplication, addition, and nonlinear transformation; the matrix operation unit 422 is responsible for the core calculation of the deep learning algorithm, namely matrix multiplication and convolution.

The storage module 43 is used to store or transfer related data, including a neuron RAM (NRAM) 431, a weight RAM (WRAM) 432, an input/output direct memory access module (IODMA) 433, and a transfer direct memory access module (MVDMA) 434. NRAM 431 is used to store input and output data and intermediate results for calculation by the processor core 306; WRAM 432 is used to store the weights of the deep learning network; IODMA 433 controls the memory access between NRAM 431/WRAM 432 and DRAM 204 through the broadcast bus 309; MVDMA 434 is used to control the memory access between NRAM 431/WRAM 432 and SRAM 308.

Returning to FIG. 3, the storage core 307 is mainly used for storage and communication, that is, to store shared data or intermediate results between the processor cores 306, and to perform communication between the cluster 305 and the DRAM 204, between the clusters 305, and between the processor cores 306. In other embodiments, the storage core 307 has the ability of scalar operations and is used to perform scalar operations.

The storage core 307 includes a shared memory unit (SRAM) 308, a broadcast bus 309, a cluster direct memory access module (cluster Direct Memory Access, CDMA) 310 and a global direct memory access module (Global Direct Memory Access, GDMA) 311. The SRAM 308 plays the role of a high-performance data transfer station. The data reused between different processor cores 306 in the same cluster 305 does not need to be obtained from the DRAM 204 by each processor core 306, but is transferred between the processor cores 306 through the SRAM 308. The storage core 307 only needs to quickly distribute the reused data from the SRAM 308 to multiple processor cores 306, so as to improve the efficiency of inter-core communication and greatly reduce on-chip and off-chip input/output access.

Broadcast bus 309, CDMA 310 and GDMA 311 are used to perform communication between processor cores 306, communication between clusters 305 and data transmission between clusters 305 and DRAM 204, respectively. They will be described below.

The broadcast bus 309 is used to complete high-speed communication between the processor cores 306 in the cluster 305. The broadcast bus 309 of this embodiment supports inter-core communication modes including unicast, multicast and broadcast. Unicast refers to point-to-point (i.e., single processor core to single processor core) data transmission, multicast is a communication mode of transmitting a copy of data from SRAM 308 to specific processor cores 306, and broadcast is a communication mode of transmitting a copy of data from SRAM 308 to all processor cores 306, which is a special case of multicast.
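The relationship between multicast and broadcast can be illustrated with a small Python sketch. The function names `multicast` and `broadcast` and the dictionary used to model per-core delivery are assumptions made for illustration; they do not describe the actual hardware interface of the broadcast bus 309.

```python
def multicast(sram_data: bytes, targets: set[int], num_cores: int) -> dict[int, bytes]:
    """Deliver the SRAM data to each selected processor core (by index)."""
    if not targets <= set(range(num_cores)):
        raise ValueError("target core index out of range")
    return {core: sram_data for core in sorted(targets)}


def broadcast(sram_data: bytes, num_cores: int) -> dict[int, bytes]:
    """Broadcast is multicast with every core in the cluster as a target."""
    return multicast(sram_data, set(range(num_cores)), num_cores)
```

With four cores, `multicast(b"w", {1, 3}, 4)` delivers only to cores 1 and 3, whereas `broadcast(b"w", 4)` delivers to all four.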

CDMA 310 is used to control access to the SRAM 308 between different clusters 305 in the same computing device 201. GDMA 311 cooperates with the external storage controller 301 to control memory access from the SRAM 308 of a cluster 305 to the DRAM 204, or to read data from the DRAM 204 into the SRAM 308.

FIG. 5 shows a schematic diagram of a communication system including a transmitting end and a receiving end according to an embodiment of the present disclosure.

Divided by function, the communication system consists of a transmitting end 1 and a receiving end 2. The transmitting end 1 may include a first storage module 11, a first instruction execution module 12, a first sending module 13, a first receiving module 14 and a first control module 15, and the receiving end 2 includes a second storage module 21, a second instruction execution module 22, a second sending module 23, a second receiving module 24 and a second control module 25.

For the transmitting end 1, the function of each module is:

The first storage module 11 is used to store data and instructions. The data source can be local original data, data sent from an upstream device (in the case of multi-stage transmission), or result data calculated from data sent from an upstream device and local data.

The first instruction execution module 12 is used to execute the instructions in the first storage module 11 and is responsible for completing the execution processing of the application layer logic. In this communication system, the sending request is initiated by the instruction, that is, the communication behavior is initiated by the application layer. In addition, the first instruction execution module 12 can also complete local operations according to the instructions.

First sending module 13: The transmitting end 1 is directly connected to the physical link through the first sending module 13, which is used to send data and control information. The first sending module 13 can count the communication packets it sends and receive the response to each packet. If a NACK (negative acknowledgment, indicating an abnormal response) is received, or no ACK (acknowledgment, a positive response indicating that the receiving end has correctly received the corresponding packet) is received for a long time, the abnormal state is recorded and an interrupt is reported to the first control module 15. If an ACK is received within the specified time, the first sending module 13 can reply to that ACK with an ACK of its own.
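The bookkeeping described for the first sending module 13 can be sketched as follows. This is a hedged illustration only: the class `SendModule`, the explicit `now` timestamps, the numeric timeout and the `on_interrupt` callback (standing in for the first control module 15) are all assumptions, and the reply-to-ACK step is omitted for brevity.

```python
from enum import Enum


class Response(Enum):
    ACK = "ack"
    NACK = "nack"


class SendModule:
    def __init__(self, timeout: float, on_interrupt):
        self.timeout = timeout
        self.on_interrupt = on_interrupt     # stands in for the control module
        self.sent_count = 0                  # number of packets sent so far
        self.pending: dict[int, float] = {}  # packet id -> time it was sent
        self.abnormal: list[int] = []        # recorded abnormal packet ids

    def send(self, now: float) -> int:
        # Count the packet and start waiting for its response.
        pkt_id = self.sent_count
        self.sent_count += 1
        self.pending[pkt_id] = now
        return pkt_id

    def on_response(self, pkt_id: int, resp: Response) -> None:
        if pkt_id not in self.pending:
            return  # late or duplicate response, ignored in this sketch
        del self.pending[pkt_id]
        if resp is Response.NACK:
            self._report(pkt_id)

    def check_timeouts(self, now: float) -> None:
        # A packet with no ACK within the timeout is treated as abnormal.
        for pkt_id, sent_at in list(self.pending.items()):
            if now - sent_at > self.timeout:
                del self.pending[pkt_id]
                self._report(pkt_id)

    def _report(self, pkt_id: int) -> None:
        # Record the abnormal state, then raise the interrupt.
        self.abnormal.append(pkt_id)
        self.on_interrupt(pkt_id)


interrupts: list[int] = []
module = SendModule(timeout=1.0, on_interrupt=interrupts.append)
first, second, third = (module.send(now=0.0) for _ in range(3))
module.on_response(first, Response.ACK)    # normal case
module.on_response(second, Response.NACK)  # abnormal response
module.check_timeouts(now=2.0)             # third packet never answered
```

In this run the second packet is reported because of the NACK and the third because of the timeout, so two interrupts reach the callback.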

First receiving module 14: The transmitting end can be directly connected to the physical link through the first receiving module 14. The first receiving module 14 on the transmitting end 1 is only used to receive control information.

The first control module 15 is used to issue tasks, receive the interrupt from the first sending module 13, suspend the operation of the first instruction execution module 12, and reissue tasks so that the first instruction execution module 12 retransmits.

For the receiving end 2, the function of each module is:

The second storage module 21 is used to store data and instructions. The data source may be local original data, data sent from an upstream sending end, or result data calculated from data sent from an upstream sending end and local data.

The second instruction execution module 22 is used to execute the instructions in the second storage module 21 and is responsible for completing the execution processing of the application layer logic. In this communication system, the sending request is initiated by the instruction, that is, the communication behavior is initiated by the application layer. In addition, the second instruction execution module 22 can also complete local operations according to the instructions.

Second receiving module 24: The receiving end 2 is directly connected to the physical link through the second receiving module 24. The second receiving module 24 receives incoming data packets and completes the memory synchronization operation according to the type of each packet. Each received packet is checked and, where possible, corrected. If an error cannot be corrected, a NACK is returned to the sending module of the transmitting end; an ACK is returned for each packet that is received correctly. If an ACK sent by the second receiving module 24 does not itself receive an ACK response, the abnormal state is recorded and an interrupt is reported to the second control module 25.

Second sending module 23: The receiving end 2 is directly connected to the physical link through the second sending module 23. On the receiving end, the second sending module 23 is only used to send control information.

The second control module 25 is used to issue tasks and, in response to the interrupt sent by the second receiving module 24, to suspend the operation of the second instruction execution module 22 and reissue tasks so that the second instruction execution module 22 retransmits.

The conventions that the application layer follows when performing data sending tasks are described in detail below.
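The check-then-respond behavior of the second receiving module 24 can be sketched as below. The text does not say which check is used, so a CRC-32 checksum from Python's standard `zlib` module is assumed purely for illustration; error correction is not modeled, and the names `make_packet` and `receive` are hypothetical.

```python
import zlib


def make_packet(payload: bytes) -> tuple[bytes, int]:
    """Attach a CRC-32 checksum to the payload (assumed check scheme)."""
    return payload, zlib.crc32(payload)


def receive(packet: tuple[bytes, int]) -> str:
    """Return "ACK" for a packet that passes the check, otherwise "NACK"."""
    payload, checksum = packet
    return "ACK" if zlib.crc32(payload) == checksum else "NACK"


good = make_packet(b"data")
corrupted = (b"dbta", good[1])  # payload altered in transit, checksum unchanged
```

Here `receive(good)` yields an ACK, while `receive(corrupted)` yields a NACK because the recomputed checksum no longer matches.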

When performing a sending task at the application layer, data and control information need to be sent explicitly. At the transmitting end 1, the control information is the number of data fragments that the transmitting end 1 has sent so far; at the receiving end 2, it is the number of data fragments that the receiving end 2 has processed so far.

Borrowing from the queue data structure, the control information of the transmitting end 1 can be recorded as Tail, and the control information of the receiving end 2 as Head. Note that the receiving end 2 receiving data and the receiving end 2 processing data are two different concepts. Even when the receiving end 2 does nothing but receive data, if the transmitting end 1 does not send the data directly to the target storage space of the receiving end 2 (denoted Dstspace) but to its receive buffer (denoted Recvbuffer), the receiving end 2 must still copy the data from Recvbuffer to Dstspace. Only after the receiving end 2 confirms that the data is in Dstspace does it update the value of Head. Recvbuffer is the functional name for the hardware buffer storage space of the receiving end.

Therefore, the Head and Tail need to satisfy the following constraints:

1. When there is no data retransmission, both Head and Tail are monotonically increasing, and each increase is the number of data fragments sent or received this time.

2. At any time, the Head is not greater than the Tail.

3. If the data sent by the sender 1 to the receiver 2 is written into the Recvbuffer, the difference between the Tail and the Head should not exceed half the size of the Recvbuffer. Assume that the overall capacity of the Recvbuffer at the receiving end is m, that is, the difference between the Tail and the Head may be at most m data fragments. Now consider two devices A and B executing the Allreduce communication algorithm. The two devices first each send X data fragments to the other, then obtain the upstream X fragments from the Recvbuffer and complete the calculation with the local data, and then send X fragments to each other again. Here, the first constraint is that X must not be greater than m, because if X is greater than m, out-of-bounds writes will occur. Next, if X = m, each device attempting the second step (obtaining the X upstream fragments from the Recvbuffer, completing the calculation with local data, and sending X fragments to the other party again) will find that the other party's Recvbuffer is already fully occupied by the first step, in which each device sent its own X fragments to the other party. Neither device can therefore perform the second step or enter the second stage, which forms a deadlock. Therefore, under the constraint X < m, performance testing shows that X = m/2 is the optimal value, which balances sending and processing.

4. The destination address of the Head and Tail remains unchanged, and the Head and Tail sent later will overwrite the original values in the device storage space. In other words, multiple Head values or multiple Tail values are not allowed to exist, but only the previous Head value can be overwritten by the new Head value, or the previous Tail value can be overwritten by the new Tail value.

5. Both Head and Tail have memory synchronization semantics. When a Head/Tail is written to the device memory, all Data, Head, and Tail values transmitted earlier over the same physical link must already have been written to the device storage space. The device here usually refers to the hardware of the sending device and the receiving device.
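Constraints 1 to 3 above can be illustrated with a minimal Python sketch. The class name `HeadTailWindow` and its methods are hypothetical and are not part of the disclosure; the limit of m/2 outstanding fragments follows constraint 3. It is an illustration under these assumptions rather than the disclosed implementation.

```python
class HeadTailWindow:
    """Tracks Head/Tail counters for one link (hypothetical helper)."""

    def __init__(self, recvbuffer_capacity: int):
        self.m = recvbuffer_capacity   # capacity m of the receiver's Recvbuffer
        self.head = 0                  # fragments processed by the receiver
        self.tail = 0                  # fragments sent by the sender

    def can_send(self, count: int) -> bool:
        # Constraint 3: Tail - Head must not exceed half of the Recvbuffer.
        return (self.tail + count) - self.head <= self.m // 2

    def send(self, count: int) -> None:
        if count <= 0:
            raise ValueError("Tail must increase monotonically")
        if not self.can_send(count):
            raise RuntimeError("would exceed half of the Recvbuffer")
        self.tail += count             # constraint 1: monotonic increase

    def process(self, count: int) -> None:
        if count <= 0:
            raise ValueError("Head must increase monotonically")
        if self.head + count > self.tail:
            raise RuntimeError("Head may not exceed Tail")  # constraint 2
        self.head += count


window = HeadTailWindow(recvbuffer_capacity=8)
window.send(4)                      # allowed: 4 outstanding fragments = m/2
blocked = not window.can_send(1)    # a fifth fragment must wait
window.process(2)                   # receiver has moved 2 fragments to Dstspace
```

Once the receiver has processed two fragments, the sender may again send up to two more before reaching the m/2 limit.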

In an unreliable link, the following three errors may occur:

1. Data packets are lost or errors occur during the transmission process. The lost data packets here can be the loss of payload data, the loss of the sent tail, or the payload data error or tail error. In this case, a packet sequence number error (PSN Error) of the link will be triggered. When the PSN check at the receiving end fails, a NACK will be sent back to the sending end. In addition, this case may also trigger a timeout mechanism and actively end the communication.

2. Bit errors occur during the transmission process and the error correction at the receiving end cannot correct them. At this time, the receiving end will feed back a NACK to the sending end.

3. The sending process is normal, but the ACK returned by the receiving end is lost. At this time, the sending end will still trigger the timeout mechanism.

When any of the above three situations occurs, retransmission starts from the abnormal position and may succeed or fail. Only when correct communication still cannot be completed after multiple (for example, three consecutive) retransmissions is it considered that an unrecoverable abnormality has occurred. In this case, the application usually needs to completely stop data transmission in order to check whether the link is abnormal.
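The bounded-retry behaviour described here can be sketched as follows. The function name `retransmit_with_limit`, the callback signature, and the limit of three attempts (taken from the "three consecutive" example) are assumptions made for illustration only.

```python
from typing import Callable

MAX_RETRIES = 3  # assumed limit, following the "three consecutive" example


def retransmit_with_limit(send_from: Callable[[int], bool], position: int,
                          max_retries: int = MAX_RETRIES) -> bool:
    """Retry from `position`; return False once the limit is exhausted."""
    for _ in range(max_retries):
        if send_from(position):
            return True      # communication recovered
    return False             # unrecoverable: the caller should stop the transfer


# Hypothetical link that fails twice and then succeeds.
attempts = []


def flaky_link(position: int) -> bool:
    attempts.append(position)
    return len(attempts) >= 3


recovered = retransmit_with_limit(flaky_link, position=4)
```

A `False` return value corresponds to the case in which the application stops data transmission to check the link.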

In order to handle the above three situations where retransmission is possible, the exception handling method implemented at the application layer needs to have the following key points:

1. Since anomalies on a link are rare events, the exception handling logic at the application layer should interfere less with the normal application layer task logic.

2. Also based on the fact that the occurrence of anomalies is a low-probability event, when anomalies occur and retransmission is performed, the functional correctness of the retransmission algorithm is given priority, and the performance requirements are relatively reduced. In other words, the correctness of retransmission must be guaranteed first, and the performance requirements of the retransmission algorithm execution process are not given priority.

3. When the application layer retransmits, redundant retransmissions can be allowed, that is, the content sent in the retransmission is more than the actual lost/erroneous content.

4. When the application layer retransmits, even if the retransmission at the sender may have more redundancy, the receiver should only process the data packet starting from the error position. This is because the receiver may have performed in-place calculation operations, and the calculation results have already covered the original local data area. If the calculation is repeated again, the result of the application task will be wrong, so it is not allowed.

5. Since Head/Tail is communicated by overwriting, each device (sender and receiver) needs to back up the value of Head/Tail for subsequent retransmission. This backup can be called a checkpoint and is used as the starting point for recovery. Each device should also keep a local record of the previously sent location information, because subsequent retransmission recovery uses the previously sent location as the end point of the retransmission.
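Key points 3 and 4 can be illustrated with a short sketch in which the sender retransmits redundantly while the receiver processes each fragment only once. The class `InPlaceReceiver`, the field `last_processed`, and the use of a running sum to stand in for an in-place calculation are all hypothetical.

```python
class InPlaceReceiver:
    """Hypothetical receiver that accumulates fragments in place."""

    def __init__(self):
        self.last_processed = 0  # count value of the last fragment processed
        self.accumulator = 0     # stands in for an in-place computation result

    def on_fragment(self, tail: int, payload: int) -> bool:
        # Redundant fragments (already processed) are skipped so that the
        # in-place result is never computed twice.
        if tail <= self.last_processed:
            return False
        # A gap means an earlier fragment is still missing; do not process.
        if tail != self.last_processed + 1:
            return False
        self.accumulator += payload
        self.last_processed = tail
        return True


rx = InPlaceReceiver()
for tail in (1, 2, 3):              # fragments 4 and 5 are lost
    rx.on_fragment(tail, payload=10)
for tail in (2, 3, 4, 5):           # redundant retransmission starting at 2
    rx.on_fragment(tail, payload=10)
```

Although fragments 2 and 3 arrive twice, each contributes to the result only once.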

After introducing the above specifications and constraints, a method for controlling data retransmission at an application layer in a communication system according to an embodiment of the present disclosure is described in detail below.

FIG6 shows a method for controlling data retransmission at an application layer in a communication system according to an embodiment of the present disclosure, wherein the communication system comprises: a transmitting end 1, a receiving end 2 and a communication link 3 connecting the transmitting end 1 and the receiving end 2. The method comprises: at the transmitting end 1, in operation S6110, pre-sending data to the receiving end 2, wherein the data comprises a payload and a count value for the payload; in operation S6120, in response to the data being in a pre-sending state, recording first transmitting end information, wherein the first transmitting end information indicates a first quantity of the data being pre-sent; in operation S6130, transmitting the data through the communication link; at the receiving end 2, in operation S6210, if the data is received, recording the count value; in operation S6220, generating a second quantity according to the count value, the second quantity indicating the quantity of the data received and fed back by the receiving end; in operation S6230, feeding back first receiving end information to the transmitting end, wherein the first receiving end information includes the second quantity, and the second quantity is not greater than the first quantity; at the transmitting end 1, in operation S6140, receiving and recording the first receiving end information; and, in operation S6150, determining whether to retransmit the data according to the first transmitting end information and the first receiving end information.

The above-mentioned embodiments of the present disclosure will be explained below in conjunction with a series of drawings.

FIG7 shows a schematic diagram of a normal communication process according to an embodiment of the present disclosure. In FIG7, it is assumed that data is sent from a transmitter 1 to a receiver 2. For the convenience of description, the first sending module 13 in the transmitter 1 is marked as tx, and the second receiving module 24 of the receiver 2 is marked as rx. The address used by the Head/Tail for communication is marked as ptr, the checkpoint of the Head/Tail is marked as ckpt, and the maximum value currently sent by the Head/Tail is marked as cache. According to an embodiment of the present disclosure, the following key variables can be constructed:

1. tx_tail: tail information sent by transmitter 1. The tail information has been introduced above and will not be repeated here.

2. tx_tail_cache: tx_tail that has been sent by the transmitter 1 but has not yet been confirmed as received by the receiver 2. The tx_tail can be stored in the storage space of the transmitter 1 used for sending (for example, in the Recvbuffer cache of the transmitter 1). tx_tail_cache can indicate what the transmitter 1 "thought it had sent but may not have actually been delivered to the receiver 2". In the present disclosure, data in this condition can be referred to as pre-sent data. The pre-sent data may include a payload and a count value for the payload. The count value here is the tail information described above. After the transmitter 1 has pre-sent data, the parameter tx_tail_cache can be updated regardless of whether the data is sent successfully (that is, in response to the data being in a pre-sent state, the first transmitting end information is updated).

According to one embodiment of the present disclosure, in response to the data being in a pre-sending state, recording the first sending end information may include: in response to the amount of data placed in the pre-sending state being incremented, incrementing the first quantity accordingly, regardless of whether the data is sent successfully, and/or regardless of whether a response to the data is received.

In this embodiment, each time a data item enters the "pre-send" state, the tail value carried by the data is incremented once. The increment of the tail value does not depend on whether the data is actually sent successfully, nor on whether the sender 1 receives a response (i.e., a head value) for the data.

It should be understood that the first sender information recorded in this article may be recorded whenever the first sender information is updated, or the first sender information with the maximum value may be directly recorded. For example, the sender may poll the maximum value of the current tail at regular intervals. In this case, the maximum tail value or some of the tail values may be directly polled, but the tail value corresponding to the currently sent data may not necessarily be polled.
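The recording of the first sending end information can be sketched as below. The class `PreSender`, the callback `link_send`, and the simulated lossy link are hypothetical names introduced only for this illustration; the point shown is that tx_tail_cache advances as soon as data is pre-sent, independently of the outcome.

```python
class PreSender:
    """Hypothetical sender that counts data as soon as it is pre-sent."""

    def __init__(self, link_send):
        self.link_send = link_send   # assumed callable: True if delivered
        self.tx_tail_cache = 0       # first sending end information

    def pre_send(self, payload: bytes) -> bool:
        tail = self.tx_tail_cache + 1      # count value carried with the payload
        self.tx_tail_cache = tail          # recorded before the outcome is known
        return self.link_send(payload, tail)


# Illustrative link that drops everything after the third item.
def lossy_link(payload: bytes, tail: int) -> bool:
    return tail <= 3


sender = PreSender(lossy_link)
results = [sender.pre_send(b"x") for _ in range(5)]
```

Even though the last two items are not delivered in this sketch, `tx_tail_cache` still reaches 5.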

3. tx_tail_ckpt: tx_tail sent by sender 1 and confirmed by receiver 2. Generally, it means that receiver 2 has confirmed by replying with head. If receiver 2 confirms by head, it means that the receiver has received the data sent by sender 1. The breakpoint of data retransmission can be determined by the parameter tx_tail_ckpt.

4. rx_head: The head information sent by receiver 2 to transmitter 1. When receiver 2 receives data containing a tail, it records the tail value (i.e., the count value mentioned above). Receiver 2 can reply or feed back the head value to transmitter 1 to confirm that the data has been received; that is, the head value can reflect the amount of data (i.e., tail) received by the receiving end 2.

As the receiving end 2 receives and processes the data, the head value increases accordingly. However, the head value is less than or equal to the received tail value. For example, if the data containing tail values of 1-3 all arrive at the receiving end 2 smoothly, then the head value is updated accordingly after each tail value is received and processed by the receiving end 2, that is, the head value is equal to the tail value; however, if the data containing the tail value of 4 is blocked, that is, the receiving end 2 does not receive the data with a tail value of 4, the maximum head value generated by the receiving end 2 is 3.
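The relationship between received tail values and the reported head can be sketched as follows. The function `head_after_receiving` is hypothetical, and the rule that the head only advances over a gap-free run of tail values starting at 1 is an assumption made for this illustration.

```python
def head_after_receiving(arrived_tails: list[int]) -> int:
    """Return the largest head the receiver can report.

    Assumption: the head advances only over a gap-free run of tail values
    starting at 1, so it never exceeds the tail values actually received.
    """
    head = 0
    received = set(arrived_tails)
    while head + 1 in received:
        head += 1
    return head


all_arrived = head_after_receiving([1, 2, 3])        # head equals tail
fourth_blocked = head_after_receiving([1, 2, 3, 5])  # data with tail 4 missing
```

In both calls the reported head is 3, matching the case in which the data with a tail value of 4 does not reach the receiving end 2.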

5. rx_head_cache: rx_head that has been sent by receiving end 2 but has not yet been confirmed as received by sending end 1. Generally speaking, the sending of rx_head does not require additional confirmation from sending end 1, and it is considered delivered as long as there is no timeout. In this application, the parameter rx_head_cache can be used to represent the first receiving end information, that is, it can represent the second quantity of data that receiving end 2 has received and for which feedback has been made.

6. rx_head_ckpt: rx_head sent by receiver 2 and acknowledged by sender 1.

7. rx_tail_cache: the tail value received by receiver 2 when it accesses DRAM to check the tail, that is, the tail value actually received by receiver 2.

According to one embodiment of the present disclosure, if the data is received, recording the count value includes: overwriting the old count value with the new count value.

According to this embodiment, the receiving end 2 can record each received tail value and overwrite the previous tail value with the latest tail value, so that the receiving end 2 will keep the tail value at the maximum value.

According to an embodiment of the present disclosure, if the data is received, recording the count value includes: searching for a maximum count value in the received data; and recording only the maximum count value.

According to this embodiment, the receiving end 2 can poll the maximum value of the received tail at regular intervals. In this case, there may be a situation where the maximum tail value or some of the tail values are directly polled, and the tail value corresponding to the currently sent data is not necessarily polled. In these cases, the maximum tail value can be directly recorded without recording other tail values. In this way, only the maximum tail value needs to be paid attention to, and all tail values do not need to be recorded.
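The two recording strategies described in these embodiments can be contrasted in a brief sketch. The class `TailRecord` and its method names are hypothetical.

```python
class TailRecord:
    """Hypothetical single-slot record of the received count (tail) value."""

    def __init__(self):
        self.value = 0

    def overwrite(self, new_tail: int) -> None:
        # First strategy: each newly received tail replaces the old one.
        self.value = new_tail

    def record_max(self, polled_tails: list[int]) -> None:
        # Second strategy: poll a batch and keep only the maximum.
        if polled_tails:
            self.value = max(self.value, max(polled_tails))


per_packet = TailRecord()
for tail in (1, 2, 3, 4, 5):
    per_packet.overwrite(tail)

polled = TailRecord()
polled.record_max([2, 5])   # the poll happened to observe only some tail values
```

Both records end at the same maximum tail value even though the polling variant never observed tail values 1, 3 and 4.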

8. tx_head_cache: the head value received by sender 1 when it accesses DRAM to check the head, that is, the head value actually sent by receiver 2 and actually received by sender 1.

As shown in Fig. 7, a first state table recording the states of various parameters may be stored in the transmitter 1, and the first state table may include at least three parameters, namely tx_tail_cache, tx_tail_ckpt and tx_head_cache. In the initial state, the values of these parameters are all 0, as shown in state 1 in Fig. 7.

Similarly, at the receiving end 2, a second state table recording the states of various parameters may be stored, and the second state table may include at least three parameters, namely rx_head_cache, rx_head_ckpt and rx_tail_cache. In the initial state, the values of these parameters are all 0, as shown in state 1' in Figure 7.

Next, assuming that sender 1 sends 5 data to receiver 2, the value of parameter tx_tail_cache becomes 5, which means that sender 1 has sent 5 data, but whether these 5 data have reached receiver 2 has not yet been confirmed. At the same time, when sender 1 sends data, it also sends the tail of each data to the receiver. Each time a data is sent, the tail value increases. This is shown in state 2 in Figure 7.

At the receiving end 2, when data is received, it can be recorded as parameter rx_tail_cache, which can increase as the tail value increases. When 5 data are received, the value of parameter rx_tail_cache is updated to 5. The parameter rx_tail_cache can also directly record and reflect the maximum value of the received tail, rather than gradually increasing, as shown in state 2' of Figure 7.

After receiving end 2 processes the data, it will feed back the head value to sending end 1, but it is not known whether sending end 1 has received the head value. In this case, the state of parameter rx_head_cache is updated to 5, which means that receiving end 2 has sent 5 head information (rx_head), but sending end 1 has not yet confirmed receipt of the 5 head information.

After the transmitter 1 receives the head information, it sends a feedback signal to the receiver 2. After the receiver 2 receives the feedback signal, it updates the parameter rx_head_ckpt. As shown in state 3' in FIG7, the parameter rx_head_ckpt is updated to 5.

As shown in state 3 in FIG. 7, the value of the parameter rx_head_cache may be fed back to the transmitter 1, so that the parameter tx_head_cache is updated to 5.

Further, returning to the transmitter 1, the parameter tx_tail_ckpt is also updated to 5, which indicates the number of tx_tails that have been confirmed by the receiver 2 after the transmitter 1 has sent 5 data, as shown in state 3 in FIG. 7.

As can be seen from Figure 7, by checking the parameters tx_tail_cache, tx_tail_ckpt and tx_head_cache in the transmitter 1 and the parameters rx_head_cache, rx_head_ckpt and rx_tail_cache in the receiver 2, the sending and receiving of data, the sending and receiving of tail and head values and the breakpoint situation can be understood.
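The normal flow of Figure 7 can be traced with two small state tables. The dataclass names `SenderState` and `ReceiverState` are hypothetical; the field names are the parameters defined above, and the direct assignments stand in for messages exchanged over the link.

```python
from dataclasses import dataclass


@dataclass
class SenderState:          # first state table (transmitter 1)
    tx_tail_cache: int = 0
    tx_tail_ckpt: int = 0
    tx_head_cache: int = 0


@dataclass
class ReceiverState:        # second state table (receiver 2)
    rx_head_cache: int = 0
    rx_head_ckpt: int = 0
    rx_tail_cache: int = 0


tx, rx = SenderState(), ReceiverState()      # states 1 and 1'

tx.tx_tail_cache = 5                         # state 2: five data items pre-sent
rx.rx_tail_cache = 5                         # state 2': all five arrive

rx.rx_head_cache = rx.rx_tail_cache          # head fed back after processing
rx.rx_head_ckpt = rx.rx_head_cache           # state 3': feedback acknowledged

tx.tx_head_cache = rx.rx_head_cache          # state 3: head seen by transmitter
tx.tx_tail_ckpt = tx.tx_head_cache           # confirmed tail (checkpoint)
```

At the end of the exchange all six parameters hold the value 5, which is the condition under which no retransmission is needed.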

The above describes the situation when data is transmitted normally in conjunction with FIG. 7. The following describes the situation when data transmission errors occur in detail in conjunction with other drawings.

Fig. 8 shows a schematic diagram of a packet loss/error retransmission process. It should be understood that, for the sake of simplicity, in Fig. 8, only the sending and receiving directions of data are shown, and the sending module and the receiving module are not shown.

As shown in Fig. 8, a first state table recording the states of various parameters may be stored in the transmitter 1, and the first state table may include at least three parameters, namely tx_tail_cache, tx_tail_ckpt and tx_head_cache. In the initial state, the values of these parameters are all 0, as shown in state 1 in Fig. 8.

Similarly, at the receiving end 2, a second state table recording the states of various parameters may be stored, and the second state table may include at least three parameters, namely rx_head_cache, rx_head_ckpt and rx_tail_cache. In the initial state, the values of these parameters are all 0, as shown in state 1' in Figure 8.

Next, suppose that sender 1 sends 5 data to receiver 2, namely data [1, 2, 3, 4, 5], then the value of parameter tx_tail_cache becomes 5, which means that sender 1 has sent 5 data, but whether these 5 data have reached receiver 2 has not yet been confirmed. At the same time, when sender 1 sends data, it also sends the tail of each data to the receiver. Each time a data is sent, the tail value increases. This is shown in state 2 in Figure 8.

At the receiving end 2, when data is received, the parameter rx_tail_cache increases as the tail value increases. Different from the implementation shown in FIG7, in this implementation, data loss or error occurs from the 4th data, that is, only data [1, 2, 3] is successfully sent to the receiving end 2. Therefore, the value of the parameter rx_tail_cache is updated to 3, as shown in state 2' of FIG8.

Since data is lost or erroneous during transmission, the link can send an interrupt request to the transmitter 1. The control module in the transmitter 1 can determine the cause of the reported interruption and then determine whether to retransmit according to the specific cause of the interruption.

After receiving the data [1,2,3], the receiver 2 sends the head information back to the transmitter 1, but it is not known whether the transmitter 1 has received the head value. Since the receiver 2 has only received 3 data, the receiver 2 sends the information with head values of 1-3. In this case, the state of the parameter rx_head_cache is updated to 3, which means that the receiver 2 has sent 3 head information (rx_head), but the transmitter 1 has not yet confirmed that it has received the 3 head information. This is shown in state 3' in Figure 8.

After the transmitter 1 receives the head information, it sends a feedback signal to the receiver 2. After the receiver 2 receives the feedback signal, it updates the parameter rx_head_ckpt. As shown in state 3' in Figure 8, since the receiver only receives 3 data and the head value fed back by the receiver is 1-3, the parameter rx_head_ckpt is updated to 3.

Returning to the transmitter 1, the receiver 2 feeds back the parameter rx_head_cache to the transmitter 1. After receiving the feedback, the transmitter 1 updates the parameter tx_head_cache to 3, which means that after the transmitter 1 has sent 5 data, the receiver 2 confirms the receipt and feeds back 3 data through the head, as shown in state 3 of Figure 8.

Further, as shown in state 3 in FIG. 8, the parameter tx_tail_ckpt is updated to 3 accordingly, which indicates that the retransmission should start from the 4th data.

According to one embodiment of the present disclosure, in response to an interruption signal reported from a communication link, it is determined whether to retransmit the data based on the first sending end information and the first receiving end information.

As described above, if the communication link reports an interruption signal, there may be multiple interruption types, some of which require retransmission of data, while other interruption types do not require retransmission of data. According to one embodiment of the present disclosure, determining whether to retransmit the data according to the first transmitting end information and the first receiving end information includes: determining the type of interruption according to the first transmitting end information and the first receiving end information; and determining whether to retransmit the data according to the type of interruption.

According to one embodiment of the present disclosure, determining the type of interruption based on the first transmitting end information and the first receiving end information may include: if the first number is greater than the second number, determining the type of interruption as data loss or error from the transmitting end to the receiving end. According to one embodiment of the present disclosure, the data loss or error from the transmitting end to the receiving end includes the loss of the payload and/or the loss or error of the count value. It should be understood that the data loss here is not only the loss of the payload data, but also the loss of other information such as the tail value. For example, even if the payload data is correctly transmitted from the transmitting end 1 to the receiving end 2, but the tail value carried is lost, it should also be considered as data loss.

It should also be understood that the first receiving end information referred to in this article refers to the data that has been received and has been fed back through the head. Therefore, the receiving end information can reflect the amount of data actually received by the receiving end, and reflect the number of head values that the receiving end has fed back for this reception, which is different from the fed back head value.

Specifically, by comparing the values of the parameter tx_tail_cache and the parameter tx_head_cache, the sender 1 can determine whether to initiate data retransmission. For example, in the example shown in Figure 8, the sender 1 will find that it has sent 5 data (the value of the parameter tx_tail_cache is equal to 5), but the receiver 2 only received 3 data (the value of the parameter tx_head_cache is equal to 3). Therefore, the sender 1 can confirm that the interrupt signal reported by the communication link is due to data loss or error.

According to one embodiment of the present disclosure, determining whether to retransmit the data according to the type of the interruption includes: if the type of the interruption is determined to be data loss from the sending end to the receiving end, determining the breakpoint of the interruption according to the second quantity; and retransmitting the data starting from the breakpoint.

Further, as described above, when the value of the parameter tx_head_cache becomes 3, the value of the parameter tx_tail_ckpt is also updated to 3 accordingly, so the transmitter 1 can know that three data have been successfully sent, and the retransmission should start from the fourth data. It should be understood that the above-mentioned determination of the interruption breakpoint according to the second quantity is essentially to determine the interruption breakpoint according to the breakpoint position determined by the second quantity, and the parameter tx_tail_ckpt can identify the breakpoint position.
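The breakpoint determination can be expressed as a short calculation. The function `retransmission_range` is a hypothetical helper; it uses the head value received by the transmitter as the breakpoint and returns the indices of the data to be retransmitted.

```python
def retransmission_range(tx_tail_cache: int, tx_head_cache: int) -> range:
    """Return the data indices to retransmit (empty if nothing is missing)."""
    if tx_head_cache > tx_tail_cache:
        raise ValueError("head must not exceed tail")
    tx_tail_ckpt = tx_head_cache              # breakpoint: last confirmed item
    return range(tx_tail_ckpt + 1, tx_tail_cache + 1)


to_resend = list(retransmission_range(tx_tail_cache=5, tx_head_cache=3))
nothing = list(retransmission_range(tx_tail_cache=5, tx_head_cache=5))
```

For the Figure 8 values (5 sent, 3 confirmed) the result is `[4, 5]`; when both values are 5 the range is empty.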

Next, as shown in FIG8, after it is determined that the data is to be retransmitted, retransmission of the data [4, 5] begins. At this time, state 4 remains the same as state 3.

Next, when the data [4, 5] is retransmitted successfully, the value of the parameter rx_tail_cache of the receiving end 2 is updated from 3 to 5, which means that the receiving end 2 has successfully received the data [4, 5], as shown in state 4' in Figure 8.

After receiving data [4,5], receiver 2 will feed back the head signal to the transmitter. Therefore, the value of the parameter rx_head_cache is updated from 3 to 5, as shown in state 5'. As mentioned above, the parameter rx_head_cache indicates the rx_head that has been sent by receiver 2 but has not yet been confirmed by transmitter 1. Generally speaking, the sending of rx_head does not require additional confirmation from transmitter 1, and it is considered to have been delivered as long as there is no timeout.

In addition, as shown in state 5' in FIG8, the value of the parameter rx_head_ckpt is also updated to 5, which means that after the receiving end 2 sends the head signal to the sending end 1, it receives a confirmation from the sending end 1.

Further, at the transmitting end 1, the value of tx_head_cache is updated according to the value of rx_head_cache fed back by the receiving end 2, so that the value of tx_head_cache is updated to 5, and thus the value of the parameter tx_tail_ckpt is also updated to 5 accordingly, as shown in state 5 of FIG8. In this case, the value of the parameter tx_tail_cache is the same as the value of the parameter tx_head_cache, which means that the amount of data sent by the transmitting end 1 is the same as the amount of data received by the receiving end 2, so there is no need to retransmit the data.
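The sequence of transmitter states in Figure 8 can be replayed with a small simulation. The function `run_transfer` and its parameters are hypothetical, the link is not modelled, and the retransmission is assumed to succeed on its first attempt.

```python
def run_transfer(total: int, lost_from: int) -> list[tuple[int, int, int]]:
    """Replay FIG. 8: items >= lost_from are lost on the first attempt.

    Returns snapshots of (tx_tail_cache, tx_head_cache, tx_tail_ckpt).
    Assumption: the retransmission succeeds on its first attempt.
    """
    snapshots = []
    tx_tail_cache = total                      # state 2: all items pre-sent
    rx_tail_cache = min(total, lost_from - 1)  # state 2': only the prefix arrives
    tx_head_cache = rx_tail_cache              # head fed back to the transmitter
    tx_tail_ckpt = tx_head_cache               # state 3: breakpoint
    snapshots.append((tx_tail_cache, tx_head_cache, tx_tail_ckpt))

    if tx_tail_cache > tx_head_cache:          # data loss: retransmit the rest
        rx_tail_cache = tx_tail_cache          # state 4': retransmission arrives
        tx_head_cache = rx_tail_cache          # new head fed back
        tx_tail_ckpt = tx_head_cache           # state 5
        snapshots.append((tx_tail_cache, tx_head_cache, tx_tail_ckpt))
    return snapshots


states = run_transfer(total=5, lost_from=4)
```

With five items and loss starting at the fourth, the snapshots are (5, 3, 3) before retransmission and (5, 5, 5) afterwards.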

According to an embodiment of the present disclosure, determining the type of interruption according to the first transmitting end information and the first receiving end information includes: if the first number is equal to the second number, determining the type of interruption as the loss of response to the data fed back from the receiving end to the transmitting end.

The specific situation of the above embodiment is described below in conjunction with Figure 9. Figure 9 shows the situation where the ACK fed back by the receiving end 2 is lost. It should be understood that, for the sake of simplicity, in Figure 9, only the sending and receiving directions of data are shown, and the sending module and the receiving module are not shown.

As shown in Fig. 9, a first state table recording the states of various parameters may be stored in the transmitter 1, and the first state table may include at least three parameters, namely tx_tail_cache, tx_tail_ckpt and tx_head_cache. In the initial state, the values of these parameters are all 0, as shown in state 1 in Fig. 9.

Similarly, at the receiving end 2, a second state table recording the states of various parameters may be stored, and the second state table may include at least three parameters, namely rx_head_cache, rx_head_ckpt and rx_tail_cache. In the initial state, the values of these parameters are all 0, as shown in state 1' in Figure 9.

Next, suppose that sender 1 sends 5 data to receiver 2, namely data [1, 2, 3, 4, 5], then the value of parameter tx_tail_cache becomes 5, which means that sender 1 has sent 5 data, but whether these 5 data have reached receiver 2 has not yet been confirmed. At the same time, when sender 1 sends data, it also sends the tail of each data to the receiver. Each time a data is sent, the tail value increases. This is shown in state 2 in Figure 9.

At the receiving end 2, when data is received, the parameter rx_tail_cache increases as the tail value increases. Different from the implementation shown in FIG8, in this implementation, all data [1, 2, 3, 4, 5] are successfully sent to the receiving end 2. Therefore, the value of the parameter rx_tail_cache is updated to 5, as shown in state 2' of FIG9.

After receiving the data [1,2,3,4,5], the receiver 2 processes the data and sends the head information back to the transmitter 1, but it is not known whether the transmitter 1 has received the head value. Since the receiver 2 receives 5 data, the receiver 2 sends the information with head values of 1-5. In this case, the state of the parameter rx_head_cache is updated to 5, which means that the receiver 2 has sent 5 head information (rx_head), but the transmitter 1 has not yet confirmed that the 5 head information has been received, as shown in state 3' in Figure 9.

However, in the implementation shown in FIG. 9, the ACK information is lost starting from the fourth data packet, that is, the transmitting end 1 only receives three ACKs, although in fact all the data packets have been received and processed at the receiving end 2.

When the receiving end 2 processes the data, rx_head_cache is updated to 5 and sent to the transmitting end 1. After the transmitting end 1 receives the head and sends a feedback signal, the receiving end 2 updates the parameter rx_head_ckpt to 5, as shown in state 3' in Figure 9.

As the sender 1 does not receive the ACK information of data packets 4 and 5, the timeout mechanism will be triggered and the control module will restart the task. After the new task of sender 1 is started, it checks the head value received from receiver 2 and updates tx_head_cache to 5, which means that after sender 1 has sent 5 data, receiver 2 confirms that it has received 5 data. Therefore, sender 1 confirms that its previous timeout was caused by the loss of ACK. It does not need to resend any data and can end directly, as shown in state 3 of Figure 9.

As further shown in state 3 in FIG. 9, the parameter tx_tail_ckpt is updated to 5 accordingly.

As above, the value of rx_head_ckpt in state 3' of Figure 9 should be 5.

Therefore, according to one embodiment of the present disclosure, determining the type of interruption based on the first sending end information and the first receiving end information includes: if the first number is equal to the second number, determining the type of interruption as the loss of response to the data fed back from the receiving end to the sending end.

In this case, the value of the parameter tx_tail_cache is the same as the value of the parameter tx_head_cache, which means that the amount of data sent by the transmitter 1 is the same as the amount of data received by the receiver 2, so the transmitter 1 can realize that the type of interruption is the loss of the response to the data fed back from the receiver to the transmitter. According to one embodiment of the present disclosure, if it is determined that the type of interruption is the loss of the response to the data fed back from the receiver to the transmitter, it is determined that there is no need to retransmit the data.
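The two interruption types and the resulting retransmission decision can be summarised in a short sketch. The enum `InterruptType` and the functions `classify_interrupt` and `needs_retransmission` are hypothetical names; the first quantity corresponds to tx_tail_cache and the second quantity to tx_head_cache.

```python
from enum import Enum


class InterruptType(Enum):
    DATA_LOSS = "data lost or erroneous from sender to receiver"
    ACK_LOSS = "response from receiver to sender lost"


def classify_interrupt(first_quantity: int, second_quantity: int) -> InterruptType:
    """Compare the pre-sent amount with the amount fed back by the receiver."""
    if second_quantity > first_quantity:
        raise ValueError("second quantity must not exceed first quantity")
    if first_quantity > second_quantity:
        return InterruptType.DATA_LOSS
    return InterruptType.ACK_LOSS


def needs_retransmission(first_quantity: int, second_quantity: int) -> bool:
    # Only data loss requires retransmission; a lost response does not.
    return classify_interrupt(first_quantity, second_quantity) is InterruptType.DATA_LOSS


fig8 = classify_interrupt(5, 3)   # Figure 8: 5 sent, 3 confirmed
fig9 = classify_interrupt(5, 5)   # Figure 9: 5 sent, 5 confirmed
```

Under these assumptions the Figure 8 case is classified as data loss and the Figure 9 case as loss of the response, so only the former triggers retransmission.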

According to an embodiment of the present disclosure, the data to be retransmitted is stored in the buffer area of the application layer. According to this embodiment, the transport layer no longer needs additional space for retransmission: the application layer directly reuses the buffer area (such as Recvbuffer) that it already uses, thereby saving the storage overhead of the transport layer.
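
The buffer reuse can be illustrated as follows. This is a hedged sketch rather than the disclosed design: `retransmit_from_buffer` and `app_buffer` are hypothetical names, and count values are assumed to be 1-based positions in the buffer.

```python
from typing import Iterator, List, Tuple


def retransmit_from_buffer(
    app_buffer: List[bytes], breakpoint_index: int
) -> Iterator[Tuple[int, bytes]]:
    """Yield (count value, payload) pairs straight from the application buffer.

    No second copy of the payloads is kept anywhere: the buffer that the
    application already owns doubles as the retransmission store.
    """
    for index in range(breakpoint_index, len(app_buffer)):
        # Assumption: count values are 1-based positions in the buffer.
        yield index + 1, app_buffer[index]


app_buffer = [b"d1", b"d2", b"d3", b"d4", b"d5"]
resent = list(retransmit_from_buffer(app_buffer, breakpoint_index=3))
```

The yielded payloads are the same objects that sit in `app_buffer`, which is the point of the embodiment: no separate retransmission queue has to be allocated.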

According to one embodiment of the present disclosure, a retransmission indication is sent through the application layer to retransmit the data.

According to this embodiment, an indication of data retransmission can be provided at the application layer of the sending end, so that the application layer retransmission can correctly handle the link error situation and ensure data reliability through retransmission.

Figure 10 shows a flowchart of a method for controlling data retransmission at the application layer according to another aspect of the present disclosure, comprising: in operation S1010, pre-sending data, the data comprising a payload and a count value for the payload; in operation S1020, in response to the data being in a pre-sending state, recording first transmitting end information, the first transmitting end information indicating a first amount of the data being pre-sent; in operation S1030, receiving and recording first receiving end information, the first receiving end information indicating a second amount of data received and fed back by the receiving end, wherein the second amount is not greater than the first amount; and, in operation S1040, determining whether to retransmit the data based on the first transmitting end information and the first receiving end information.

FIG. 10 shows the operations performed by the transmitting end, which have been described above in conjunction with FIGS. 6-9 and will not be repeated here.
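
As an illustration only, operations S1010 to S1040 might be arranged as in the following Python sketch. The class `Sender` and its method names are hypothetical, and the count value is assumed to be a 1-based sequence number.

```python
from typing import List, Tuple


class Sender:
    """Hypothetical sender following operations S1010 to S1040."""

    def __init__(self) -> None:
        self.staged: List[Tuple[int, bytes]] = []  # (count value, payload)
        self.first_quantity = 0   # amount of pre-sent data
        self.second_quantity = 0  # amount the receiver reports as received

    def pre_send(self, payload: bytes) -> None:
        # S1010: stage the payload together with its count value.
        count = self.first_quantity + 1
        self.staged.append((count, payload))
        # S1020: record the sending-end information as soon as data is staged,
        # whether or not it is later delivered or acknowledged.
        self.first_quantity = count

    def on_feedback(self, reported: int) -> None:
        # S1030: record the receiving-end information; it may not exceed
        # the amount that was pre-sent.
        if reported > self.first_quantity:
            raise ValueError("receiver reported more data than was pre-sent")
        self.second_quantity = reported

    def needs_retransmission(self) -> bool:
        # S1040: compare the two records.
        return self.first_quantity > self.second_quantity


sender = Sender()
for chunk in (b"a", b"b", b"c"):
    sender.pre_send(chunk)
sender.on_feedback(2)
```

After these calls the sender has pre-sent three items but only two are confirmed, so `needs_retransmission()` returns `True`.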

According to an embodiment of the present disclosure, pre-sending data includes: placing the data in a storage space of a sending end for preparation for sending. The storage space here refers to the underlying hardware storage space located at the sending end.

According to one embodiment of the present disclosure, in response to the data being in a pre-sending state, recording the first sending end information includes: as the amount of data placed in the pre-sending state increases, incrementing the first quantity accordingly, regardless of whether the data is sent successfully and/or regardless of whether a response to the data is received.

According to an embodiment of the present disclosure, receiving and recording the first receiving end information includes: updating the second quantity as the first receiving end information changes.

According to an embodiment of the present disclosure, receiving and recording the first receiving end information includes: recording the second receiving end information having the largest second number.
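
The two recording policies described in the preceding embodiments differ only when feedback arrives out of order. The sketch below contrasts them; the function names and the out-of-order arrival sequence are assumptions made for illustration.

```python
from typing import Iterable


def record_latest(feedback: Iterable[int]) -> int:
    """Overwrite policy: the most recently received value wins."""
    second_quantity = 0
    for value in feedback:
        second_quantity = value
    return second_quantity


def record_largest(feedback: Iterable[int]) -> int:
    """Maximum policy: keep only the largest value seen so far."""
    second_quantity = 0
    for value in feedback:
        second_quantity = max(second_quantity, value)
    return second_quantity


# Feedback messages that arrive out of order (an assumed scenario).
arrivals = [1, 2, 4, 3]
```

With `arrivals` as above, the overwrite policy ends at 3 while the maximum policy ends at 4, so the maximum policy is the more robust choice if the link may reorder feedback.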

According to one embodiment of the present disclosure, determining whether to retransmit the data based on the first sending end information and the first receiving end information includes: determining the type of interruption based on the first sending end information and the first receiving end information; and determining whether to retransmit the data based on the type of interruption.

According to an embodiment of the present disclosure, determining the type of interruption according to the first transmitting end information and the first receiving end information includes: if the first number is greater than the second number, determining the type of interruption as data loss or error from the transmitting end to the receiving end.

According to one embodiment of the present disclosure, the data loss or error from the transmitting end to the receiving end includes the loss or error of the payload and/or the loss or error of the count value.

According to one embodiment of the present disclosure, determining whether to retransmit the data according to the type of the interruption includes: if the type of the interruption is determined to be data loss from the sending end to the receiving end, determining the breakpoint of the interruption according to a second quantity; and retransmitting the data starting from the breakpoint.

According to one embodiment of the present disclosure, determining the type of interruption based on the first transmitting end information and the first receiving end information includes: if the first number is equal to the second number, determining the type of interruption as the loss of response to the data fed back from the receiving end to the transmitting end.

According to one embodiment of the present disclosure, determining whether to retransmit the data according to the type of the interruption includes: if it is determined that the type of the interruption is a loss of a response to the data fed back from the receiving end to the sending end, determining that there is no need to retransmit the data.
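
The decision rule of these embodiments can be summarized in a short Python sketch. The names `Interruption` and `classify` are hypothetical, and expressing the breakpoint as the count value of the first item to resend (with count values starting at 1) is an assumed convention.

```python
from enum import Enum
from typing import Optional, Tuple


class Interruption(Enum):
    DATA_LOST = "data loss or error from sender to receiver"
    ACK_LOST = "loss of the response fed back by the receiver"


def classify(
    first_quantity: int, second_quantity: int
) -> Tuple[Interruption, Optional[int]]:
    """Return the interruption type and, if needed, the breakpoint.

    The breakpoint is the count value of the first data item to resend
    (assumed convention: count values start at 1).
    """
    if second_quantity > first_quantity:
        raise ValueError("second quantity must not exceed first quantity")
    if first_quantity > second_quantity:
        return Interruption.DATA_LOST, second_quantity + 1
    return Interruption.ACK_LOST, None
```

For example, `classify(5, 3)` reports a data loss with retransmission resuming at item 4, while `classify(5, 5)` reports a lost response and no breakpoint.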

According to an embodiment of the present disclosure, data to be retransmitted is stored in a buffer area (e.g., Recvbuffer) of the application layer.

According to one embodiment of the present disclosure, a retransmission indication is sent through an application layer to retransmit the data.

FIG. 11 shows a device for controlling data retransmission at the application layer according to one aspect of the present disclosure, comprising: a sending unit 1110, configured to pre-send data, wherein the data includes a payload and a count value for the payload; a transmitting end information updating unit 1120, configured to record first transmitting end information in response to the data being in a pre-sending state, wherein the first transmitting end information indicates a first quantity of the data being pre-sent; a receiving unit 1130, configured to receive and record first receiving end information, wherein the first receiving end information indicates a second quantity of data received and fed back by the receiving end, wherein the second quantity is not greater than the first quantity; and a retransmission judgment unit 1140, configured to determine whether to retransmit the data based on the first transmitting end information and the first receiving end information.

According to one aspect of the present disclosure, there is also provided a communication system for controlling data retransmission at an application layer, the communication system comprising: a transmitting end, a receiving end and a communication link connecting the transmitting end and the receiving end, wherein the transmitting end pre-transmits data to the receiving end, the data comprising a payload and a count value for the payload; the transmitting end records first transmitting end information in response to the data being in a pre-transmitting state, the first transmitting end information representing a first quantity of the data being pre-transmitted; the communication link is used to transmit the data; if the receiving end receives the data, the receiving end records the count value; the receiving end generates a second quantity based on the count value, the second quantity representing the amount of the data received and fed back by the receiving end; the receiving end feeds back first receiving end information to the transmitting end, the first receiving end information comprising the second quantity, and the second quantity being not greater than the first quantity; the transmitting end receives and records the first receiving end information; and the transmitting end determines whether to retransmit the data based on the first transmitting end information and the first receiving end information.
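
The interplay of transmitting end, link and receiving end can be imitated with a small simulation. This is a sketch under several assumptions that the text does not confirm: the receiver accepts only the next expected count value, the link drops each listed count value exactly once, and the sender keeps the largest head it has seen. `Receiver` and `run` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

Packet = Tuple[int, bytes]  # (count value, payload)


class Receiver:
    def __init__(self) -> None:
        self.store: Dict[int, bytes] = {}
        self.head = 0  # second quantity: contiguous data received so far

    def receive(self, packet: Packet) -> int:
        count, payload = packet
        if count == self.head + 1:  # assumption: accept only the next item
            self.store[count] = payload
            self.head = count
        return self.head  # fed back to the sender


def run(payloads: List[bytes], drop_once: Set[int]) -> Tuple[Receiver, int]:
    """Send all payloads over a link that drops the listed counts once.

    Returns the receiver and the number of sending rounds that were needed.
    """
    receiver = Receiver()
    tail = len(payloads)  # first quantity: everything is pre-sent
    head = 0              # second quantity as known to the sender
    pending_drops = set(drop_once)
    rounds = 0
    while head < tail:
        rounds += 1
        for count in range(head + 1, tail + 1):  # resume at the breakpoint
            if count in pending_drops:
                pending_drops.discard(count)     # lost on the link
                continue
            head = max(head, receiver.receive((count, payloads[count - 1])))
    return receiver, rounds


payloads = [b"d1", b"d2", b"d3", b"d4", b"d5"]
receiver, rounds = run(payloads, drop_once={4})
```

In this run, item 4 is lost in the first round, the receiver's head stays at 3, and the second round resumes at item 4, so two rounds deliver all five items.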

According to another aspect of the present disclosure, an electronic device is also provided, including: one or more processors; and a memory, in which computer executable instructions are stored. When the computer executable instructions are executed by the one or more processors, the electronic device executes the method described above.

According to yet another aspect of the present disclosure, a computer-readable storage medium is provided, comprising computer-executable instructions. When the computer-executable instructions are executed by one or more processors, the method described above is executed.

The present disclosure is based on the communication flow of head, tail and data (Data) and the corresponding device, and uses the relative relationship of Head/Tail to perform retransmission of the application layer, thereby ensuring the reliability of communication.

At the same time, compared with the existing transport layer performing retransmission, the beneficial effects of performing data retransmission in the application layer are:

In the present disclosure, the starting point and the ending point of the data can be known through the information of the application layer. Therefore, the communication protocol no longer needs to maintain the starting point and the ending point that need to be retransmitted, and only needs to faithfully execute each transmission request issued by the application layer.

In addition, the transport layer no longer needs additional space for retransmission: the application layer directly reuses the buffer area (such as Recvbuffer) that it already uses, so no additional storage space is needed.

In the present disclosure, retransmission is performed by the application layer, and the application layer will explicitly perceive the occurrence of retransmission. In this way, when the application layer is retransmitting, it can ensure that the same task will not be sent normally at the same time, avoiding the situation where retransmission traffic and normal task traffic compete on the link at the same time, thereby reducing the probability of link congestion and packet loss.
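
One way to picture this separation of traffic is a per-task gate that holds back normal sends while a retransmission is in progress. The sketch below is purely illustrative: `TaskLink` and its methods are hypothetical, and deferring (rather than rejecting) normal sends is an assumed policy.

```python
from typing import List


class TaskLink:
    """Hypothetical gate keeping one task's normal and retransmitted traffic apart."""

    def __init__(self) -> None:
        self.retransmitting = False
        self.wire: List[str] = []      # what actually went onto the link
        self.deferred: List[str] = []  # normal sends held back meanwhile

    def send_normal(self, item: str) -> None:
        if self.retransmitting:
            self.deferred.append(item)  # do not compete with retransmission
        else:
            self.wire.append(item)

    def begin_retransmission(self) -> None:
        self.retransmitting = True

    def retransmit(self, items: List[str]) -> None:
        if not self.retransmitting:
            raise RuntimeError("call begin_retransmission() first")
        self.wire.extend(items)

    def end_retransmission(self) -> None:
        self.retransmitting = False
        self.wire.extend(self.deferred)  # release the held-back traffic
        self.deferred.clear()


link = TaskLink()
link.send_normal("d1")
link.begin_retransmission()
link.send_normal("d6")  # arrives during retransmission, so it is deferred
link.retransmit(["d4", "d5"])
link.end_retransmission()
```

Here the retransmitted items reach the link before the deferred normal item, so the two kinds of traffic never share the link at the same moment.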

According to the technical solution of the present disclosure, retransmission is designed at the software level in the application layer, which reduces the hardware design required for the transport layer, thereby speeding up development and reducing hardware customization costs.

According to different application scenarios, the electronic equipment or device disclosed herein may include servers, cloud servers, server clusters, data processing devices, robots, computers, printers, scanners, tablet computers, smart terminals, PC devices, IoT terminals, mobile terminals, mobile phones, driving recorders, navigators, sensors, cameras, video cameras, projectors, watches, headphones, mobile storage, wearable devices, visual terminals, automatic driving terminals, means of transportation, household appliances, and/or medical equipment. The means of transportation include airplanes, ships and/or vehicles; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods; the medical equipment includes magnetic resonance imaging machines, ultrasound machines and/or electrocardiographs. The electronic equipment or device disclosed herein may also be applied to the Internet, IoT, data centers, energy, transportation, public administration, manufacturing, education, power grids, telecommunications, finance, retail, construction sites, and medical fields. Further, the electronic equipment or device disclosed herein may also be used in cloud, edge, and terminal applications related to artificial intelligence, big data, and/or cloud computing. In one or more embodiments, the electronic device or apparatus with high computing power according to the disclosed solution can be applied to a cloud device (such as a cloud server). Electronic devices or apparatuses with low power consumption can be applied to terminal devices and/or edge devices (such as smartphones or cameras).
In one or more embodiments, the hardware information of the cloud device and the hardware information of the terminal device and/or the edge device are compatible with each other, so that according to the hardware information of the terminal device and/or the edge device, appropriate hardware resources can be matched from the hardware resources of the cloud device to simulate the hardware resources of the terminal device and/or the edge device, so as to complete the unified management, scheduling and collaborative work of the end-cloud integration or the cloud-edge-end integration.

The embodiments of the present disclosure are described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present disclosure, and the description of the above embodiments is intended only to help in understanding the method of the present disclosure and its core idea. Changes or modifications made by those skilled in the art to the specific implementations and the scope of application, based on the ideas of the present disclosure, all fall within the scope of protection of the present disclosure. In summary, the content of this specification should not be construed as limiting the present disclosure.

The technical solution of the present disclosure can be better understood from the following clauses.

Clause 1. A method for controlling data retransmission at an application layer, comprising:

Pre-sending data, the data comprising a payload and a count value for the payload;

In response to the data being in a pre-sending state, recording first sending end information, wherein the first sending end information indicates a first amount of the data being pre-sent;

Receiving and recording first receiving end information, wherein the first receiving end information indicates a second amount of data received and fed back by the receiving end, wherein the second amount is not greater than the first amount;

Determine whether to retransmit the data according to the first sending end information and the first receiving end information.

Clause 2. The method according to clause 1, wherein pre-sending data comprises: placing the data in a storage space of the sending end in preparation for sending.

Clause 3. The method according to clause 1 or 2, wherein, in response to the data being in a pre-sending state, recording the first sending end information comprises:

In response to the data placed in the pre-transmission state being incremented, the first number is incremented accordingly, regardless of whether the data is successfully transmitted and/or regardless of whether a response to the data is received.

Clause 4. The method according to any one of clauses 1-3, wherein receiving and recording the first receiving end information includes: updating the second quantity as the first receiving end information changes.

Clause 5. The method according to any one of clauses 1-3, wherein receiving and recording the first receiving end information comprises: recording the second receiving end information having the largest second number.

Clause 6. A method according to any one of clauses 1-5, wherein, in response to an interruption signal reported from a communication link, it is determined whether to retransmit the data based on the first sending end information and the first receiving end information.

Clause 7. The method according to clause 6, wherein determining whether to retransmit the data according to the first transmitting end information and the first receiving end information comprises:

Determine the type of interruption according to the first sending end information and the first receiving end information;

Whether to retransmit the data is determined according to the type of the interruption.

Clause 8. The method of clause 7, wherein determining the type of interruption based on the first transmitting end information and the first receiving end information comprises:

If the first number is greater than the second number, the type of the interruption is determined to be data loss or error from the transmitting end to the receiving end.

Clause 9. The method according to clause 8, wherein the data loss or error from the transmitting end to the receiving end includes the loss or error of the payload and/or the loss or error of the count value.

Clause 10. The method according to clause 8 or 9, wherein determining whether to retransmit the data according to the type of the interruption comprises:

If it is determined that the type of the interruption is data loss from the transmitting end to the receiving end, determining a breakpoint of the interruption according to the second quantity;

The data is retransmitted starting from the breakpoint.

Clause 11. The method according to clause 7, wherein determining the type of interruption based on the first transmitting end information and the first receiving end information comprises:

If the first number is equal to the second number, it is determined that the type of the interruption is a loss of a response to the data fed back from the receiving end to the transmitting end.

Clause 12. The method according to clause 11, wherein determining whether to retransmit the data according to the type of the interruption comprises:

If it is determined that the type of interruption is a loss of a response to the data fed back from the receiving end to the sending end, it is determined that there is no need to retransmit the data.

Clause 13. The method according to any one of clauses 1 to 12, wherein the data to be retransmitted is stored in a buffer area of the application layer.

Clause 14. The method according to any one of clauses 1 to 13, wherein a retransmission indication is sent through an application layer to perform retransmission of the data.

Clause 15. An apparatus for controlling data retransmission at an application layer, comprising:

a sending unit, configured to pre-send data, wherein the data includes a payload and a count value for the payload;

a sending end information updating unit, configured to record first sending end information in response to the data being in a pre-sending state, wherein the first sending end information indicates a first amount of the data being pre-sent;

A receiving unit, configured to receive and record first receiving end information, wherein the first receiving end information indicates a second amount of data received and fed back by the receiving end, wherein the second amount is not greater than the first amount;

A retransmission determination unit is used to determine whether to retransmit the data according to the first sending end information and the first receiving end information.

Clause 16. A method for controlling data retransmission at an application layer in a communication system, the communication system comprising:

A transmitting end, a receiving end and a communication link connecting the transmitting end and the receiving end, the method comprising:

On the sending side,

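Pre-sending data to a receiving end, the data including a payload and a count value for the payload;

In response to the data being in a pre-sending state, recording first sending end information, wherein the first sending end information indicates a first amount of the data being pre-sent;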

transmitting the data via the communication link;

On the receiving end,

If the data is received, recording the count value;

Generate a second number according to the count value, where the second number represents the number of the data received and fed back by the receiving end;

Feedback first receiving end information to the transmitting end, where the first receiving end information includes a second quantity, and the second quantity is not greater than the first quantity;

On the sending side,

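Receiving and recording the first receiving end information;

Determining whether to retransmit the data according to the first sending end information and the first receiving end information.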

Clause 17. The method according to clause 16, wherein pre-sending data to the receiving end comprises: placing the data in a storage space of the sending end in preparation for sending.

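Clause 18. The method according to clause 16 or 17, wherein, in response to the data being in a pre-sending state, recording the first sending end information comprises:

In response to the data placed in the pre-transmission state being incremented, the first number is incremented accordingly, regardless of whether the data is successfully transmitted and/or regardless of whether a response to the data is received.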

Clause 19. The method of any one of clauses 16-18, wherein if the data is received, recording the count value comprises: overwriting the old count value with the new count value.

Clause 20. The method of any one of clauses 16-18, wherein if the data is received, recording the count value comprises:

searching the received data for the maximum count value; and

Only the largest count value is recorded.

Clause 21. The method according to any one of clauses 16-20, wherein receiving and recording the first receiving end information includes: updating the second quantity as the first receiving end information changes.

Clause 22. The method according to any one of clauses 16-20, wherein receiving and recording the first receiving end information comprises: recording the second receiving end information having the largest second number.

Clause 23. A method according to any one of clauses 16-22, wherein, in response to the occurrence of a specific event, the communication link reports an interrupt signal to the transmitting end.

Clause 24. The method according to clause 23, wherein, in response to data loss or error from the sending end to the receiving end, the communication link reports an interrupt signal to the sending end.

Clause 25. The method according to Clause 24, wherein the data loss or error from the transmitting end to the receiving end includes the loss or error of the payload and/or the loss or error of the count value.

Clause 26. The method according to clause 23, wherein, in response to a loss of an acknowledgement for the data fed back from the receiving end to the sending end, the communication link reports an interruption signal to the sending end.

Clause 27. A method according to any one of clauses 16-26, wherein, in response to an interruption signal reported from a communication link, it is determined whether to retransmit the data based on the first sending end information and the first receiving end information.

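Clause 28. The method according to clause 27, wherein determining whether to retransmit the data according to the first transmitting end information and the first receiving end information comprises:

Determining the type of interruption according to the first sending end information and the first receiving end information;

Determining whether to retransmit the data according to the type of the interruption.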

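Clause 29. The method of clause 28, wherein determining the type of interruption based on the first transmitting end information and the first receiving end information comprises:

If the first number is greater than the second number, the type of the interruption is determined to be data loss or error from the transmitting end to the receiving end.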

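Clause 30. The method according to clause 29, wherein determining whether to retransmit the data according to the type of the interruption comprises:

If it is determined that the type of the interruption is data loss from the transmitting end to the receiving end, determining a breakpoint of the interruption according to the second quantity;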

The data is retransmitted starting from the breakpoint.

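Clause 31. The method of clause 28, wherein determining the type of interruption based on the first transmitting end information and the first receiving end information comprises:

If the first number is equal to the second number, it is determined that the type of the interruption is a loss of a response to the data fed back from the receiving end to the transmitting end.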

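Clause 32. The method according to clause 31, wherein determining whether to retransmit the data according to the type of the interruption comprises:

If it is determined that the type of interruption is a loss of a response to the data fed back from the receiving end to the sending end, it is determined that there is no need to retransmit the data.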

Clause 33. A method according to any one of clauses 16-32, wherein the data to be retransmitted is stored in a buffer area of the application layer.

Clause 34. The method according to any one of clauses 16-33, wherein a retransmission indication is sent by an application layer to perform retransmission of the data.

Clause 35. A communication system for controlling data retransmission at an application layer, the communication system comprising: a transmitting end, a receiving end, and a communication link connecting the transmitting end and the receiving end, wherein:

The transmitting end pre-sends data to the receiving end, the data including a payload and a count value for the payload; the transmitting end records first transmitting end information in response to the data being in a pre-sending state, the first transmitting end information indicating a first amount of the data being pre-sent;

The communication link is used to transmit the data;

If the receiving end receives the data, the receiving end records the count value;

The receiving end generates a second number according to the count value, where the second number represents the amount of data received and fed back by the receiving end;

The receiving end feeds back first receiving end information to the transmitting end, where the first receiving end information includes a second quantity, and the second quantity is not greater than the first quantity;

The sending end receives and records the first receiving end information;

The sending end determines whether to retransmit the data according to the first sending end information and the first receiving end information.

Clause 36. An electronic device comprising:

one or more processors; and

A memory storing computer executable instructions, which, when executed by the one or more processors, enable the electronic device to execute a method as described in any one of clauses 1-14 or 16-34.

Clause 37. A computer-readable storage medium comprising computer-executable instructions, which, when executed by one or more processors, perform the method as described in any one of clauses 1-14 or 16-34.

Claims

A method for controlling data retransmission at an application layer, comprising:

Pre-sending data, the data comprising a payload and a count value for the payload;

In response to the data being in a pre-sending state, recording first sending end information, wherein the first sending end information indicates a first amount of the data being pre-sent;

Receiving and recording first receiving end information, wherein the first receiving end information indicates a second amount of data received and fed back by the receiving end, wherein the second amount is not greater than the first amount;

Determine whether to retransmit the data according to the first sending end information and the first receiving end information.
The method according to claim 1, wherein pre-sending data comprises: placing the data in a storage space of the sending end for preparation for sending.
The method according to claim 1 or 2, wherein, in response to the data being in a pre-sending state, recording the first sending end information comprises:

In response to the data placed in the pre-transmission state being incremented, the first number is incremented accordingly, regardless of whether the data is successfully transmitted and/or regardless of whether a response to the data is received.
The method according to any one of claims 1-3, wherein receiving and recording the first receiving end information comprises: updating the second quantity as the first receiving end information changes.
The method according to any one of claims 1 to 3, wherein receiving and recording the first receiving end information comprises: recording the second receiving end information having the largest second number.
A method according to any one of claims 1-5, wherein, in response to an interruption signal reported from a communication link, determining whether to retransmit the data is performed based on the first sending end information and the first receiving end information.
The method according to claim 6, wherein determining whether to retransmit the data according to the first transmitting end information and the first receiving end information comprises:

Determine the type of interruption according to the first sending end information and the first receiving end information;

Whether to retransmit the data is determined according to the type of the interruption.
The method according to claim 7, wherein determining the type of interruption according to the first transmitting end information and the first receiving end information comprises:

If the first number is greater than the second number, the type of the interruption is determined to be data loss or error from the transmitting end to the receiving end.
The method according to claim 8, wherein the data loss or error from the transmitting end to the receiving end includes the loss or error of the payload and/or the loss or error of the count value.
The method according to claim 8 or 9, wherein determining whether to retransmit the data according to the type of the interruption comprises:

If it is determined that the type of the interruption is data loss from the transmitting end to the receiving end, determining a breakpoint of the interruption according to the second quantity;

The data is retransmitted starting from the breakpoint.
The method according to claim 7, wherein determining the type of interruption according to the first transmitting end information and the first receiving end information comprises:

If the first number is equal to the second number, it is determined that the type of the interruption is a loss of a response to the data fed back from the receiving end to the transmitting end.
The method according to claim 11, wherein determining whether to retransmit the data according to the type of the interruption comprises:

If it is determined that the type of interruption is a loss of a response to the data fed back from the receiving end to the sending end, it is determined that there is no need to retransmit the data.
The method according to any one of claims 1 to 12, wherein the data to be retransmitted is stored in a buffer area of the application layer.
The method according to any one of claims 1 to 13, wherein a retransmission indication is sent through an application layer to retransmit the data.
A device for controlling data retransmission at an application layer, comprising:

a sending unit, configured to pre-send data, wherein the data includes a payload and a count value for the payload;

a sending end information updating unit, configured to record first sending end information in response to the data being in a pre-sending state, wherein the first sending end information indicates a first amount of the data being pre-sent;

A receiving unit, configured to receive and record first receiving end information, wherein the first receiving end information indicates a second amount of data received and fed back by the receiving end, wherein the second amount is not greater than the first amount;

A retransmission determination unit is used to determine whether to retransmit the data according to the first sending end information and the first receiving end information.
A method for controlling data retransmission at an application layer in a communication system, the communication system comprising: a transmitting end, a receiving end and a communication link connecting the transmitting end and the receiving end, the method comprising:

On the sending side,

Pre-sending data to a receiving end, the data including a payload and a count value for the payload;

In response to the data being in a pre-sending state, recording first sending end information, wherein the first sending end information indicates a first amount of the data being pre-sent;

transmitting the data via the communication link;

On the receiving end,

If the data is received, recording the count value;

Generate a second number according to the count value, where the second number represents the number of the data received and fed back by the receiving end;

Feedback first receiving end information to the transmitting end, where the first receiving end information includes a second quantity, and the second quantity is not greater than the first quantity;

On the sending side,

Receiving and recording the first receiving end information;

Determine whether to retransmit the data according to the first sending end information and the first receiving end information.
The method according to claim 16, wherein pre-sending data to the receiving end comprises: placing the data in a storage space of the sending end for preparation for sending.
The method according to claim 16 or 17, wherein, in response to the data being in a pre-sending state, recording the first sending end information comprises:

In response to the data placed in the pre-transmission state being incremented, the first number is incremented accordingly, regardless of whether the data is successfully transmitted and/or regardless of whether a response to the data is received.
The method according to any one of claims 16 to 18, wherein if the data is received, recording the count value comprises: overwriting the old count value with the new count value.
The method according to any one of claims 16 to 18, wherein if the data is received, recording the count value comprises:

searching the received data for the maximum count value; and

recording only the maximum count value.
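As a hedged illustration of recording only the maximum count value, the following Python sketch keeps the larger of the previously recorded value and the counts found in newly received data. The function name `record_max_count` and its parameters are hypothetical.

```python
def record_max_count(recorded, received_counts):
    """Return the count value to keep after a batch of data arrives.

    recorded: the count value currently stored at the receiving end.
    received_counts: count values found in the newly received data.
    """
    # Including `recorded` in the search means an empty batch or a batch
    # of smaller values leaves the stored value unchanged.
    return max([recorded, *received_counts])
```

Compared with simply overwriting the old count value, this variant does not lower the recorded value if a smaller count value happens to arrive after a larger one.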
The method according to any one of claims 16-20, wherein receiving and recording the first receiving end information comprises: updating the second quantity as the first receiving end information changes.
The method according to any one of claims 16-20, wherein receiving and recording the first receiving end information comprises: recording the first receiving end information having the largest second quantity.
The method according to any one of claims 16 to 22, wherein, in response to the occurrence of a specific event, the communication link reports an interrupt signal to the transmitting end.
The method of claim 23, wherein, in response to data loss or error from the transmitting end to the receiving end, the communication link reports an interrupt signal to the transmitting end.
The method according to claim 24, wherein the data loss or error from the transmitting end to the receiving end includes the loss or error of the payload and/or the loss or error of the count value.
The method according to claim 23, wherein, in response to a loss of a response to the data fed back from the receiving end to the sending end, the communication link reports an interrupt signal to the sending end.
The method according to any one of claims 16-26, wherein, in response to an interrupt signal reported from the communication link, whether to retransmit the data is determined according to the first sending end information and the first receiving end information.
The method according to claim 27, wherein determining whether to retransmit the data according to the first transmitting end information and the first receiving end information comprises:

Determining the type of the interruption according to the first sending end information and the first receiving end information; and

Determining whether to retransmit the data according to the type of the interruption.
The method according to claim 28, wherein determining the type of interruption according to the first transmitting end information and the first receiving end information comprises:

If the first quantity is greater than the second quantity, determining that the type of the interruption is data loss or error from the transmitting end to the receiving end.
The method according to claim 29, wherein determining whether to retransmit the data according to the type of the interruption comprises:

If it is determined that the type of the interruption is data loss from the transmitting end to the receiving end, determining a breakpoint of the interruption according to the second quantity;

Retransmitting the data starting from the breakpoint.
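Retransmission from the breakpoint can be sketched as follows. The claims do not fix how the breakpoint is derived from the second quantity, so this Python sketch assumes that count values start at 1, that `buffer[i]` holds the payload with count value `i + 1`, and that the breakpoint is the first packet after the last one the receiving end fed back. The function name `packets_to_retransmit` is hypothetical.

```python
def packets_to_retransmit(buffer, second_quantity):
    """Return (count, payload) pairs to resend, starting at the breakpoint.

    buffer: application-layer copies of every pre-sent payload, in order
            (assumption: buffer[i] carries count value i + 1).
    second_quantity: quantity of data the receiving end reported receiving.
    """
    # Skip the payloads already reported as received and number the rest
    # with their original count values.
    remaining = buffer[second_quantity:]
    return list(enumerate(remaining, start=second_quantity + 1))
```

For example, with three buffered payloads and a second quantity of 1, the sketch returns the payloads with count values 2 and 3.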
The method according to claim 28, wherein determining the type of interruption according to the first transmitting end information and the first receiving end information comprises:

If the first quantity is equal to the second quantity, determining that the type of the interruption is a loss of a response to the data fed back from the receiving end to the transmitting end.
The method according to claim 31, wherein determining whether to retransmit the data according to the type of the interruption comprises:

If it is determined that the type of the interruption is a loss of a response to the data fed back from the receiving end to the sending end, determining that there is no need to retransmit the data.
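The two interruption types and the resulting retransmission decision can be summarized in a short Python sketch. The enum `InterruptType` and the functions `classify_interrupt` and `should_retransmit` are hypothetical names. Raising an error when the second quantity exceeds the first is an added assumption, based on the condition that the second quantity is not greater than the first quantity.

```python
from enum import Enum


class InterruptType(Enum):
    DATA_LOSS_OR_ERROR = "data lost or in error from sender to receiver"
    RESPONSE_LOSS = "response lost from receiver to sender"


def classify_interrupt(first_quantity, second_quantity):
    """Determine the interruption type from the two recorded quantities."""
    if second_quantity > first_quantity:
        # Assumption: treated as invalid, since the second quantity is
        # stated to be not greater than the first quantity.
        raise ValueError("second quantity exceeds first quantity")
    if first_quantity > second_quantity:
        return InterruptType.DATA_LOSS_OR_ERROR
    return InterruptType.RESPONSE_LOSS


def should_retransmit(first_quantity, second_quantity):
    """Retransmit only when data was lost or in error on the way out."""
    kind = classify_interrupt(first_quantity, second_quantity)
    return kind is InterruptType.DATA_LOSS_OR_ERROR
```

When the quantities are equal, every pre-sent packet was reported as received, so only the response was lost and the sketch returns `False`.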
The method according to any one of claims 16 to 32, wherein the data to be retransmitted is stored in a cache area of the application layer.
The method according to any one of claims 16 to 33, wherein a retransmission indication is sent through an application layer to retransmit the data.
A communication system for controlling data retransmission at an application layer, the communication system comprising: a transmitting end, a receiving end and a communication link connecting the transmitting end and the receiving end, wherein:

The transmitting end pre-sends data to the receiving end, the data including a payload and a count value for the payload;

In response to the data being in a pre-sending state, the sending end records first sending end information, where the first sending end information indicates a first amount of the data being pre-sent;

The communication link is used to transmit the data;

If the receiving end receives the data, the receiving end records the count value;

The receiving end generates a second quantity according to the count value, where the second quantity represents the quantity of the data received and fed back by the receiving end;

The receiving end feeds back first receiving end information to the transmitting end, where the first receiving end information includes a second quantity, and the second quantity is not greater than the first quantity;

The sending end receives and records the first receiving end information;

The sending end determines whether to retransmit the data according to the first sending end information and the first receiving end information.
An electronic device, comprising:

one or more processors; and

a memory storing computer-executable instructions which, when executed by the one or more processors, cause the electronic device to perform the method according to any one of claims 1-14 or 16-34.
A computer-readable storage medium comprising computer-executable instructions which, when executed by one or more processors, cause the method according to any one of claims 1-14 or 16-34 to be performed.