CN110535793A - Message total-order mechanism of a distributed system - Google Patents
- Publication number
- CN110535793A CN110535793A CN201810515834.4A CN201810515834A CN110535793A CN 110535793 A CN110535793 A CN 110535793A CN 201810515834 A CN201810515834 A CN 201810515834A CN 110535793 A CN110535793 A CN 110535793A
- Authority
- CN
- China
- Prior art keywords
- message
- barrier
- data message
- data
- minimum value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
- G06F9/522—Barrier synchronisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0823—Errors, e.g. transmission errors
- H04L43/0829—Packet loss
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/30—Peripheral units, e.g. input or output ports
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/90—Buffering arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/90—Buffering arrangements
- H04L49/9057—Arrangements for supporting packet reassembly or resequencing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/28—Flow control; Congestion control in relation to timing considerations
Abstract
The message total-order mechanism for a distributed system disclosed herein realizes total-order control of messages by combining timestamps with message barriers, and has the barriers processed by switches, which effectively improves the processing efficiency of total-order message control.
Description
Background
In a distributed system, to meet the consistency requirements of distributed transaction processing and distributed storage, it is desirable to implement a full-order (Total Order) message processing mechanism; that is, each receiving end is required to process received messages according to the same message delivery order.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
The disclosure provides a message full-order mechanism for a distributed system, which realizes full-order control of messages by combining timestamps with message barriers and uses a switch to process the message barriers, thereby effectively improving the processing efficiency of full-order message control.
The foregoing is only an overview of the technical solutions of the present disclosure; the embodiments of the present disclosure are described below so that the technical means of the present disclosure can be understood more clearly and the above and other objects, features, and advantages of the present disclosure become more readily apparent.
Drawings
FIG. 1 is a block diagram depicting a first example environment for the message full-order mechanism of a distributed system;
FIG. 2 is a block diagram depicting a second example environment for the message full-order mechanism of a distributed system;
FIG. 3 is a block diagram depicting a third example environment for the message full-order mechanism of a distributed system;
FIG. 4 is a block diagram depicting a fourth example environment for the message full-order mechanism of a distributed system;
FIG. 5 is a block diagram depicting a fifth example environment for the message full-order mechanism of a distributed system;
FIG. 6 is a block diagram depicting a first variation of the message structure in an example environment of the message full-order mechanism of a distributed system;
FIG. 7 is a block diagram depicting a second variation of the message structure in an example environment of the message full-order mechanism of a distributed system;
FIG. 8 is a block diagram depicting a third variation of the message structure in an example environment of the message full-order mechanism of a distributed system;
FIG. 9 is a block diagram depicting a fourth variation of the message structure in an example environment of the message full-order mechanism of a distributed system;
FIG. 10 is a block diagram depicting a sixth example environment for the message full-order mechanism of a distributed system;
FIG. 11 is a block diagram depicting a seventh example environment for the message full-order mechanism of a distributed system;
FIG. 12 is a block diagram depicting an eighth example environment for the message full-order mechanism of a distributed system;
FIG. 13 is a block diagram showing the structure of a first implementation example of a switch implementing the message full-order mechanism;
FIG. 14 is a block diagram showing the structure of a second implementation example of a switch implementing the message full-order mechanism;
FIG. 15 is a block diagram showing the structure of a third implementation example of a switch implementing the message full-order mechanism;
FIG. 16 is a block diagram showing the structure of a fourth implementation example of a switch implementing the message full-order mechanism;
FIG. 17 is a block diagram showing the structure of a fifth implementation example of a switch implementing the message full-order mechanism;
FIG. 18 is a block diagram showing the structure of a sixth implementation example of a switch implementing the message full-order mechanism;
FIG. 19 is a flowchart of a first illustrative process for implementing the message full-order mechanism;
FIG. 20 is a flowchart of a second illustrative process for implementing the message full-order mechanism;
FIG. 21 is a flowchart of a third illustrative process for implementing the message full-order mechanism;
FIG. 22 is a flowchart of a fourth illustrative process for implementing the message full-order mechanism;
FIG. 23 is a flowchart of a fifth illustrative process for implementing the message full-order mechanism;
FIG. 24 is a flowchart of a sixth illustrative process for implementing the message full-order mechanism;
FIG. 25 is a block diagram of an exemplary electronic device;
FIG. 26 is a block diagram of an exemplary computing device.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In this disclosure, the terms "techniques," "mechanisms" may refer to, for example, system(s), method(s), computer-readable instructions, module(s), algorithms, hardware logic (e.g., Field Programmable Gate Array (FPGA)), Application Specific Integrated Circuit (ASIC), Application Specific Standard Product (ASSP), System On Chip (SOC), Complex Programmable Logic Device (CPLD), and/or other technique(s) described above and/or allowed throughout this disclosure.
Overview
In a distributed system, a plurality of hosts are connected through a network. When an event occurs on one of the hosts, that host, acting as a sender (hereinafter called the sending end host), scatters messages to multiple other hosts acting as receivers (hereinafter called receiving end hosts). Scattering here means that the sending end host sends messages carrying the same timestamp to different receiving end hosts. These messages may have the same content (multicast or broadcast) or different content.
In a distributed system there are multiple sending end hosts and multiple receiving end hosts, and each receiving end host receives messages scattered by different sending end hosts. Due to network delay and similar factors, messages scattered simultaneously by the same sending end host may arrive at different receiving end hosts at different times. When several sending end hosts scatter messages, the order in which a receiving end host receives them may therefore differ from the order of the originating events.
It is desirable to implement a full-order (Total Order) message processing mechanism in a distributed system, that is, each receiving end host is required to deliver messages in the same order, so as to meet the consistency requirements of distributed transaction processing and distributed storage.
For example, the sending end host S1 sends a message M11 and a message M12 carrying timestamp T1 to the receiving end hosts R1 and R2, respectively, at time T1, while the sending end host S2 sends a message M21 and a message M22 carrying timestamp T2 to R1 and R2, respectively, at time T2. In a distributed system that achieves full message order, the delivery order of host R1 for messages M11 and M21 must be consistent with the delivery order of host R2 for messages M12 and M22. For example, if R1 delivers message M11 → message M21, then R2 is required to deliver message M12 → message M22; conversely, if R1 delivers message M21 → message M11, then R2 is required to deliver message M22 → message M12.
It should be noted that the message full-order mechanism requires that the delivery order of each receiving end host is uniform, and does not require that each receiving end host delivers the message according to the sequence of sending messages by the sending end host. The message delivery referred to in this disclosure is the provision of a received message to an application.
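The consistency requirement in the example above can be sketched in a few lines of Python. This is an illustrative toy, not the disclosed method: in particular, breaking ties between equal timestamps by sender identifier is an assumption that this section does not specify.

```python
def delivery_order(received):
    """received: list of (timestamp, sender_id, message_name) tuples.

    Sorting by timestamp yields the delivery order; the sender_id
    tie-break for equal timestamps is an assumption, not taken from
    the disclosure.
    """
    return [name for _, _, name in sorted(received)]

# Receiver R1 got S2's message first; receiver R2 got them in the
# opposite arrival order. Delivery order is still identical.
r1 = delivery_order([(2, "S2", "M21"), (1, "S1", "M11")])
r2 = delivery_order([(1, "S1", "M12"), (2, "S2", "M22")])
assert r1 == ["M11", "M21"] and r2 == ["M12", "M22"]
```

Both receivers deliver S1's message before S2's, regardless of arrival order, which is exactly the property the full-order mechanism must preserve.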
The present disclosure provides a message full-order mechanism for a distributed system that implements Total-Order Message Scattering. It achieves full-order control of messages mainly by combining timestamps with message barriers, and can use a switch to process the barriers, thereby effectively improving the processing efficiency of full-order message control.
This disclosure involves two kinds of message barriers. The first is the data message barrier, whose basic meaning is: on its corresponding link, no more data messages with a timestamp less than the barrier will arrive; that is, all data messages with a timestamp less than the barrier have already been sent on that link. Combining timestamps with data message barriers achieves basic full-order control of messages. The second is the response message barrier, whose basic meaning is: all data messages on its corresponding link with a timestamp less than or equal to the barrier have been received by the receiving end hosts. The response message barrier helps address transmission packet loss in the distributed system; through it, reliable full-order control can still be guaranteed when packets are lost.
In addition, the present disclosure provides a timestamp adjustment mechanism that adjusts the timestamps of each host so that, across the whole distributed system, the message delivery waiting time caused by link latency is shortened.
Example implementations of the message full-order mechanism of the present disclosure are described in detail below.
Illustrative Environment
The environment described below constitutes but one example and is not intended to limit the claims to any particular operating environment. Other environments may be used without departing from the spirit and scope of the claimed subject matter.
Illustrative Environment A
By way of example, FIG. 1 is a block diagram 100 depicting a first example environment for the message full-order mechanism of a distributed system, and FIG. 2 is a block diagram 200 depicting a second example environment. As shown in block diagrams 100 and 200, the distributed system includes a plurality of sending end hosts 101, a plurality of switches 102, and a plurality of receiving end hosts 103. The sending end hosts 101 and receiving end hosts 103 transmit data messages 104 through the switches 102; each data message 104 includes at least a timestamp 105, a data message barrier 106, and data 108. A plurality of links 107 are formed between the sending end hosts 101, the switches 102, and the receiving end hosts 103. Both the sending end hosts 101 and the receiving end hosts 103 are hosts of the distributed system; they are distinguished only to describe the transmission of the data messages 104 clearly, and those skilled in the art will understand that they can exchange roles, transmitting data messages 104 in the reverse direction.
For convenience of the following description, the sending end hosts, switches, receiving end hosts, and links shown in block diagrams 100 and 200 are further numbered: sending end hosts 101a–101d, switches 102a–102d, receiving end hosts 103a–103b, and links 107a–107j. The generic numbers 101–103 and 107 are used when referring to these elements collectively, and the suffixed numbers are used when describing a specific sending end host, switch, receiving end host, or link.
Similarly, to clearly illustrate the processing of the data message barrier 106 by the switches, the data message barriers 106 on the links in block diagrams 100 and 200 are further numbered 106a–106j and are simply labeled "barriers" in the figures.
Processing by the sender host 101:
The sending end host 101 may generate a plurality of data messages 104 in response to the occurrence of an event; these data messages 104 are written with the same timestamp 105 and sent to a plurality of receiving end hosts 103. The timestamp 105 may be generated from the local physical time of the sending end host 101 and increases in the order in which data messages 104 are sent. In the present disclosure, the local physical clocks of the hosts of the distributed system advance at the same or substantially the same rate (within a reasonable margin of error), or are synchronized or substantially synchronized.
In the present disclosure, a data message 104 is a message that needs to be delivered and processed by the receiving end host 103. Besides the timestamp 105, a data message 104 generated by the sending end host 101 includes a data message barrier 106. For the sending end host 101, the data message barrier 106 means: the sending end host 101 will not send any more data messages 104 with a timestamp less than the data message barrier 106, i.e., all data messages 104 with a timestamp less than the barrier have already been sent. The data message barrier 106 may be set equal to the timestamp 105 when the sending end host 101 generates the data message 104.
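The sender-side behavior just described can be sketched as follows. This is a minimal toy, assuming a monotone counter in place of the local physical clock; the class and field names are illustrative, not from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class DataMessage:
    timestamp: int  # send time of this message
    barrier: int    # promise: no later message from this sender has a smaller timestamp
    data: bytes

class SenderHost:
    """Toy sender: every message of one scatter carries the same timestamp,
    and the barrier is initialized equal to that timestamp (one of the
    options the text describes)."""

    def __init__(self):
        self.clock = 0  # stands in for the local physical clock

    def scatter(self, payloads):
        """Send one event's payloads toward multiple receivers."""
        self.clock += 1
        ts = self.clock
        # All messages of one scatter share the timestamp; barrier == timestamp.
        return [DataMessage(timestamp=ts, barrier=ts, data=p) for p in payloads]
```

A scatter of two payloads thus produces two messages with identical timestamp and barrier, and a later scatter carries a strictly larger timestamp.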
It should be noted that the data message 104 sent by the sending-end host 101 to multiple receiving-end hosts may carry the same data, or may carry different data. In addition, the sending end host 101 may send the data message 104 to all receiving end hosts 103 in the distributed system including itself, or may send the data message 104 to some receiving end hosts 103 in the distributed system.
Processing by the switch 102:
as shown in block diagram 100, multiple links are formed between the sending host 101 and the receiving host 103 for transmitting messages. In the case illustrated in block diagram 100, links 107 a-107 j are formed between switches and between a switch and a host. The block diagram 100 is only used to illustrate the case of unidirectional transmission of messages from the sending end host 101 to the receiving end host 103, and for each switch 102, the link that receives the message is referred to as the ingress link and the link that sends the message is referred to as the egress link.
Each switch 102 in the block diagram 100 performs the following:
A data message 104 is received from an ingress link, and the data message barrier 106 contained in it is obtained. Each switch 102 may have multiple ingress links and multiple egress links; for example, the ingress links of switch 102a are links 107a and 107b, and its egress links are links 107e and 107f. A switch 102 therefore receives data messages 104 from multiple ingress links, and may also receive multiple data messages 104 on a single link.
After obtaining the data message barriers 106 in each data message 104, the switch 102 determines the minimum value of the data message barriers 106, and then modifies the data message barriers 106 in the received data messages 104 such that the data message barriers 106 are equal to the determined minimum value. The modified data message 104 is then sent from the egress link.
For example, the data message barrier 106a carried in the data message 104 that switch 102a receives from link 107a is 1, and the data message barrier 106b in the data message 104 received from link 107b is 4. By comparison, barrier 106a is smaller than barrier 106b, so within switch 102a the current minimum of the data message barrier 106 is 1, and switch 102a modifies all data message barriers 106 in the data messages 104 it has received to 1. The modified data messages 104 are then sent over links 107e and 107f to the next-hop switches 102c and 102d.
Within a switch 102, data messages 104 carrying data message barriers 106 keep arriving, so the minimum barrier value is continuously updated; the previously determined minimum also participates when the minimum is recomputed for subsequently arriving barriers.
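The barrier handling of the switch can be sketched as below. This is a toy of the control logic only, assuming the switch tracks the latest barrier per ingress link; forwarding before every link has reported a barrier is a simplification that the disclosure does not detail. Routing and the data payload are untouched, mirroring the control/data separation described later.

```python
from dataclasses import dataclass

@dataclass
class DataMessage:
    timestamp: int
    barrier: int

class BarrierSwitch:
    """Toy switch: remembers the latest barrier seen on each ingress link
    and rewrites every forwarded message's barrier to the minimum over all
    links seen so far."""

    def __init__(self, ingress_links):
        # Latest barrier per ingress link; None until the first message arrives.
        self.link_barrier = {link: None for link in ingress_links}

    def forward(self, link, msg):
        self.link_barrier[link] = msg.barrier
        known = [b for b in self.link_barrier.values() if b is not None]
        msg.barrier = min(known)  # safe lower bound over the ingress links
        return msg
```

With the values from the example above (barrier 1 arriving on link 107a, barrier 4 on link 107b), both forwarded messages leave the switch carrying barrier 1.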
As the above processing shows, no matter how many ingress and egress links a switch 102 has, the data message barrier 106 in a data message 104 it sends to the next hop means: on the egress link carrying that barrier, the switch 102 will not subsequently transmit any data message 104 with a timestamp less than the barrier 106, i.e., all such data messages 104 have already been transmitted. Viewed over the whole of block diagram 100, after the hierarchical processing of the switches 102, the data message barrier 106 finally obtained by any receiving end host 103 is the smallest data message barrier 106 on the links associated with that host.
In the block diagram 100, since the sending end hosts 101a to 101d send data messages 104 to the receiving end hosts 103a and 103b, respectively, the data message barriers 106 received by the receiving end hosts 103a and 103b are the minimum of the message barriers 106 in the entire distributed system.
Note, however, that for any receiving end host 103 the received data message barrier 106 need not be the minimum over the entire distributed system. It suffices that the barrier 106 received by a receiving end host 103 is the minimum over the links associated with that host, i.e., the minimum over all data messages 104 it can still receive; this is enough to guarantee that the receiving end host 103 delivers data messages in full order.
Block diagram 200 shows another example environment; it differs from block diagram 100 in that link 107f is absent. In block diagram 200, sending end hosts 101a and 101b do not send data messages 104 to receiving end host 103b. In this case, the minimum data message barrier 106 received by host 103b is 2, while the minimum received by host 103a is 1. Although the minimum obtained by host 103b is not the minimum of the entire distributed system, the value 2 is already the minimum over the data messages 104 that host 103b should receive, so for host 103b the condition for full-order message control is satisfied.
Processing by the receiving-end host 103:
As shown in block diagrams 100 and 200, data messages 104 arrive at the receiving end hosts 103 through the switches 102. Each receiving end host 103 may be connected to one switch 102 (one ingress link) or to multiple switches 102 (multiple ingress links). Block diagrams 100 and 200 show only the case where a receiving end host 103 is connected to one switch 102.
Each receiver host 103 in the block diagrams 100 and 200 performs the following processing:
the receiver host 103 receives the data message 104 from the ingress link, and then sorts the messages according to the timestamp 105 included in the received data message 104, and the sorted data message 104 may be stored in the buffer queue 120 of the receiver host 103 first.
The receiver host 103 obtains the data message barrier 106 from the received data message 104, determines the minimum value 119 of the data message barrier 106, and delivers data messages 104 having timestamps 105 less than the minimum value 119.
In the sending end host 101, the initial value of the data message barrier 106 may be set equal to the timestamp 105 of the sent data message 104. As it passes each switch 102, the barrier 106 is set by that switch to the minimum of the barriers 106 on its ingress links and forwarded over the egress links to the next-level switch 102 or to a receiving end host 103. Under this mechanism, once a data message barrier 106 reaches a receiving end host 103, the physical meaning of the minimum value 119 of the barrier for that host is: no data message 104 with a timestamp 105 less than the minimum value 119 can still appear on any link associated with the host, i.e., all such data messages 104 have already been sent. The receiving end host 103 can therefore safely deliver the data messages 104 in its buffer queue 120 whose timestamps 105 are less than the minimum value 119 without breaking the full order of messages in the distributed system.
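The receiver-side behavior can be sketched as follows. This is a toy, assuming the receiver tracks the latest barrier per ingress link and waits until every link has reported one before delivering; the multi-link details are an assumption, since the figures only show the single-link case.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class DataMessage:
    timestamp: int
    barrier: int = field(compare=False)
    data: str = field(compare=False)

class ReceiverHost:
    """Toy receiver: messages wait in a buffer queue ordered by timestamp
    and are delivered once their timestamp falls below the minimum barrier
    over the receiver's ingress links."""

    def __init__(self, ingress_links):
        self.buffer = []  # min-heap keyed by timestamp
        self.link_barrier = {link: None for link in ingress_links}

    def receive(self, link, msg):
        heapq.heappush(self.buffer, msg)
        self.link_barrier[link] = msg.barrier
        return self._deliver()

    def _deliver(self):
        barriers = list(self.link_barrier.values())
        if any(b is None for b in barriers):
            return []  # some link has not reported a barrier yet
        safe = min(barriers)
        delivered = []
        # Deliver every buffered message with timestamp below the barrier.
        while self.buffer and self.buffer[0].timestamp < safe:
            delivered.append(heapq.heappop(self.buffer))
        return delivered
```

A message with timestamp 1 is held while the barrier is still 1, and released as soon as a later message raises the barrier to 3.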
Based on the processing of the data message 104 by the distributed system of the block diagram 100 and the block diagram 200, the full-order control of the data message 104 is realized, so that the processing of the data message 104 can be maintained in the same order in a plurality of receiver hosts 103.
To implement full-order messaging, a data message barrier 106 is introduced and processed by the switches. For any switch 102, the data message barrier 106 carried in the data messages 104 sent on an egress link is the current minimum of the data message barrier 106. This guarantees that, once a data message 104 reaches a receiving end host 103, the host can learn from the carried barrier 106 the minimum timestamp of the data messages 104 that the sending end hosts 101 may still send to it. In other words, a receiving end host 103 can determine from the barriers 106 it receives that it will no longer receive any data message 104 with a timestamp smaller than the barrier 106; accordingly, received data messages 104 with a timestamp less than the barrier 106 may be delivered, ensuring full-order delivery of the data messages 104.
In the distributed system, the timestamp 105 and the data message barrier 106 are control information of the full order of the message, and these control information only need to be transmitted once from the sending-end host 101 to the receiving-end host 103 along with the data message 104, which reduces the amount of information transmission generated in the distributed system for realizing the full order control of the message and also reduces the difficulty in coordinating the distributed system for realizing the full order of the message.
Furthermore, a switch 102 only determines the minimum of, and modifies, the data message barrier 106; it does not process the data 108 in the data message 104, and therefore does not impede transmission of the data message 104. In other words, for full-order control the control plane and the data plane of the message are separated: the switch handles only the full-order control information without affecting normal message transmission, so the mechanism can be implemented efficiently on existing commodity switch architectures.
As a modification, the sender host 101 may be configured such that the initial value of the data message barrier 106 is null, and the switch 102 connected to the sender host 101 writes the data message barrier 106. For example, in a switch 102 connected to a sender host 101, a timestamp 105 in a data message 104 received from an ingress link corresponding to the sender host 101 may be used as an initial data message barrier 106. It can be seen that this variant is substantially the same as the aforementioned handling of the timestamp 105 in the data message 104 as an initial value of the data message barrier 106 by the sender host 101.
Illustrative Environment B
In the examples shown in block diagrams 100 and 200, the data message barrier 106, which is message full-order control information, is conveyed inside the data message 104. In another example, the data message barrier 106 may instead be conveyed by beacon messages separate from the data messages 104; compared with a data message 104, a beacon message is relatively simple in structure and carries no data.
As shown in FIG. 3, a block diagram 300 depicts a third example environment for a message ordering mechanism for a distributed system. The block diagram 300 includes a plurality of sender hosts 301, a plurality of switches 302, and a plurality of receiver hosts 303, wherein a plurality of links 307 are formed between the plurality of switches 302, and between the plurality of switches 302 and the plurality of sender hosts 301 and the plurality of receiver hosts 303.
At the sender host 301 in block diagram 300, two types of messages are generated: one is the data message 304 and the other is the beacon message 309. The data message 304 includes a timestamp 305 and data 308. Unlike the data message 104 in block diagrams 100 and 200, the data message 304 does not contain a data message barrier; in block diagram 300, the data message barrier 306 is instead carried in the beacon message 309.
The initiator host 301 generates and transmits the data message 304 as usual, and the beacon message 309 may be generated and transmitted at certain time intervals. Beacon message 309 is equivalent to being inserted at regular intervals into the data message stream formed by data message 304. The data message barrier 306 in the beacon message 309 may be generated based on the local physical time when the beacon message 309 was generated (corresponding to the timestamp of the beacon message 309), or may be the timestamp 305 of the data message 304 that has been generated and transmitted in the current time period. To be able to set the data message barrier 306 more accurately, the timestamp 305 of the most recently generated and transmitted data message 304 may be taken as the data message barrier 306 when the beacon message 309 is generated.
In block diagram 300, in each switch 302, the data message barrier 306 is obtained from the beacon message 309 and the minimum value is determined, and then the data message barrier 306 in the received beacon message 309 is modified such that the data message barrier 306 equals the determined minimum value. The modified beacon message 309 is then sent from the egress link. In the example of block diagram 300, the switch 302 processes the message full-order control information only for the beacon message 309 and no longer for the data message 304. Under normal circumstances, since the number of beacon messages 309 will be less than the number of data messages 304, and the structure of the beacon messages 309 is relatively simple, computational resources of the switch 302 can be saved.
As a variation, after determining the minimum value of the data message barrier 306, the switch 302 may send only the beacon message 309 including the minimum value of the data message barrier 306 to the egress link, and no longer send the beacon message 309 with the data message barrier 306 being greater than the minimum value, which can reduce the number of beacon messages to the downstream switch and/or the receiver host, thereby further saving control message overhead in the network and processing resources of the switch and the receiver host.
After receiving the data message 304 and the beacon message 309, the receiver host 303 sorts the data message 304 according to the timestamp 305 included in the data message 304, and the sorted data message 304 may be stored in the buffer queue 320 of the receiver host 303 first. The receiver host 303 also retrieves the data message barrier 306 from the beacon message 309 and determines the minimum value 319. The receiver host 303 may post the data message 304 based on the minimum value 319 of the data message barrier 306 it determines, i.e., only data messages 304 having a timestamp 305 that is less than the minimum value 319 of the data message barrier 306 are delivered.
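The receiver-side behavior just described can be sketched as follows (the `Receiver` class and its fields are illustrative assumptions; the sketch tracks one barrier per sender host, which the disclosure implies but does not spell out): data messages are buffered sorted by timestamp, and each beacon updates the per-sender barrier, releasing every buffered message whose timestamp is below the minimum barrier.

```python
import heapq

class Receiver:
    """Illustrative sketch of the receiver in block diagram 300: buffer data
    messages in timestamp order, then deliver those whose timestamp is below
    the minimum data message barrier reported by the beacon messages."""

    def __init__(self, senders):
        self.queue = []                       # buffer queue, a timestamp-ordered heap
        self.barrier = {s: None for s in senders}

    def on_data(self, timestamp, data):
        heapq.heappush(self.queue, (timestamp, data))

    def on_beacon(self, sender, barrier):
        self.barrier[sender] = barrier
        if any(b is None for b in self.barrier.values()):
            return []                         # not every sender has reported yet
        minimum = min(self.barrier.values())
        delivered = []
        while self.queue and self.queue[0][0] < minimum:
            delivered.append(heapq.heappop(self.queue))
        return delivered


r = Receiver(senders=["s1", "s2"])
r.on_data(1, "a"); r.on_data(4, "b"); r.on_data(2, "c")
d1 = r.on_beacon("s1", 5)
d2 = r.on_beacon("s2", 3)
print(d1, d2)  # [] [(1, 'a'), (2, 'c')]
```

The message with timestamp 4 stays buffered because the minimum barrier is 3; a later beacon raising the minimum above 4 would release it.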
In the example shown in block diagram 300, the sending frequency of the beacon message 309 can be set flexibly according to actual needs: the higher the generation and sending frequency of the beacon message 309, the more up-to-date the full-order control information received by the receiver host 303, and the more timely the delivery of the data message 304. Conversely, to save the computational resources of the sender host 301 and the switch 302, the sending frequency of the beacon message 309 may be reduced.
Illustrative Environment C
As shown in FIG. 4, a block diagram 400 depicts a fourth example environment for a message ordering mechanism for a distributed system. The block diagram 400 includes a plurality of sender hosts 401, a plurality of switches 402, and a plurality of receiver hosts 403, wherein a plurality of links 407 are formed between the plurality of switches 402, and between the plurality of switches 402 and the plurality of sender hosts 401 and the plurality of receiver hosts 403.
In block diagram 400, at the sending host 401, two types of messages are generated: one is data message 404 and the other is beacon message 409. Data message 404 includes timestamp 405, data message barrier 406, and data 408. The difference from the example in block diagram 300 is that data message 404 and beacon message 409 in block diagram 400 each include a data message barrier 406, and beacon message 409 supplements data message 404 in conveying data message barrier 406 to switch 402 and recipient host 403.
At the sender host 401, if no data message 404 has been generated and sent after a preset time interval, a beacon message 409 may be generated and sent onto the link to keep the data message barrier 406 transmitted over the link up to date. This avoids delivery delay at the receiver host 403 caused by a drop in the volume of data messages 404 from some sender hosts 401, and improves the delivery efficiency of the distributed system. The data message barrier 406 in the beacon message 409 can be generated from the local physical time at which the beacon message 409 was generated (equivalent to the timestamp of the beacon message 409).
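A sketch of this idle-beacon behavior, under the assumption of a simple tick-driven sender (the `Sender` class, `idle_interval`, and the dict-based message layout are all illustrative, not from the disclosure):

```python
class Sender:
    """Illustrative sketch of the sender in block diagram 400: if no data
    message has been sent for `idle_interval` time units, emit a beacon whose
    barrier is the current local time, keeping the barrier in the link fresh."""

    def __init__(self, idle_interval):
        self.idle_interval = idle_interval
        self.last_sent = 0

    def send_data(self, now, data):
        self.last_sent = now
        return {"kind": "data", "timestamp": now, "barrier": now, "data": data}

    def tick(self, now):
        # Called periodically; returns a beacon only when the link went idle.
        if now - self.last_sent >= self.idle_interval:
            self.last_sent = now
            return {"kind": "beacon", "barrier": now}
        return None


s = Sender(idle_interval=10)
s.send_data(5, b"x")
b1 = s.tick(9)    # None: data was sent recently
b2 = s.tick(15)   # beacon with barrier 15: the link went idle
print(b1, b2)
```

Without the beacon at time 15, downstream minimum-barrier computations would stay pinned at 5 and stall delivery at the receiver hosts.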
The switch 402 retrieves the data message barrier 406 from the received data message 404 and beacon message 409, determines the minimum value of the retrieved data message barrier 406, and then modifies the data message barrier 406 in its received data message 404 and beacon message 409 to equal the determined minimum value. The modified data message 404 is then sent from the egress link.
As a variation, after determining the minimum value of the data message barrier 406, the switch 402 may send only the beacon message 409 including the minimum value of the data message barrier 406 to the egress link, and no longer send the beacon message 409 with the data message barrier 406 being greater than the minimum value, which can reduce the number of beacon messages to the downstream switch and/or the receiver host, thereby further saving the overhead of control messages in the network and the processing resources of the switch and the receiver host.
The receiver host 403 receives the data message 404, and then sorts the messages according to the timestamp 405 included in the received data message 404, and the sorted data message 404 may be stored in the buffer queue 420 of the receiver host 403 first.
The receiver host 403 obtains the data message barrier 406 from the received data message 404 and the beacon message 409, determines a minimum value 419 of the data message barrier 406, and delivers the data message 404 having a timestamp 405 less than the minimum value 419.
Illustrative Environment D
The examples above illustrate the basic scheme for implementing message full order in a distributed system. The present disclosure also provides a solution for maintaining message full order when packet loss occurs in the distributed system, so as to reduce the impact of packet loss on full-order delivery.
As shown in FIG. 5, a block diagram 500 depicts a fifth example environment for a message ordering mechanism for a distributed system. Block diagram 500 includes a plurality of sender hosts 501, a plurality of switches 502, and a plurality of receiver hosts 503, where a plurality of links 507 are formed between the plurality of switches 502, and between the plurality of switches 502 and the plurality of sender hosts 501 and the plurality of receiver hosts 503.
The difference from the previous examples is that the example shown in block diagram 500 introduces a response message barrier 511 and a packet loss identifier 512. In block diagram 500, the sender host 501 generates a data message 504 that includes a timestamp 505, a data message barrier 506, data 508, a response message barrier 511, and a packet loss identifier 512.
Packet loss detection and processing mechanism
As an example of a mechanism for detecting and handling packet loss: in the distributed system, a sender host 501 sends data messages 504 to a receiver host 503, each carrying a message number that is continuously incremented. After receiving a data message 504, the receiver host 503 sends the sender host 501 a response message carrying the same message number; on receiving the response message, the sender host 501 confirms that the data message 504 has been received by the receiver host 503.
If, while continuously receiving data messages 504 from the sender host 501, the receiver host 503 finds that the message numbers are discontinuous, it determines that a packet loss event has occurred, and it re-sends the response message containing the last message number received before the discontinuity. For example, suppose the receiver host 503 has received the data messages 504 with message numbers 1005 and 1007, and had already sent a response message containing message number 1005 when that message arrived normally. On finding the discontinuity, it sends the response message with message number 1005 again, so that the sender host 501 learns both that a packet loss occurred and which data message 504 was lost.
When a packet loss event occurs, the sender host 501 may receive two response messages with the same message number (duplicate response messages), or may receive no response message within a preset time range (response message timeout). On detecting either situation, the sender host 501 determines that a packet loss event has occurred and retransmits the lost data message 504.
When the receiver host 503 receives the retransmitted data message 504, it recovers to the state before the loss and continues sending response messages. Note that the receiver host 503 does not need to send a response message for the retransmitted data message 504 itself; it simply resumes sending the response messages that were withheld, in their pre-loss order. Continuing the previous example: the receiver host 503 has received the data messages 504 with message numbers 1005 and 1007, and has already sent the response message for 1005. On receiving 1007 it recognizes that the data message 504 with message number 1006 was lost and sends the response message with message number 1005 again; the sender host 501 then retransmits the data message 504 with message number 1006. After the receiver host 503 receives it, the receiver host 503 directly sends the response message with message number 1007 without sending one for 1006; once the sender host 501 receives the response message with message number 1007, it knows that every data message 504 with a message number less than or equal to 1007 was received by the receiver host 503.
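The numbering and response behavior in this example can be sketched as follows (the `AckTracker` class and its fields are illustrative; the disclosure does not prescribe an implementation). The response number is cumulative: it is the highest message number up to which everything has been received in order.

```python
class AckTracker:
    """Illustrative sketch of the receiver's loss-detection bookkeeping:
    acknowledge the highest in-order message number; a gap repeats the last
    in-order acknowledgement (a duplicate response), and a retransmission
    that fills the gap is acknowledged with the new cumulative number."""

    def __init__(self):
        self.received = set()
        self.acked = 0          # highest in-order message number acknowledged

    def on_data(self, number):
        self.received.add(number)
        if number == self.acked + 1:
            # Advance past any later messages buffered out of order.
            while self.acked + 1 in self.received:
                self.acked += 1
        return self.acked       # the response message carries this number


rx = AckTracker()
rx.acked = 1004                 # messages up to 1004 already acknowledged
a1 = rx.on_data(1005)           # in order
a2 = rx.on_data(1007)           # gap at 1006: duplicate response
a3 = rx.on_data(1006)           # retransmission fills the gap
print(a1, a2, a3)               # 1005 1005 1007
```

The three return values reproduce the narrative above: 1005 is acknowledged, the gap at 1006 triggers a repeated response with 1005, and the retransmission is answered directly with 1007.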
On the switch 502, packet loss events can be discovered via a packet loss counter, and the switch 502 can detect a loss earlier than either the sender host 501 or the receiver host 503. In addition, if a switch 502 detects that the packet loss identifier 512 in a received data message 504 is set to the packet-loss state, it likewise considers that a packet loss event has occurred on its link and that it is currently in the packet-loss state.
Handling of acknowledgement message barriers
The response message barrier 511 means: every data message 504 with a timestamp 505 less than or equal to the response message barrier 511 has been received by the receiver host 503. Its initial value is the maximum timestamp 505 among the data messages 504 for which the sender host 501 has received a response message.
On the switch 502, in addition to the foregoing processing of the data message barrier 506, similar processing is performed on the response message barrier 511. That is, the switch receives a data message 504 from an ingress link and obtains the data message barrier 506 and the response message barrier 511 it contains. It determines the minimum value of each, then modifies the data message barrier 506 and the response message barrier 511 in received data messages 504 so that each equals its determined minimum value. The modified data message 504 is then sent from the egress link.
Processing of packet loss identification
The packet loss identifier 512 is used to identify whether a packet loss state exists on the current link.
On the sender host 501, if a packet loss event is detected, the sender host 501 is considered to be in the packet-loss state until the lost packet has been retransmitted and the corresponding response message from the receiver host 503 has been received; in this state, the initial value of the packet loss identifier 512 in every data message 504 the sender host 501 generates is set to the packet-loss state. Once the lost packet has been retransmitted, the response message from the receiver host 503 has been received, and no new packet loss event has been detected since the last loss, the packet-loss state is considered to have disappeared, and the sender host 501 sets the initial value of the packet loss identifier 512 in newly generated data messages 504 to the non-packet-loss state.
It should be noted that, as another implementation example, since the switch 502 discovers packet loss events before the sender host 501 and the receiver host 503, the processing of the packet loss identifier 512 may also be done entirely by the switch 502; the sender host 501 then performs no processing on the packet loss identifier 512 and may always set it to the non-packet-loss state in the data messages 504 it sends.
On the switch 502, if a packet loss event is detected, the current minimum value of the data message barrier 506 is recorded (if multiple losses are detected, the recorded minimum is continually updated). Until the minimum value of the response message barrier 511 determined by the switch 502 reaches the recorded minimum of the data message barrier 506, the switch 502 is considered to be in the packet-loss state, and in that state it sets the packet loss identifier 512 in received data messages 504 to the packet-loss state. When the minimum value of the response message barrier 511 determined by the switch 502 reaches the recorded minimum of the data message barrier 506, the packet-loss state is considered to have disappeared, and the switch no longer sets the packet loss identifier 512 in the data messages 504.
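A sketch of the switch's packet-loss state, under the assumption that "continually updated" means the recorded minimum moves forward as later losses are detected (the class, method names, and `loss_watermark` field are illustrative, not from the disclosure):

```python
class SwitchLossState:
    """Illustrative sketch: on a detected loss, remember the current minimum
    data message barrier; keep setting the packet loss flag in forwarded
    messages until the minimum response message barrier catches up."""

    def __init__(self):
        self.loss_watermark = None   # recorded min data message barrier at loss time

    def on_loss_detected(self, min_data_barrier):
        # Later losses push the watermark forward (assumed interpretation).
        if self.loss_watermark is None or min_data_barrier > self.loss_watermark:
            self.loss_watermark = min_data_barrier

    def flag_for(self, min_response_barrier):
        if self.loss_watermark is not None and min_response_barrier < self.loss_watermark:
            return True              # still in the packet-loss state
        self.loss_watermark = None   # response barrier caught up: state cleared
        return False


sw = SwitchLossState()
sw.on_loss_detected(min_data_barrier=40)
f1 = sw.flag_for(min_response_barrier=30)   # True: responses lag the watermark
f2 = sw.flag_for(min_response_barrier=40)   # False: loss state has cleared
print(f1, f2)
```

Once the response barrier reaches the watermark, every message that could have been affected by the loss is known to be acknowledged, so the flag can safely be cleared.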
On the receiver host 503, after receiving the data message 504 from the ingress link, the messages are sorted according to the timestamp 505 included in the received data message 504, and the sorted data message 504 may be stored in the buffer queue 520 of the receiver host 503 first.
Receiver host 503 obtains data message barrier 506 and acknowledgement message barrier 511 from received data message 504 and determines the minimum of data message barrier 506 and acknowledgement message barrier 511, respectively.
The receiving end host 503 determines the packet loss identifier 512 included in the data message 504, and executes different message delivery processes according to the determination result.
If the packet loss identifier 512 in an undelivered data message 504 is found to be in the packet-loss state, the current minimum value of the data message barrier 506 is recorded (if multiple losses are detected, the recorded minimum is continually updated). Until the minimum value of the response message barrier 511 determined by the receiver host 503 reaches the recorded minimum of the data message barrier 506, the receiver host 503 is considered to be in the packet-loss state; in that state it delivers data messages 504 with the minimum value 518 of the response message barrier 511 as its reference, that is, it delivers the data messages 504 whose timestamp 505 is smaller than the minimum value 518 of the response message barrier 511.
When the minimum value of the response message barrier 511 determined by the receiver host 503 reaches the minimum value of the recorded data message barrier 506, it is determined that the packet loss state disappears, and then the receiver host 503 delivers the data message 504 with reference to the minimum value 519 of the data message barrier 506, that is, delivers the data message 504 whose timestamp 505 is smaller than the minimum value 519 of the data message barrier 506.
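The switch between the two delivery references can be sketched as a single rule (the function names and the bare-timestamp representation of buffered messages are illustrative simplifications):

```python
def delivery_reference(in_loss_state, min_data_barrier, min_response_barrier):
    """Illustrative sketch of the receiver's rule in block diagram 500:
    while in the packet-loss state, deliver against the minimum response
    message barrier (only acknowledged messages are safe); otherwise
    deliver against the minimum data message barrier."""
    return min_response_barrier if in_loss_state else min_data_barrier


def deliver(buffered_timestamps, in_loss_state, min_data_barrier, min_response_barrier):
    ref = delivery_reference(in_loss_state, min_data_barrier, min_response_barrier)
    return [ts for ts in sorted(buffered_timestamps) if ts < ref]


buffered = [5, 12, 18, 25]
in_loss = deliver(buffered, True, min_data_barrier=20, min_response_barrier=10)
no_loss = deliver(buffered, False, min_data_barrier=20, min_response_barrier=10)
print(in_loss, no_loss)  # [5] [5, 12, 18]
```

In the loss state only the message with timestamp 5 is released, because a lost message with a timestamp between 10 and 20 might still be retransmitted; once the state clears, delivery advances to the data message barrier.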
In the example of block diagram 500, the data message barrier 506, the response message barrier 511, and the packet loss identifier 512 carried in the data message 504 together constitute high-reliability full-order control information. When a packet loss occurs it can be detected promptly, and each receiver host 503 can switch its delivery reference from the data message barrier 506 to the response message barrier 511, realizing a highly reliable processing mechanism for message full order. When no packet loss occurs, the receiver host 503 still uses the data message barrier 506 as the delivery reference, preserving the message delivery efficiency of the distributed system under normal conditions.
Illustrative Environment E
Based on the block diagram 500, the message full-order control mechanism may also be modified in terms of message structure as follows.
As a variation example, FIG. 6 shows a block diagram 600 depicting a first variation of the message structure in an example environment of a message ordering mechanism for a distributed system. The difference from the message structure of block diagram 500 is that block diagram 600 has two types of messages: a data message 604 and a beacon message 609. The data message 604 includes a timestamp 605, a data message barrier 606, data 608, and a packet loss identifier 612. The beacon message 609 includes a response message barrier 611.
The processing mechanisms for the packet loss identifier 612, the data message barrier 606, and the response message barrier 611 are the same as in the example of block diagram 500. In the example shown in block diagram 600, the beacon message 609 carrying the response message barrier 611 is added to separate the delivery of full-order control information from the transmission of the data message 604. The beacon message 609 may be generated and sent at certain time intervals, similar to the way the beacon message 309 is sent in block diagram 300; it is, in effect, inserted at regular intervals into the data message stream formed by the data messages 604.
As another variation example, FIG. 7 shows a block diagram 700 depicting a second variation of the message structure in an example environment of a message ordering mechanism for a distributed system. The difference from block diagram 600 is that the data message barrier 606 and the response message barrier 611 are carried by two separate beacon messages. For clarity the messages are renumbered: the data message 701 includes a timestamp 605, data 608, and a packet loss identifier 612; the beacon message carrying the data message barrier 606 is referred to as the first beacon message 702; and the beacon message carrying the response message barrier 611 is referred to as the second beacon message 703. The first beacon message 702 and the second beacon message 703 may be generated and sent at certain time intervals, in effect inserted at regular intervals into the data message stream formed by the data messages 701.
As yet another variation, FIG. 8 shows a block diagram 800 depicting a third variation of the message structure in an example environment of a message ordering mechanism for a distributed system. In block diagram 800, the data message 801 includes a timestamp 605, data 608, and a packet loss identifier 612, while the beacon message 802 includes the data message barrier 606 and the response message barrier 611. In this message structure, the full-order control information travels in the beacon message 802 and is processed separately from the data message 801 carrying the data 608, so the transmission of the data message 801 is affected less.
As yet another variation, FIG. 9 shows a block diagram 900 depicting a fourth variation of the message structure in an example environment of a message ordering mechanism for a distributed system. In block diagram 900, the data message 901 includes a timestamp 605, a data message barrier 606, data 608, a response message barrier 611, and a packet loss identifier 612, while the beacon message 902 includes the data message barrier 606, the response message barrier 611, and the packet loss identifier 612.
The beacon message 902 in block diagram 900 supplements the data message 901 in conveying full-order control information to the switch 602 and the receiver host 603. When the sender host 601 has not generated and sent a data message 901 after a preset time interval, a beacon message 902 may be generated and sent onto the link to keep the data message barrier 606, the response message barrier 611, and the packet loss identifier 612 transmitted over the link up to date. This avoids delivery delay at the receiver host 603 caused by a drop in the volume of data messages 901 from some sender hosts 601, and improves the delivery efficiency of the distributed system.
It should be noted that because the packet loss identifier 612 is a logical value, it involves no processing such as minimum-value determination, and even if some sender hosts 601 send no message carrying the packet loss identifier 612 within a preset time interval, the overall message full-order mechanism is unaffected. Thus, as a variation of the message structure of block diagram 900, the packet loss identifier 612 may be included only in the data message 901, with the beacon message 902 including just the data message barrier 606 and the response message barrier 611.
Furthermore, as a variation applicable to the beacon message 609 carrying the response message barrier 611 in block diagram 600, the first beacon message 702 carrying the data message barrier 606 and the second beacon message 703 carrying the response message barrier 611 in block diagram 700, the beacon message 802 carrying both barriers in block diagram 800, and the beacon message 902 carrying both barriers in block diagram 900: after determining the minimum value of the data message barrier 606 and/or the minimum value of the response message barrier 611, the switch in each block diagram may send to the egress link only those beacon messages containing the respective minimum value, and no longer send beacon messages whose data message barrier 606 and/or response message barrier 611 exceeds the minimum. This approach reduces the number of beacon messages reaching downstream switches and/or receiver hosts, further saving control message overhead in the network and the processing resources of the switches and receiver hosts.
Illustrative Environment F
The foregoing illustrative environments introduced how to implement a processing mechanism for message full order and how to make that mechanism highly reliable in the face of packet loss in a distributed system. The following describes a technical scheme, addressing link delay, designed to improve the delivery efficiency of data messages in a distributed system.
As shown in FIG. 10, a block diagram 1000 depicts a sixth example environment for a message ordering mechanism for a distributed system. As shown in FIG. 11, a block diagram 1100 depicts a seventh example environment. As shown in FIG. 12, a block diagram 1200 depicts an eighth example environment.
The examples shown in block diagrams 1000, 1100, and 1200 address the link delay that may exist between sender hosts and receiver hosts in a distributed system, so the switches are omitted from these block diagrams. Each link shown is a link formed between a sender host and a receiver host, with any switches along it regarded as part of the link. For illustration, the block diagrams show message transmission only between two sender hosts S1 and S2 and two receiver hosts R1 and R2; an actual distributed system may have any number of sender and receiver hosts.
In a distributed system, the link delay from each sender host to each receiver host may differ, so data messages bearing the same timestamp but sent from different sender hosts may arrive at a receiver host with large time deviations. In this disclosure, the physical clocks of the hosts are assumed to be synchronized, or synchronized within a reasonable error range.
As shown in block diagram 1000, the sender host S1 sends a message M10 to the receiver hosts R1 and R2 at time 0 (at that moment the local physical time TC of message M10 is 0 and its timestamp TM is 0), and the sender host S2 likewise sends a message M20 to the receiver hosts R1 and R2 at time 0 (the local physical time TC of message M20 is 0 and its timestamp TM is 0). The number in a double circle in the figure represents the local physical time TC associated with a message, which may be the local physical time at which it was sent or at which it was received, depending on the message's transmission state. The number in a double box represents the message's timestamp TM, which in block diagram 1000 equals the local physical time at which the message was sent.
The link delay of the link L11 between the sender host S1 and the receiver host R1 is 10, the link delay of the link L12 between the sender host S1 and the receiver host R2 is 10, the link delay of the link L21 between the sender host S2 and the receiver host R1 is 1, and the link delay of the link L22 between the sender host S2 and the receiver host R2 is 5. The figure shows one-way link delay, and the value of RTT (Round-Trip Time) is twice the one-way link delay.
After the message M10 arrives at the receiver host R1 it is denoted M11; due to the link delay of link L11, the local physical time corresponding to M11 is 10. After the message M20 arrives at the receiver host R1 it is denoted M21; due to the link delay of link L21, the local physical time corresponding to M21 is 1. The difference in local physical times shows that although M11 and M21 were sent from the sender hosts S1 and S2 at the same moment, M11 arrives at the receiver host R1 delayed by 9 time units relative to M21. Because M11 and M21 carry the same timestamp, the full-order requirement demands that they be delivered together; therefore, after receiving M21, the receiver host R1 must wait 9 time units before delivering M21 and M11, that is, the delivery delay DL1 is 9.
Similarly, after the message M10 arrives at the receiver host R2 it is denoted M12; due to the link delay of link L12, the local physical time corresponding to M12 is 10. After the message M20 arrives at the receiver host R2 it is denoted M22; due to the link delay of link L22, the local physical time corresponding to M22 is 5. Although M12 and M22 were sent at the same moment, M12 arrives at the receiver host R2 delayed by 5 time units relative to M22. Because they carry the same timestamp and must be delivered together per the full-order requirement, the receiver host R2, after receiving M22, must wait 5 time units before delivering M22 and M12, that is, the delivery delay DL2 is 5.
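The delivery delays in these two paragraphs follow from simple arithmetic on the one-way link delays; a sketch (the function name is illustrative):

```python
def delivery_delay(link_delays):
    """Illustrative sketch of the waiting in block diagram 1000: messages
    sent simultaneously arrive at a receiver at times equal to the one-way
    link delays, so the receiver must hold the earliest arrival until the
    latest one lands."""
    return max(link_delays) - min(link_delays)


# One-way delays toward receiver R1 (L11 = 10, L21 = 1) and R2 (L12 = 10, L22 = 5).
dl1 = delivery_delay([10, 1])
dl2 = delivery_delay([10, 5])
print(dl1, dl2)  # 9 5
```

These match DL1 = 9 at receiver host R1 and DL2 = 5 at receiver host R2 in the text.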
To address this message delivery waiting, the present disclosure provides a mechanism for adjusting timestamps: the timestamps of the sender hosts are adjusted so that messages with similar timestamps sent by different sender hosts arrive at the receiving end at similar local times, reducing the waiting a receiver host incurs after sorting messages. The adjustment is implemented by exchanging timestamp adjustment messages carrying timestamps between the sender hosts and the receiver hosts; the following three stages illustrate the adjustment scheme and its result.
Stage one: forward aggregation (taking timestamp maximum)
A plurality of sender hosts respectively send first time adjustment messages to a plurality of receiver hosts. Each first time adjustment message contains a first timestamp, which is the local physical time at which the first time adjustment message was sent.
After receiving the first time adjustment messages sent by the multiple sender hosts, a receiver host adjusts the timestamps in the first time adjustment messages so that the local physical times at which the messages are deemed to arrive at the receiver host are the same; that is, the arrival times of the first time adjustment messages are aligned. The maximum value among the aligned first timestamps is then determined.
The receiver host then generates a second time adjustment message that carries this maximum of the first timestamps as a second timestamp, and sends the second time adjustment message to the plurality of sender hosts.
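Stage one at a receiver host can be sketched as follows (an illustrative Python sketch; the function name forward_aggregate and the (timestamp, arrival_time) pair representation are hypothetical, and the alignment rule follows the virtual-message construction described below):

```python
def forward_aggregate(arrivals):
    """Stage one at a receiver: align all first time-adjustment messages to
    the latest local arrival time, then take the maximum aligned timestamp
    as the second timestamp. `arrivals` is a list of
    (first_timestamp, local_arrival_time) pairs."""
    ref = max(t for _, t in arrivals)                  # align to the latest arrival
    aligned = [ts + (ref - t) for ts, t in arrivals]   # virtual (aligned) timestamps
    return max(aligned)                                # second timestamp

# Receiver R1: M11 = (0, 10), M21 = (0, 1) -> virtual M21' has timestamp 9, maximum 9
second_ts_r1 = forward_aggregate([(0, 10), (0, 1)])
# Receiver R2: M12 = (0, 10), M22 = (0, 5) -> virtual M22' has timestamp 5, maximum 5
second_ts_r2 = forward_aggregate([(0, 10), (0, 5)])
```

These values, 9 and 5, match the timestamps of messages Mmax1 and Mmax2 in the worked example below.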
The above process is described below in connection with the exemplary environment of block diagram 1000. Messages M10 and M20 (denoted M11, M12, M21, and M22 after arriving at the receiver hosts) serve as the first time adjustment messages, and the timestamps they carry serve as the first timestamps. Messages M10 and M20, each with timestamp 0, are sent from the sender hosts S1 and S2 to the receiver hosts R1 and R2 at time 0.
Processing by receiver host R1
Message M11 arrives at the receiver host R1 at local physical time 10, and message M21 at local physical time 1. The receiver host R1 aligns messages M11 and M21: taking the arrival time of message M11 as the reference, it adjusts message M21 to produce message M21', whose corresponding local physical time is 10 and whose timestamp is 9. Message M21' is a virtual message expressing the following: if the sender host S2 had sent a message with timestamp 9 to the receiver host R1 at time 9, that message would have arrived at the receiver host R1 at the same time as message M11. In effect, message M21' encodes the difference in link delay between links L11 and L21. Message M11 could equally be adjusted with message M21 as the reference; the principle is the same.
The maximum of the timestamps of message M11 and message M21' is determined; by comparison it is 9. The receiver host R1 then generates message Mmax1 as the second time adjustment message, carrying this maximum as its timestamp, and sends it to the sender hosts S1 and S2. Since the processing time needed to determine the maximum is negligibly short, the local physical time at which message Mmax1 is sent is taken to coincide with the local physical time at which message M11 was received, namely 10.
Processing by receiver host R2
Message M12 arrives at the receiver host R2 at local physical time 10, and message M22 at local physical time 5. The receiver host R2 aligns messages M12 and M22: taking the arrival time of message M12 as the reference, it adjusts message M22 to produce message M22', whose corresponding local physical time is 10 and whose timestamp is 5. Message M22' is a virtual message expressing the following: if the sender host S2 had sent a message with timestamp 5 to the receiver host R2 at time 5, that message would have arrived at the receiver host R2 at the same time as message M12. In effect, message M22' encodes the difference in link delay between links L12 and L22. Message M12 could equally be adjusted with message M22 as the reference; the principle is the same.
The maximum of the timestamps of message M12 and message M22' is determined; by comparison it is 5. The receiver host R2 then generates message Mmax2 as the second time adjustment message, carrying this maximum as its timestamp, and sends it to the sender hosts S1 and S2. Since the processing time needed to determine the maximum is negligibly short, the local physical time at which message Mmax2 is sent is taken to coincide with the local physical time at which message M12 was received, namely 10.
Stage two: backward aggregation (taking the minimum difference between timestamp and local physical time) and RTT compensation
In stage one, the receiver hosts each send second time adjustment messages to the sender hosts. After receiving a second time adjustment message, a sender host applies round-trip time (RTT) compensation to the second timestamp it carries, producing a third timestamp.
The sender host then calculates, for each second time adjustment message, a first difference between the third timestamp and the local physical time at which the message was received, and takes the minimum of these first differences as the timestamp adjustment value for subsequently sent messages. The specific adjustment is that, when a new message is generated, the timestamp adjustment value is added to the local physical time to form the message's timestamp.
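Stage two at a sender host can be sketched as follows (an illustrative Python sketch; the function name backward_aggregate and the tuple representation are hypothetical, and the per-path RTT values are taken as given inputs):

```python
def backward_aggregate(received):
    """Stage two at a sender: RTT-compensate each second timestamp to obtain
    the third timestamp, then take the minimum difference between the third
    timestamp and the local receive time as the timestamp adjustment value.
    `received` is a list of (second_timestamp, local_receive_time, rtt) tuples."""
    diffs = [(ts + rtt) - t for ts, t, rtt in received]  # third timestamp minus receive time
    return min(diffs)

# Sender S1: M31 = (9, 20, RTT 20) -> M31' timestamp 29, difference 9;
#            M41 = (5, 20, RTT 20) -> M41' timestamp 25, difference 5
adj_s1 = backward_aggregate([(9, 20, 20), (5, 20, 20)])   # adjustment value 5
# Sender S2: M32 = (9, 11, RTT 2) -> difference 0; M42 = (5, 15, RTT 10) -> difference 0
adj_s2 = backward_aggregate([(9, 11, 2), (5, 15, 10)])    # adjustment value 0
```

The results, 5 for sender host S1 and 0 for sender host S2, match the adjustment values derived in the worked example below.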
The above process is described below in connection with the exemplary environment of block diagram 1100. In stage one, the receiver hosts R1 and R2 generated messages Mmax1 and Mmax2 and sent them to the sender hosts S1 and S2. For ease of illustration, in block diagram 1100 these messages are renumbered as message M30 and message M40. Messages M30 and M40 are sent as second time adjustment messages to the sender hosts S1 and S2, and for convenience of illustration, after reaching the sender hosts they are denoted as messages M31 and M32 (copies of message M30) and messages M41 and M42 (copies of message M40).
Processing by sender host S1
The sender host S1 receives both message M31 and message M41 at local physical time 20. After the RTT value for each path is added to the corresponding timestamp, message M31' (timestamp 29) and message M41' (timestamp 25) are obtained.
On the sender host S1, the difference between the timestamp and the local physical time is computed for messages M31' and M41': 9 for message M31' and 5 for message M41'. By comparison the minimum difference is 5, which is taken as the timestamp adjustment value for future messages. In block diagram 1100, a virtual message Mmin1 represents a new message generated with this adjustment value: when a message is sent in the future, the minimum difference 5 computed above is added to the local physical time to form its timestamp. Since message Mmin1 represents a new round of messages, its local physical time is denoted 0.
Processing by sender host S2
The sender host S2 receives message M32 at local physical time 11 and message M42 at local physical time 15. After the RTT value for each path is added to the corresponding timestamp, message M32' (timestamp 11) and message M42' (timestamp 15) are obtained.
On the sender host S2, the difference between the timestamp and the local physical time is 0 for message M32' and 0 for message M42', so the minimum difference is 0, which is taken as the timestamp adjustment value for future messages. In block diagram 1100, a virtual message Mmin2 represents a new message generated with this adjustment value; since the minimum difference is 0, no timestamp adjustment is needed in the next round of sending, and the local physical time at which a message is generated is still used as its timestamp.
Stage three: delay after timestamp adjustment
The processing of stages one and two completes one round of timestamp adjustment. Stage three mainly illustrates the improvement in delivery delay after the timestamp adjustment has been applied.
The delay after timestamp adjustment is described below in conjunction with the exemplary environment in block diagram 1200. In a new round of message transmission, the timestamps of new messages at the sender hosts S1 and S2 are adjusted according to the timestamp adjustment values determined in stage two: 5 for sender host S1 and 0 for sender host S2. For ease of illustration, in block diagram 1200 the new round of messages is numbered message M50 and message M60. Message M50 is sent by sender host S1 at local physical time 0 with an adjusted timestamp of 5; message M60 is sent by sender host S2 at local physical time 0 with a timestamp of 0.
Latency condition of receiver host R1
After messages M50 and M60 arrive at the receiver host R1, they are denoted messages M51 and M61. The local physical time of message M51 is 10 and its timestamp is 5; the local physical time of message M61 is 1 and its timestamp is 0. Under the message total-order requirement, messages with the same timestamp must be delivered simultaneously. To compare delivery waiting times, messages M51 and M61 are transformed so that their timestamps coincide. For example, transforming message M61 with message M51 as the reference yields message M61', with timestamp 5 and a transformed local arrival time at the receiver host R1 of 6. Under the message total-order mechanism, messages M61' and M51 are delivered simultaneously; since the difference between their local arrival times at the receiver host R1 is 4, the delivery delay DL3 is 4. Compared with the delivery delay DL1 of 9 in block diagram 1000, the delivery delay is shortened by 5 time units.
Latency condition of receiver host R2
Similarly, after messages M50 and M60 arrive at the receiver host R2, they are denoted messages M52 and M62. The local physical time of message M52 is 10 and its timestamp is 5; the local physical time of message M62 is 5 and its timestamp is 0. Transforming message M62 with message M52 as the reference yields message M62', with timestamp 5 and a transformed local arrival time at the receiver host R2 of 10. Under the message total-order mechanism, messages M62' and M52 are delivered simultaneously; the difference between their local arrival times at the receiver host R2 is 0, that is, the delivery delay DL4 is 0. Compared with the delivery delay DL2 of 5 in block diagram 1000, the delivery delay is shortened by 5 time units.
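The stage-three check can be sketched as follows (an illustrative Python sketch; the function name adjusted_delivery_delay is hypothetical, and the normalisation mirrors the message transformation described above):

```python
def adjusted_delivery_delay(msgs):
    """Stage three check: normalise each message to the largest timestamp
    (shifting its arrival time by the timestamp gap) and measure the
    residual spread of arrival times, i.e. the remaining delivery delay.
    `msgs` is a list of (timestamp, local_arrival_time) pairs."""
    ref_ts = max(ts for ts, _ in msgs)
    arrivals = [t + (ref_ts - ts) for ts, t in msgs]  # virtual aligned arrivals
    return max(arrivals) - min(arrivals)

# Receiver R1: M51 = (5, 10), M61 = (0, 1) -> M61' arrives at 6, so DL3 = 4
dl3 = adjusted_delivery_delay([(5, 10), (0, 1)])
# Receiver R2: M52 = (5, 10), M62 = (0, 5) -> M62' arrives at 10, so DL4 = 0
dl4 = adjusted_delivery_delay([(5, 10), (0, 5)])
```

The results dl3 = 4 and dl4 = 0 reproduce the improved delivery delays DL3 and DL4 of block diagram 1200.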
With this timestamp adjustment mechanism, the timestamp adjustment value required by each sender host is calculated from the link delay conditions of the whole distributed system, so that under the message total-order mechanism, messages with the same timestamp reach the receiver hosts at times that are as close as possible, reducing the message delivery delay at each receiver host.
Although block diagrams 1000, 1100, and 1200 illustrate the timestamp adjustment mechanism with two sender hosts and two receiver hosts, the mechanism applies equally to distributed systems formed by other numbers of sender hosts and receiver hosts; the adjustment principle is the same.
In addition, the information carried in the first and second time adjustment messages may instead be carried by the data messages and/or beacon messages of the foregoing examples. Information fields may be set in the data message or beacon message to convey the timestamp maximum determined in stage one and the minimum difference between timestamp and local physical time determined in stage two.
Furthermore, the above example shows only a single round of message interaction for timestamp adjustment. In practice, because link delays may be unstable and the message transmission frequency on each link varies over time across the distributed system, the timestamp adjustment processing described above may be performed over multiple rounds, executed periodically, or adjusted dynamically as data messages are transmitted.
The time-related values in the block diagrams are illustrative only, and therefore no units are given. In an actual distributed system, values such as link delay, local physical time, and timestamps may be expressed in units of hours, minutes, seconds, milliseconds, or microseconds, or may be encoded based on these time units.
Switch implementation examples
The distributed system may serve as a data center comprising a plurality of hosts connected through a plurality of switches that transmit messages between the hosts. The switch in the present disclosure may be a programmable switch, in which the data messages and beacon messages are processed by configuring the switch with a program and executing that program configuration on a processor in the switch. Alternatively, the processing logic for data messages and beacon messages in the switch may be implemented by a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a System on Chip (SoC), a Complex Programmable Logic Device (CPLD), or the like.
Switch implementation example one
As shown in fig. 13, block diagram 1300 illustrates the structure of a first implementation example of a switch implementing the message total-order mechanism. The switch 1301 shown in block diagram 1300 is configured to implement the message total-order mechanism in the distributed systems shown in block diagrams 100 and 200 described previously. The message structure in block diagram 1300 may be the same as in block diagrams 100 and 200, where the data message contains a timestamp, a data message barrier, and data. For simplicity of illustration, the timestamp and data are not shown in the message structure of block diagram 1300. The switch 1301 includes a message receiving port 1302, a processor 1303, and a message sending port 1304.
The message receiving port 1302 receives data messages from at least one ingress link. As shown in block diagram 1300, the message receiving port 1302 receives n data messages 13051 to 1305n, each containing a data message barrier 13061 to 1306n respectively, where n is a positive integer greater than 1.
The processor 1303 is configured to determine the minimum value 1307 of the data message barriers 13061 to 1306n contained in the received data messages 13051 to 1305n, and to modify all the data message barriers in those messages to this minimum value. The modified data messages are denoted 13051' to 1305n'. The processor 1303 is also configured to perform routing control processing on the data messages 13051' to 1305n' so as to send them to the next-hop switch or host.
The message sending port 1304 sends the data messages 13051' to 1305n' to at least one egress link.
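The barrier-minimum rewrite performed by processor 1303 can be sketched as follows (an illustrative Python sketch; the function name rewrite_barriers and the dict-based message model are hypothetical simplifications of the switch logic):

```python
def rewrite_barriers(data_messages):
    """Sketch of the processor logic in block diagram 1300: determine the
    minimum data-message barrier across the received messages and overwrite
    every message's barrier with that minimum before forwarding.
    Each message is modelled as a dict with a 'barrier' field."""
    minimum = min(m["barrier"] for m in data_messages)
    for m in data_messages:
        m["barrier"] = minimum
    return data_messages

msgs = [{"barrier": 7}, {"barrier": 3}, {"barrier": 5}]
out = rewrite_barriers(msgs)   # all barriers become the minimum, 3
```

After the rewrite, every forwarded message carries barrier 3, so downstream nodes see a consistent lower bound on in-flight message progress.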
Switch implementation example two
As another example, fig. 14 shows block diagram 1400, the structure of a second implementation example of a switch implementing the message total-order mechanism. The switch 1401 shown in block diagram 1400 is configured to implement the message total-order mechanism in the distributed system shown in block diagram 300 described previously. The message structure in block diagram 1400 may be the same as in block diagram 300, where the data message contains a timestamp and data, and the beacon message contains a data message barrier. For simplicity of illustration, the timestamp and data are not shown in the message structure of block diagram 1400. The switch 1401 includes a message receiving port 1402, a first processor 1403, a second processor 1404, and a message sending port 1405.
The message receiving port 1402 receives data messages and beacon messages from at least one ingress link. As shown in block diagram 1400, the message receiving port 1402 receives n data messages 14081 to 1408n and m beacon messages 14061 to 1406m, the beacon messages containing data message barriers 14071 to 1407m respectively, where m and n are positive integers greater than 1. The beacon messages 14061 to 1406m are inserted at regular time intervals into the data message stream formed by the data messages 14081 to 1408n, so m may be smaller than n; the larger m is, the more precisely the data message barrier can exercise message total-order control.
The message receiving port 1402 splits the data message and the beacon message, sends the data messages 14081-1408 n to the first processor 1403 for processing, and sends the beacon messages 14061-1406 m to the second processor 1404 for processing.
The first processor 1403 may be the main processor of the switch; its main function is to perform routing control processing on the data messages 14081 to 1408n and send the processed data messages to the message sending port 1405.
The second processor 1404 performs routing control processing on the beacon messages 14061 to 1406m, determines the minimum value 1409 of the data message barriers 14071 to 1407m they contain, and modifies those barriers to this minimum value. The modified beacon messages are denoted 14061' to 1406m' and are sent to the message sending port 1405. The second processor 1404 may serve as an auxiliary processor dedicated to processing the beacon messages associated with message total-order control.
The message sending port 1405 sends the data messages 14081 to 1408n and the beacon messages 14061' to 1406m' to at least one egress link.
In the switch structure shown in block diagram 1400, the message barrier information is separated from the data messages and carried by beacon messages, and a second processor besides the main processor performs the data message barrier processing. Forwarding of data messages is thereby decoupled from processing of the message total-order control information, so normal forwarding of data messages is not affected.
As a variation, the message structure shown in block diagrams 300 and 1400 (a data message containing a timestamp and data, and a beacon message containing a data message barrier) may also be processed by the switch structure of block diagram 1300; that is, the processing of the first processor 1403 and second processor 1404 in block diagram 1400 is performed by the processor 1303 in block diagram 1300.
Switch implementation example three
As yet another example, fig. 15 shows block diagram 1500, the structure of a third implementation example of a switch implementing the message total-order mechanism. The switch 1501 shown in block diagram 1500 is configured to implement the message total-order mechanism in the distributed system shown in block diagram 400 described previously. The message structure in block diagram 1500 may be the same as in block diagram 400, where the data message contains a timestamp, data, and a data message barrier, and the beacon message contains a data message barrier. For simplicity of illustration, the timestamp and data are not shown in the message structure of block diagram 1500. The switch 1501 includes a message receiving port 1502, a processor 1503, and a message sending port 1504.
The message receiving port 1502 receives data messages and beacon messages from at least one ingress link. As shown in block diagram 1500, the message receiving port 1502 receives n data messages 15081 to 1508n and m beacon messages 15061 to 1506m. The data messages contain data message barriers 15051 to 1505n respectively, and the beacon messages contain data message barriers 15071 to 1507m respectively, where m and n are positive integers greater than 1 and m may be smaller than n. The difference from block diagram 1400 is that, besides the beacon messages, the data messages 15081 to 1508n also carry the data message barrier. At a sender host, if no data message has been generated after a preset time interval, a beacon message is generated to convey the current data message barrier. The beacon messages 15061 to 1506m are therefore still inserted into the data message stream formed by the data messages 15081 to 1508n, but their distribution is irregular and depends on the continuity of the data messages.
The message receiving port 1502 sends the data messages 15081-1508 n and the beacon messages 15061-1506 m to the processor 1503 for processing.
The processor 1503 performs routing control processing on the data messages 15081 to 1508n and beacon messages 15061 to 1506m, determines the minimum value 1509 among the data message barriers 15051 to 1505n and 15071 to 1507m contained in them, and modifies the data message barriers in both the data messages and the beacon messages to this minimum value. The modified messages are denoted data messages 15081' to 1508n' and beacon messages 15061' to 1506m', which the processor 1503 sends to the message sending port 1504.
The message sending port 1504 sends the beacon messages 15061' to 1506m' and data messages 15081' to 1508n' to at least one egress link.
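The joint minimum over both message types in block diagram 1500 can be sketched as follows (an illustrative Python sketch; the function name rewrite_barriers_mixed and the dict-based message model are hypothetical simplifications):

```python
def rewrite_barriers_mixed(data_msgs, beacon_msgs):
    """Sketch of block diagram 1500: the minimum is taken jointly over the
    barriers carried by data messages and beacon messages, and both kinds
    of message are rewritten to carry that minimum before forwarding."""
    minimum = min(m["barrier"] for m in data_msgs + beacon_msgs)
    for m in data_msgs + beacon_msgs:
        m["barrier"] = minimum
    return data_msgs, beacon_msgs

d, b = rewrite_barriers_mixed([{"barrier": 8}], [{"barrier": 4}, {"barrier": 6}])
# every forwarded message, data or beacon, now carries barrier 4
```

Taking the minimum across both streams ensures that a lagging barrier carried only by a beacon message still constrains the barriers in the data messages, and vice versa.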
Switch implementation example four
As yet another example, fig. 16 shows block diagram 1600, the structure of a fourth implementation example of a switch implementing the message total-order mechanism. The switch 1601 shown in block diagram 1600 is configured to implement the message total-order mechanism in the distributed system shown in block diagram 500 described previously. The message structure in block diagram 1600 may be the same as in block diagram 500, where the data message contains a timestamp, data, a data message barrier, a packet loss identifier, and a response message barrier. For simplicity of illustration, the timestamp and data are not shown in the message structure of block diagram 1600. Because the message structure of block diagram 1600 adds a response message barrier and a packet loss identifier, the switch processing correspondingly adds mechanisms for detecting packet loss and for handling the response message barrier and the packet loss identifier. The switch 1601 includes a message receiving port 1602, a processor 1603, and a message sending port 1604.
The message receiving port 1602 receives data messages from at least one ingress link. As shown in block diagram 1600, the message receiving port 1602 receives n data messages 16051 to 1605n, each containing a data message barrier 16061 to 1606n, a response message barrier 16081 to 1608n, and a packet loss identifier 16071 to 1607n respectively. The message receiving port 1602 sends the data messages 16051 to 1605n to the processor 1603 for processing.
The processor 1603 performs routing control processing on the data messages 16051 to 1605n, determines the minimum value 1609 of the data message barriers 16061 to 1606n and the minimum value 1610 of the response message barriers 16081 to 1608n contained in them, and modifies the data message barriers and response message barriers in the data messages to their respective minimum values. The modified data messages are denoted 16051' to 1605n'.
The processor 1603 also performs packet loss detection. It may discover whether packet loss has occurred on an ingress link between the previous-hop switch and this switch through a packet loss counter arranged on the switch. In addition, if the processor 1603 detects that any of the packet loss identifiers 16071 to 1607n in the data messages 16051 to 1605n is set to the packet-loss state, it likewise determines that a packet loss event has occurred on a link traversed by the switch 1601 and that the link is in the packet-loss state.
If the packet-loss state is detected, the processor 1603 records the current minimum value 1609 of the data message barrier (if further losses are detected, the recorded minimum is updated accordingly). Until the minimum value 1610 of the response message barrier reaches the recorded minimum value 1609 of the data message barrier, the link is regarded as remaining in the packet-loss state, and while in this state the processor 1603 sets the packet loss identifiers 16071 to 1607n in all received data messages 16051 to 1605n to the packet-loss state. When the determined minimum value 1610 of the response message barrier reaches the recorded minimum value 1609 of the data message barrier, the packet-loss state is deemed to have cleared and the packet loss identifiers are no longer set; in the non-packet-loss state, the packet loss identifiers 16071' to 1607n' in the modified data messages 16051' to 1605n' are the same as the packet loss identifiers 16071 to 1607n in the original data messages 16051 to 1605n. The processor 1603 sends the data messages 16051' to 1605n' to the message sending port 1604.
The message sending port 1604 sends the data messages 16051' to 1605n' to at least one egress link.
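The packet-loss state tracking described for block diagram 1600 can be sketched as the following small state machine (an illustrative Python sketch; the class name PacketLossTracker and its method names are hypothetical, and real detection would come from a packet loss counter or incoming identifiers):

```python
class PacketLossTracker:
    """Sketch of the packet-loss handling in block diagram 1600: on detecting
    loss, record the current minimum data-message barrier; remain in the
    packet-loss state (forcing the loss identifier on every forwarded data
    message) until the minimum response-message barrier catches up."""
    def __init__(self):
        self.in_loss_state = False
        self.recorded_barrier = None

    def on_loss_detected(self, min_data_barrier):
        self.in_loss_state = True
        self.recorded_barrier = min_data_barrier  # updated on repeated losses

    def on_barriers(self, min_response_barrier):
        # The loss state clears once the response barrier reaches the record.
        if self.in_loss_state and min_response_barrier >= self.recorded_barrier:
            self.in_loss_state = False

    def flag_for(self, incoming_flag):
        # In the loss state the switch forces the flag; otherwise pass through.
        return True if self.in_loss_state else incoming_flag

tracker = PacketLossTracker()
tracker.on_loss_detected(min_data_barrier=12)
tracker.on_barriers(min_response_barrier=10)   # still behind the record: loss state persists
tracker.on_barriers(min_response_barrier=12)   # caught up: loss state clears
```

After the second on_barriers call, flag_for passes incoming identifiers through unchanged, matching the non-packet-loss behaviour described above.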
The message structure in block diagram 1600 admits several variations, as follows:
Message structure variation one of block diagram 1600
Corresponding to the message structure illustrated in block diagram 600, there are two types of messages: data messages and beacon messages. The data message contains a timestamp, a data message barrier, data, and a packet loss identifier. The beacon message contains a response message barrier. In the message structure of block diagram 600, the response message barrier is separated out and carried by its own beacon message.
Accordingly, the processor 1603 of the switch 1601 receives both types of messages, performs the corresponding routing control processing, and processes the data message barrier, packet loss identifier, and response message barrier they carry; the technical principle is the same as the processing procedure detailed for block diagram 1600 above.
Message structure variation two of block diagram 1600
Corresponding to the message structure illustrated in block diagram 700, the data message barrier and the response message barrier are carried by two separate beacon messages. In the message structure of block diagram 700, there are three types of messages: a data message containing a timestamp, data, and a packet loss identifier; a first beacon message containing a data message barrier; and a second beacon message containing a response message barrier. The first and second beacon messages are inserted at intervals into the data message stream formed by the data messages.
Accordingly, the processor 1603 of the switch 1601 receives the three types of messages, performs the corresponding routing control processing, and processes the data message barrier, packet loss identifier, and response message barrier they carry; the technical principle is the same as the processing procedure detailed for block diagram 1600 above.
Message structure variation three of block diagram 1600
Corresponding to the message structure illustrated in block diagram 800, the data message barrier and the response message barrier are carried by a single beacon message. In the message structure of block diagram 800, there are two types of messages: a data message containing a timestamp, data, and a packet loss identifier; and a beacon message containing a data message barrier and a response message barrier. The beacon messages are inserted at regular intervals into the data message stream formed by the data messages.
Accordingly, the processor 1603 of the switch 1601 receives both types of messages, performs the corresponding routing control processing, and processes the data message barrier, packet loss identifier, and response message barrier they carry; the technical principle is the same as the processing procedure detailed for block diagram 1600 above.
Message structure variation four of block diagram 1600
Corresponding to the message structure illustrated in block diagram 900, the data message barrier and the response message barrier, together with the packet loss identifier, are carried by both the data messages and the beacon messages. In the message structure of block diagram 900, there are two types of messages: a data message containing a timestamp, data, a data message barrier, a response message barrier, and a packet loss identifier; and a beacon message containing a data message barrier, a response message barrier, and a packet loss identifier. In conveying total-order control information, the beacon message supplements the data message: if, after a preset time interval, the sender host has not generated and sent a data message, a beacon message is generated and sent onto the link. From the switch's perspective, the beacon messages are effectively inserted into the data message stream formed by the data messages, except that their distribution is irregular and depends on the continuity of the data messages.
In addition, because the packet loss identifier is a logical value, it does not involve processing such as determining a minimum value; even if, after the preset time interval, some receiving-end hosts do not send a message carrying the packet loss identifier, the overall message full-order mechanism is unaffected. Therefore, as a variation of the message structure, the packet loss identifier may be included in the data message only, while the beacon message includes a data message barrier and an acknowledgement message barrier.
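As a purely illustrative model of these layouts (field names are assumptions; the disclosure does not prescribe an encoding), the two message types of block 900's variant just described might look like:

```python
from dataclasses import dataclass

@dataclass
class DataMessage:
    # Variant layout: the data message carries all of the
    # full-order control information alongside its payload.
    timestamp: int
    data: bytes
    data_barrier: int        # data message barrier
    ack_barrier: int         # acknowledgement message barrier
    loss_flag: bool = False  # packet loss identifier (a logical value)

@dataclass
class BeaconMessage:
    # The beacon carries only the two barriers; the packet loss
    # identifier rides on data messages alone in this variation.
    data_barrier: int
    ack_barrier: int
```

Modeling the messages this way makes the later barrier-minimum processing a matter of reading and rewriting two integer fields.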
Accordingly, the processor 1603 of the switch 1601 receives these two types of messages, performs the corresponding routing control processing, and processes the data message barrier, the packet loss identifier, and the acknowledgement message barrier in the two message types respectively; in technical principle, this is the same as the processing procedure detailed for block 1600 above.
Fifth switch implementation example
As yet another example, fig. 17 shows a block diagram 1700 illustrating the structure of a fifth implementation example of a switch implementing the message full-order mechanism. The switch 1701 shown in block 1700 is configured to implement the message full-order mechanism previously described for the distributed system shown in block 800. The message structure in block 1700 may be the same as the message structure shown in block 800, with two types of messages: a data message comprising a timestamp, data, and a packet loss identifier; and a beacon message comprising a data message barrier and an acknowledgement message barrier. For simplicity of illustration, the timestamp and data are not shown in the message structure in block 1700. The switch 1701 includes: a message receiving port 1702, a first processor 1703, a second processor 1704, and a message sending port 1705.
A message receiving port 1702 is configured to receive data messages from at least one ingress link. As shown in block 1700, the message receiving port 1702 receives n data messages 17111-1711n and m beacon messages 17061-1706m. The data messages 17111-1711n respectively comprise packet loss identifiers 17121-1712n, and the beacon messages 17061-1706m respectively comprise data message barriers 17071-1707m and acknowledgement message barriers 17081-1708m, where m and n are positive integers greater than 1. Because the beacon messages 17061-1706m are inserted at regular time intervals into the data message stream formed by the data messages 17111-1711n, m may be smaller than n; the larger m is, the more accurate the full-order control provided by the data message barrier can be.
The message receiving port 1702 splits the data message and the beacon message, sends the data messages 17111 to 1711n to the first processor 1703 for processing, generates data messages 17111 'to 1711 n', sends the beacon messages 17061 to 1706m to the second processor 1704 for processing, and generates beacon messages 17061 'to 1706 m'.
The first processor 1703 may be the main processor of the switch; its main functions are to perform routing control processing on the data messages 17111-1711n, perform packet loss detection, and perform the related processing for the packet loss identifiers 17121-1712n. If a packet loss state is detected, the current minimum value 1709 of the data message barrier is recorded, and the packet loss state is considered to persist until the minimum value 1710 of the acknowledgement message barrier reaches the recorded minimum value 1709 of the data message barrier. In the packet loss state, the first processor 1703 sets the packet loss identifiers 17121-1712n to the packet loss state. When no packet loss state is detected, the packet loss identifiers 17121-1712n are not set; in this non-packet-loss state, the packet loss identifiers 17121-1712n in the data messages 17111-1711n remain the same as the packet loss identifiers 17121'-1712n' in the data messages 17111'-1711n'. It should be noted that, until the packet loss state disappears, the packet loss identifier in every data message arriving at the first processor 1703 is set to the packet loss state. After the processing is completed, the first processor 1703 sends the resulting data messages 17111'-1711n' to the message sending port 1705. In technical principle, the detailed packet loss detection and processing of the first processor 1703 is the same as the process executed by the processor 1603 in block 1600.
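A minimal sketch of the packet-loss handling just described; the class and method names are invented, and the trigger for detecting loss is abstracted away:

```python
class LossState:
    """Packet-loss state of the first processor: on detecting loss,
    record the current minimum of the data message barrier; the loss
    state persists until the minimum of the acknowledgement message
    barrier catches up to that recorded value."""

    def __init__(self):
        self.recorded_data_barrier = None  # set when loss is detected

    def on_loss_detected(self, data_barrier_min):
        # Record the data barrier minimum at the moment loss is seen.
        if self.recorded_data_barrier is None:
            self.recorded_data_barrier = data_barrier_min

    def update(self, ack_barrier_min):
        # The loss state disappears once the ack barrier minimum
        # reaches the recorded data barrier minimum.
        if (self.recorded_data_barrier is not None
                and ack_barrier_min >= self.recorded_data_barrier):
            self.recorded_data_barrier = None

    @property
    def in_loss_state(self):
        return self.recorded_data_barrier is not None

    def stamp(self, message):
        # While in the loss state, set the packet loss identifier on
        # every passing data message; otherwise leave it unchanged.
        if self.in_loss_state:
            message["loss_flag"] = True
        return message
```

This mirrors the text above: the identifier is forced on only between loss detection and the moment the acknowledgement barrier catches up.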
The second processor 1704 may act as an auxiliary processor dedicated to processing beacon messages related to full-order control of messages. The second processor 1704 performs routing control processing on the beacon messages 17061-1706 m, determines a minimum value 1709 of data message barriers 17071-1707 m and a minimum value 1710 of response message barriers 17081-1708 m contained in the beacon messages 17061-1706 m, modifies the data message barriers and the response message barriers in the beacon messages 17061-1706 m to respective corresponding minimum values, and identifies the modified beacon messages as beacon messages 17061 '-1706 m'. The second processor 1704 sends beacon messages 17061 '-1706 m' to the message sending port 1705.
A message sending port 1705 for sending data messages 17111 'to 1711 n' and beacon messages 17061 'to 1706 m' to at least one egress link.
Message variant structure of block 1700
Corresponding to the message structure illustrated by block diagram 700, the data message barrier and the reply message barrier are carried by two beacon messages, respectively. In the message structure of block 700, there are three types of messages: the data message comprises a timestamp, data and a packet loss identifier; a first beacon message comprising a data message barrier; a second beacon message comprising an acknowledgement message barrier. The first beacon message and the second beacon message are inserted at intervals into a data message stream formed by the data messages.
Accordingly, the first processor 1703 of the switch 1701 receives the data messages as described above, and its processing procedure is the same as that detailed for the first processor 1703 in block 1700. The second processor 1704 receives the first beacon message and the second beacon message, performs the corresponding routing control processing, and processes the data message barrier and the acknowledgement message barrier in the two beacon messages respectively; in technical principle, this is the same as the processing procedure detailed for block 1700 above.
Sixth switch implementation example
As yet another example, fig. 18 shows a block diagram 1800 illustrating the structure of a sixth implementation example of a switch implementing the message full-order mechanism. The message structure in block 1800 may be the same as that shown in block 800, with two types of messages: a data message comprising a timestamp, data, and a packet loss identifier; and a beacon message comprising a data message barrier and an acknowledgement message barrier. For simplicity of illustration, only the data message and beacon message numbers are shown in block 1800; the internal structure of the messages is not shown.
The switch 1801 includes: a message receiving port 1802, a processor 1803, and a message sending port 1804. Unlike the several switch implementations described above, the switch 1801 sends the beacon message 1807 carrying the full-order control information to the proxy 1805 for processing, receives the returned beacon message 1807', and then sends it to the egress link. The proxy 1805 may be any host in the distributed system. The data message 1806 is still processed by the switch 1801 itself.
Specifically, after receiving the data message 1806 and the beacon message 1807, the message receiving port 1802 of the switch 1801 sends the data message 1806 to the processor 1803, and sends the beacon message 1807 to the proxy 1805.
The processor 1803 performs routing control processing on the data message 1806, performs the packet loss detection and the related processing for setting the packet loss identifier that are performed by the first processor 1703 in block 1700, and generates a data message 1806', which is sent to the egress link through the message sending port 1804.
After receiving the beacon message 1807, the proxy 1805 performs the determination and modification processing on the data message barrier and the response message barrier performed by the second processor 1704 in the block 1700, and generates a beacon message 1807 ', and then sends the beacon message 1807 ' to the message sending port 1804 of the switch 1801, and the message sending port 1804 sends the beacon message 1807 ' to the egress link.
Message variant structure of block 1800
The message structure in block 1800 may be the same as that shown in block 300 and block 1400, where the data message contains a timestamp and data, and the beacon message contains a data message barrier.
Accordingly, the processor 1803 performs the processing performed by the first processor 1403 in the block diagram 1400, and the proxy 1805 performs the processing performed by the second processor 1404.
It should be noted that, under the technical idea of the present disclosure, the above message structure relates to a packet loss identifier, a data message barrier, and a response message barrier as full-order control information, and the full-order control information may be carried in a data message and/or a beacon message in various combinations.
In addition, with respect to the structure of the switch, the data messages and beacon messages of the various message structures may be processed by a switch provided with a single processor, or by a switch provided with more than one processor, where the main processor of the switch processes the data messages and an auxiliary processor processes the beacon messages, thereby reducing the load on the main processor and the impact on data message processing. Alternatively, the beacon messages of the various message structures may be processed by a proxy, which likewise reduces the load on the main processor and the impact on data message processing.
Illustrative Process
As shown in fig. 19, a flowchart 1900 illustrates a first illustrative process for implementing the message full-order mechanism. The process may be applied to a switching device in a distributed system, where the switching device may be the switch described above, or a host or other network device with message forwarding or routing functions. The processing procedure comprises the following steps:
S1901: A data message barrier for at least one ingress link is obtained. As introduced above, the data message barrier may be carried by a data message, by a beacon message, or by both a data message and a beacon message. The ingress link may be a link formed between the current switching device and a switch or host.
If the data message barrier is included in the data message, the process S1901 may include: a data message barrier is obtained from data messages received from at least one ingress link.
If the data message barrier is included in the first beacon message inserted into the data message stream, the process S1901 may include: a data message barrier is obtained from a first beacon message received from at least one ingress link.
If the data message barrier is included in the data message and the first beacon message, the process S1901 may include: a data message barrier is obtained from the data message and the first beacon message received from the at least one ingress link.
S1902: a minimum value of the data message barrier is determined. The switching device may determine and record the minimum value according to the acquired data message barrier, and continuously update the minimum value with the acquisition of a new data message barrier.
S1903: the minimum value of the data message barrier is sent to the at least one egress link. The egress link may be a link formed between the current switching device and a switch or host.
Corresponding to the processing procedure S1901, if the data message barrier is included in the data message, the processing procedure S1903 includes: and modifying a data message barrier in the data message to be the minimum value of the data message barrier, and sending the data message to at least one egress link.
If the data message barrier is contained in the first beacon message inserted into the data message stream, the process S1903 includes: the data message barrier in the first beacon message is modified to a minimum value of the data message barrier, and the first beacon message is sent to the at least one egress link.
If the data message barrier is contained in both the data message and the first beacon message, the process S1903 includes: modifying the data message barrier in the data message and the first beacon message to the minimum value of the data message barrier, and sending the data message and the first beacon message to at least one egress link.
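The three steps S1901-S1903 amount to maintaining a running minimum over the barriers observed on the ingress links. The following sketch (all names are invented for illustration) assumes one tracked value per ingress link, which is one possible realization; the disclosure only requires determining and continuously updating the minimum:

```python
class BarrierAggregator:
    """Per switching device: track the latest data message barrier
    seen on each ingress link and expose their minimum (S1901-S1902).
    Keeping one value per link is an assumption of this sketch."""

    def __init__(self, ingress_links):
        # Initial barrier of 0 per link is an illustrative choice.
        self.latest = {link: 0 for link in ingress_links}

    def observe(self, link, barrier):
        # S1901: obtain the barrier (from a data or beacon message).
        self.latest[link] = barrier

    def minimum(self):
        # S1902: the continuously updated minimum.
        return min(self.latest.values())

def forward(aggregator, link, message):
    # S1903: rewrite the barrier field to the current minimum before
    # the message is sent to the egress links.
    aggregator.observe(link, message["data_barrier"])
    out = dict(message)
    out["data_barrier"] = aggregator.minimum()
    return out
```

The same aggregation applies unchanged to the acknowledgement message barrier of S2101-S2103, just on a different field.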
The related processing procedure for implementing the message full-order mechanism on the switching device is described based on fig. 19, and the corresponding processing procedure on the receiver host is described below.
As shown in fig. 20, a flowchart 2000 of a second illustrative process for implementing the message full-order mechanism is shown, which comprises:
S2001: Sequence the data messages according to the timestamps contained in the received data messages. The sequenced data messages may be placed in a buffer queue of the host to await delivery.
S2002: a data message barrier for at least one ingress link is obtained, and a minimum value of the data message barrier is determined. For a receiver host, the ingress link may be a link between the receiver host and the switching device. As introduced above, obtaining the data message barrier of the at least one ingress link according to the difference between the messages carrying the data message barrier may include: a data message barrier is obtained from data messages and/or first beacon messages of at least one ingress link.
S2003: data messages having timestamps less than the minimum value of the data message barrier are delivered in-order. Based on the message full-order mechanism of the present disclosure, the data message barrier represents: for the current receiver host, no more data messages having a timestamp less than the minimum value of the data message barrier are received. Thus, data messages having timestamps less than the minimum value of the data message barrier may be delivered to an application for processing.
Through the flows shown in fig. 19 and fig. 20, a basic message full-order mechanism is realized. Next, it is described how to implement the message full-order mechanism while ensuring high reliability against packet loss in a distributed system.
Referring now to FIG. 21, a flowchart 2100 is presented illustrating a third illustrative process for implementing the message full-order mechanism, which builds on the process shown in FIG. 19. The process shown in FIG. 21 may be performed in parallel with the process shown in FIG. 19. It specifically comprises the following steps:
S2101: An acknowledgement message barrier for at least one ingress link is obtained. Similar to the data message barrier, the acknowledgement message barrier may be carried by a data message, by a beacon message, or by both a data message and a beacon message. The beacon message carrying the acknowledgement message barrier may be the first beacon message in fig. 19, or may be a second beacon message distinct from the first beacon message.
Correspondingly, depending on which messages carry the acknowledgement message barrier, the specific processing procedure of S2101 is as follows: acquire the acknowledgement message barrier from the data message or the first beacon message of the at least one ingress link, or acquire the acknowledgement message barrier from the data message and the second beacon message of the at least one ingress link.
S2102: a minimum value of the acknowledgement message barrier is determined. The switching device may determine and record the minimum value according to the acquired response message barrier, and continuously update the minimum value with the acquisition of a new response message barrier.
S2103: Sending the minimum value of the acknowledgement message barrier to the at least one egress link. Corresponding to the processing procedure S2101, depending on which message carries the acknowledgement message barrier, the processing procedure of S2103 may include one of the following specific processing procedures:
Modify the acknowledgement message barrier in the data message to the minimum value of the acknowledgement message barrier, and send the data message to the at least one egress link.
Modify the acknowledgement message barrier in the first beacon message to the minimum value of the acknowledgement message barrier, and send the first beacon message to the at least one egress link.
Modify the acknowledgement message barrier in the second beacon message to the minimum value of the acknowledgement message barrier, and send the second beacon message to the at least one egress link.
Modify the acknowledgement message barrier in the data message and the first beacon message to the minimum value of the acknowledgement message barrier, and send the data message and the first beacon message to the at least one egress link.
Modify the acknowledgement message barrier in the data message and the second beacon message to the minimum value of the acknowledgement message barrier, and send the data message and the second beacon message to the at least one egress link.
In addition, a packet loss identifier may further be embedded in the data message. Accordingly, on the switching device, a process of detecting the packet loss state may be performed; when a packet loss state is detected, the packet loss identifier in each data message received from the at least one ingress link is set to the packet loss state. The detailed mechanism for detecting the packet loss state and handling the packet loss identifier has already been described above and is not repeated here.
Note that, in the processing procedure S2103 of fig. 21 and the processing procedure S1903 of fig. 19, the same processing procedure may be used for the processing of transmitting the data message and the first beacon message to the egress link.
The related processing procedure for implementing the high-reliability message full-order mechanism on the switching device is described based on fig. 21, and the corresponding processing procedure on the receiver host is described below.
As shown in fig. 22, a flowchart 2200 of a fourth illustrative process for implementing the message full-order mechanism is provided. The processing procedure comprises the following steps:
S2201: Sequence the data messages according to the timestamps contained in the received data messages.
S2202: A data message barrier and an acknowledgement message barrier of at least one ingress link are acquired, and the minimum value of each is determined. As to the specific manner of acquisition, the data message barrier and the acknowledgement message barrier may be obtained from one or some combination of the data message, the first beacon message, and the second beacon message, depending on which messages carry them, as in fig. 19 and fig. 21.
S2203: Judge the packet loss identifier contained in the data message, and execute the corresponding processing according to the judgment result. Specifically, if the packet loss identifier indicates a packet loss state, S2204 is performed; if it indicates a non-packet-loss state, S2205 is performed.
S2204: Data messages having timestamps less than the minimum value of the acknowledgement message barrier are delivered in order.
S2205: Data messages having timestamps less than the minimum value of the data message barrier are delivered in order.
In the above processing, the packet loss identifier may be regarded as a switch that selects between the acknowledgement message barrier and the data message barrier: in the packet loss state, the receiving-end host uses the acknowledgement message barrier as the message delivery barrier, and switches back to using the data message barrier as the delivery barrier once the packet loss state disappears. In this way, a highly reliable message full-order mechanism in a distributed system is realized.
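The switching rule of S2203-S2205 reduces to selecting the delivery barrier from the packet loss identifier; a minimal sketch (function name assumed):

```python
def delivery_barrier(loss_flag, data_barrier_min, ack_barrier_min):
    """S2203-S2205: in the packet-loss state the receiver falls back
    to the more conservative acknowledgement message barrier;
    otherwise it delivers up to the data message barrier."""
    return ack_barrier_min if loss_flag else data_barrier_min
```

The returned value is exactly the `barrier_min` argument a receiver queue would pass to its in-order delivery step.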
The following describes a processing procedure of timestamp adjustment for a link delay phenomenon proposed by the present disclosure.
As shown in fig. 23, a flowchart 2300 illustrates a fifth illustrative process for implementing the message full-order mechanism. This process may be applied to the hosts of a distributed system; the timestamp of each host in the distributed system can be adjusted through message interaction among the hosts. The processing procedure comprises the following steps:
S2301: Send a first time adjustment message to the plurality of hosts, where the first time adjustment message comprises a first timestamp, and the first timestamp is the local physical time at which the first time adjustment message is sent.
S2302: Receive second time adjustment messages returned from the plurality of hosts, where each second time adjustment message comprises a second timestamp, and the second timestamp is the maximum of the corresponding first timestamps after they are aligned to the local physical time at which the first time adjustment messages were received.
S2303: Perform round-trip time (RTT) compensation on the second timestamp in each second time adjustment message to generate a third timestamp.
S2304: a first difference between a local physical time at which the second time adjustment message is received and a third timestamp corresponding to the second time adjustment message is calculated.
S2305: Take the minimum of the first differences as the timestamp adjustment value for subsequently sent messages.
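The sender-side steps S2303-S2305 can be sketched as follows, assuming replies are collected as (local receive time, second timestamp) pairs and that a single RTT compensation value is available; both assumptions are illustrative only:

```python
def timestamp_adjustment(replies, rtt_compensation):
    """Sender-side steps S2303-S2305 (names are assumptions).
    `replies` is a list of (recv_time, second_timestamp) pairs, one
    per responding host; `rtt_compensation` approximates the
    one-way portion of the round-trip delay."""
    diffs = []
    for recv_time, second_ts in replies:
        third_ts = second_ts + rtt_compensation  # S2303: RTT compensation
        diffs.append(recv_time - third_ts)       # S2304: first difference
    return min(diffs)                            # S2305: adjustment value
```

Taking the minimum difference keeps the adjustment conservative: it trusts the reply that suffered the least extra delay.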
The processing in S2301 to S2305 described above covers the sending-end host's role in the timestamp adjustment mechanism; in fact, each host also performs the corresponding processing as a receiving-end host. Thus, as shown in fig. 24, a flowchart 2400 illustrates a sixth illustrative process for implementing the message full-order mechanism. In addition to the processing flow of fig. 23, it may include the following steps:
S2401: Receive the first time adjustment messages sent by a plurality of hosts.
S2402: Calculate the maximum of the corresponding first timestamps after aligning them to the local physical time at which the first time adjustment messages were received.
S2403: The maximum value is included as a second timestamp in a second time adjustment message and transmitted to the plurality of hosts.
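A hedged sketch of the receiver-side steps S2401-S2403, under the assumed reading that each first timestamp is aligned by adding the local time elapsed since its receipt:

```python
def second_timestamp(first_messages, now):
    """Receiver-side steps S2401-S2403 (an assumed realization).
    `first_messages` is a list of (local_recv_time, first_timestamp)
    pairs; each first timestamp is aligned to the common local
    physical time `now`, and the maximum aligned value becomes the
    second timestamp returned to the senders."""
    aligned = [first_ts + (now - recv_time)
               for recv_time, first_ts in first_messages]
    return max(aligned)
```

Using the maximum means a host never reports a time behind the fastest clock it has heard from, which is what lets the senders' minimum-difference step in S2305 converge.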
Through the timestamp adjustment processing described above, delivery delay can be improved overall. The information carried in the first and second time adjustment messages may also be carried by the data messages and/or beacon messages of the foregoing examples; that is, the foregoing data messages and/or beacon messages may serve as the first and second time adjustment messages, so that timestamp adjustment is completed during normal message transmission. Because the adjustment mechanism performs multiple rounds of adjustment along with the message transmission process, effective adjustment of the timestamp can be achieved even when the link delay fluctuates.
Electronic device implementation examples
The electronic apparatus of the present disclosure may be a mobile electronic device, or a computing device with limited or no mobility. The electronic apparatus comprises at least a processing unit and a memory; the memory stores instructions, and the processing unit obtains the instructions from the memory and executes them so as to cause the electronic apparatus to perform actions.
In some examples, one or more modules or one or more steps or one or more processing procedures related to fig. 1 to 24 may be implemented by software programs, hardware circuits, or by a combination of software programs and hardware circuits. For example, each of the above components or modules and one or more of the steps may be implemented in a system on chip (SoC). The SoC may include: an integrated circuit chip, the integrated circuit chip comprising one or more of: a processing unit (e.g., a Central Processing Unit (CPU), microcontroller, microprocessing unit, digital signal processing unit (DSP), etc.), a memory, one or more communication interfaces, and/or further circuitry for performing its functions and optionally embedded firmware.
As shown in fig. 25, a block diagram of an exemplary mobile electronic device 2500 is provided. The electronic device 2500 may be a small form factor portable (or mobile) electronic device, such as a cellular phone, a personal data assistant (PDA), a laptop computer, a tablet computer, a personal media player device, a wireless network viewing device, a personal headset device, an application-specific device, or a hybrid device that includes any of the above functions. The electronic device 2500 includes at least: a memory 2501 and a processor 2502.
The memory 2501 stores programs. In addition to the programs described above, the memory 2501 may also be configured to store other various data to support operations on the electronic device 2500. Examples of such data include instructions for any application or method operating on the electronic device 2500, contact data, phonebook data, messages, pictures, videos, and so forth.
The memory 2501 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The memory 2501 is coupled to the processor 2502 and contains instructions stored thereon that, when executed by the processor 2502, cause the electronic device to perform acts that, as one embodiment of an electronic device, may include: the relevant processing flows performed by the examples corresponding to fig. 19-24, the processing logic performed by the switch examples shown in fig. 13-18, and the processing logic performed by the various hosts or switches of the distributed system in the examples shown in fig. 1-12. The electronic device 2500 may perform the corresponding functional logic as a host and/or a switch in a distributed system.
The above processing operations are described in detail in the foregoing embodiments of the method and apparatus, and the details of the above processing operations are also applicable to the electronic device 2500, that is, the specific processing operations mentioned in the foregoing embodiments can be written in the memory 2501 in the form of a program and executed by the processor 2502.
Further, as shown in fig. 25, the electronic device 2500 may further include: communication components 2503, power components 2504, audio components 2505, displays 2506, chipsets 2507, and the like. Only some of the components are shown schematically in fig. 25, and the electronic device 2500 is not meant to include only the components shown in fig. 25.
The communication component 2503 is configured to facilitate communications between the electronic device 2500 and other devices in a wired or wireless manner. The electronic device may access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G, and 5G, or a combination thereof. In an exemplary embodiment, the communication component 2503 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communications component 2503 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
A power supply component 2504 provides power to the various components of the electronic device. The power components 2504 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for an electronic device.
The audio component 2505 is configured to output and/or input audio signals. For example, the audio component 2505 includes a Microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 2501 or transmitted via the communication component 2503. In some embodiments, audio component 2505 also includes a speaker for outputting audio signals.
The display 2506 includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The memory 2501, processor 2502, communication components 2503, power components 2504, audio components 2505, and display 2506 described above may be connected to a chipset 2507. The chipset 2507 may provide an interface between the processor 2502 and the rest of the components in the electronic device 2500. The chipset 2507 may further provide an access interface for various components of the electronic device 2500 to the memory 2501 and a communication interface for various components to access each other.
In some examples, one or more of the modules or one or more of the steps or one or more of the processes described above in relation to fig. 1-24 may be implemented by a computing device having an operating system and a hardware configuration.
Fig. 26 is a block diagram illustrating an exemplary computing device 2600. For example, the switches and hosts (sender hosts and receiver hosts) in the present disclosure may be implemented in one or more computing devices similar to computing device 2600 in stationary computer embodiments, including one or more features and/or alternative features of computing device 2600. The description of computing device 2600 provided herein is for purposes of illustration and is not intended to be limiting. Embodiments may also be implemented in other types of computer systems known to those skilled in the relevant art.
As shown in fig. 26, computing device 2600 includes one or more processors 2602, a system memory 2604, and a bus 2606 that couples various system components including the system memory 2604 to the processors 2602. Bus 2606 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. The system memory 2604 includes Read Only Memory (ROM) 2608 and Random Access Memory (RAM) 2610. A basic input/output system (BIOS) 2612 is stored in ROM 2608.
Computer system 2600 also has one or more of the following drives: a hard disk drive 2614 for reading from and writing to a hard disk, a magnetic disk drive 2616 for reading from or writing to a removable magnetic disk 2618, and an optical disk drive 2620 for reading from or writing to a removable optical disk 2622 such as a CD-ROM, DVD-ROM, or other optical media. The hard disk drive 2614, magnetic disk drive 2616, and optical disk drive 2620 are connected to the bus 2606 by a hard disk drive interface 2624, a magnetic disk drive interface 2626, and an optical drive interface 2628, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for the computer. Although a hard disk, a removable magnetic disk and a removable optical disk are described, other types of computer-readable storage media can be used to store data, such as flash memory cards, digital video disks, Random Access Memories (RAMs), Read Only Memories (ROMs), and the like.
Several program modules may be stored on the hard disk, magnetic disk, optical disk, ROM, or RAM. These programs include an operating system 2630, one or more application programs 2632, other programs 2634, and program data 2636. These programs may include, for example, the relevant process flows performed to implement the examples shown in fig. 19-24, the processing logic performed by the switch examples shown in fig. 13-18, and the processing logic performed by the various hosts or switches of the distributed system in the examples shown in fig. 1-12.
A user may enter commands and information into computing device 2600 through input devices such as a keyboard 2638 and pointing device 2640. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, touch screen, and/or touch pad, voice recognition system for receiving voice inputs, gesture recognition system for receiving gesture inputs, and the like. These and other input devices can be connected to the processor 2602 through a serial port interface 2642 that is coupled to bus 2606, but may be connected by other interfaces, such as a parallel port, game port, or a Universal Serial Bus (USB).
A display screen 2644 is also connected to bus 2606 via an interface, such as a video adapter 2646. The display screen 2644 may be external to or incorporated within the computing device 2600. The display screen 2644 may display information as well as act as a user interface for receiving user commands and/or other information (e.g., via touch, finger gestures, virtual keyboard, etc.). In addition to the display screen 2644, the computing device 2600 may include other peripheral output devices (not shown), such as speakers and printers.
Computer 2600 is connected to a network 2648 (e.g., the Internet) through an adapter or network interface 2650, a modem 2652, or other means for establishing communications over the network. The modem 2652, which may be internal or external, may be connected to the bus 2606 via the serial port interface 2642, as shown in fig. 26, or may be connected to the bus 2606 using another interface type, including a parallel interface.
As used herein, the terms "computer program medium," "computer-readable medium," and "computer-readable storage medium" are used to generally refer to media such as the hard disk associated with hard disk drive 2614, removable magnetic disk 2618, removable optical disk 2622, system memory 2604, flash memory cards, digital video disks, Random Access Memory (RAM), Read Only Memory (ROM), and other types of physical/tangible storage media. These computer-readable storage media are distinct and non-overlapping with respect to communication media (which does not include communication media). Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wireless media such as acoustic, RF, infrared and other wireless media and wired media. Various embodiments are also directed to these communication media.
As indicated above, computer programs and modules (including application programs 2632 and other programs 2634) may be stored on the hard disk, magnetic disk, optical disk, ROM, or RAM. Such computer programs may also be received via network interface 2650, serial port interface 2642, or any other interface type. Such computer programs, when executed or loaded by an application, enable computer 2600 to implement features of embodiments discussed herein. Accordingly, such computer programs represent controllers of computer system 2600.
As such, various embodiments are also directed to computer program products comprising computer instructions/code stored on any computer usable storage medium. Such code/instructions, when executed in one or more data processing devices, cause the data processing devices to operate as described herein. Examples of computer readable storage devices that may include computer readable storage media include storage devices such as RAM, hard disk drives, floppy disk drives, CD-ROM drives, DVD-ROM drives, compact disk drives, tape drives, magnetic storage device drives, optical storage device drives, MEMS devices, nanotechnology-based storage devices, and other types of physical/tangible computer readable storage devices.
Example clauses
A1: a method, comprising:
obtaining a data message barrier of at least one ingress link;
determining a minimum value of the data message barrier;
transmitting the minimum value of the data message barrier to at least one egress link.
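The method of paragraph A1 may be sketched as follows. This is an illustrative, non-limiting example; the class name, link identifiers, and the use of `None` for a link that has not yet reported a barrier are assumptions introduced for illustration only and are not part of the claimed subject matter.

```python
# Illustrative sketch (assumed names): a switch records the latest data
# message barrier obtained from each ingress link and forwards only the
# minimum of those barriers to its egress links.

class BarrierSwitch:
    def __init__(self, ingress_links):
        # Latest data message barrier observed per ingress link;
        # None means no barrier has been obtained on that link yet.
        self.barriers = {link: None for link in ingress_links}

    def on_barrier(self, link, barrier):
        """Record the data message barrier obtained from one ingress link."""
        self.barriers[link] = barrier

    def min_barrier(self):
        """Minimum of the per-link barriers, or None while any link is silent."""
        if any(b is None for b in self.barriers.values()):
            return None  # cannot determine a global minimum yet
        return min(self.barriers.values())
```

For example, with barriers 7 and 5 obtained on two ingress links, the value forwarded to the egress links would be 5.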
A2: the method of paragraph A1, wherein the data message barrier is included in a data message, the obtaining the data message barrier of the at least one ingress link comprising:
acquiring the data message barrier from the data messages received from at least one ingress link.
A3: the method of paragraph A1, wherein the data message barrier is included in a first beacon message inserted into a data message stream, the obtaining the data message barrier of the at least one ingress link comprising:
obtaining the data message barrier from the first beacon message received from at least one ingress link.
A4: the method of paragraph A2, wherein the sending the minimum value of the data message barrier to the at least one egress link comprises:
setting a data message barrier in the data message to a minimum value of the data message barrier, and sending the data message to the at least one egress link.
A5: the method of paragraph A3, wherein the sending the minimum value of the data message barrier to the at least one egress link comprises:
transmitting a first beacon message including a minimum value of the data message barrier to the at least one egress link.
A6: the method of paragraph A1, further comprising:
acquiring a reply message barrier of at least one ingress link;
determining a minimum value of the reply message barrier;
sending the minimum value of the reply message barrier to at least one egress link.
A7: the method of paragraph A6, wherein the obtaining an acknowledgement message barrier for at least one ingress link comprises:
and acquiring the response message barrier from the data message received by at least one ingress link.
A8: the method of paragraph a6, wherein the reply message barrier is included in a second beacon message inserted into the data message stream, the obtaining the reply message barrier for the at least one ingress link comprising:
obtaining the response message barrier from the second beacon message received from at least one ingress link.
A9: the method of claim a7, wherein the sending the minimum value of the acknowledgement message barrier to at least one egress link includes:
setting a reply message barrier in the data message to a minimum value of the reply message barrier, and sending the data message to the at least one egress link.
A10: the method of claim A8, wherein the sending the minimum value of the acknowledgement message barrier to at least one egress link includes:
transmitting a first beacon message including a minimum value of the reply message barrier to the at least one egress link.
A11: the method according to paragraph a6, wherein the data message includes a packet loss flag, the method further comprising:
and judging the packet loss state, and if the packet loss state is existed, setting the packet loss identifier in the data message received from at least one inlet link as the packet loss state.
A12: the method of paragraph a11, wherein the performing packet loss status monitoring comprises:
and when the packet loss event is detected, if the minimum value of the response message barrier is smaller than the minimum value of the data message barrier when the packet loss event occurs, determining that the packet loss state is present.
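The packet loss determination of paragraph A12 reduces to a single comparison. The following is a hedged, illustrative sketch; the function and parameter names are assumptions introduced for this example only.

```python
def packet_loss_state(loss_event_detected, min_reply_barrier, min_data_barrier):
    # Illustrative reading of paragraph A12: when a packet loss event has
    # been detected, a packet loss state exists if the minimum reply
    # message barrier is still smaller than the minimum data message
    # barrier at the time the event occurred.
    return loss_event_detected and min_reply_barrier < min_data_barrier
```

A switch applying this rule would then set the packet loss flag in outgoing data messages whenever the function returns a truthy value.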
A13: a switch, comprising:
a message receiving port for receiving data messages from at least one ingress link;
a processor for performing routing control processing on the data message, determining the minimum value of a data message barrier contained in the received data message, and setting the data message barrier in the data message to the minimum value of the data message barrier;
a message sending port for sending data messages to the at least one egress link.
A14: the switch of paragraph a13, wherein the processor is further to:
determining the minimum value of a response message barrier contained in the received data message, and setting the response message barrier in the data message as the minimum value of the response message barrier.
A15: the switch of paragraph a13, wherein the processor is further to:
and judging the packet loss state, and if the packet loss state is detected, setting the packet loss identifier in the data message to be the packet loss state.
A16: the switch of paragraph a15, wherein the performing packet loss state detection comprises:
and when the packet loss event is detected, if the minimum value of the response message barrier is smaller than the minimum value of the data message barrier when the packet loss event occurs, determining that the packet loss state is present.
A17: an electronic device, comprising:
a processing unit; and
a memory coupled to the processing unit and containing instructions stored thereon that, when executed by the processing unit, cause the electronic device to perform acts comprising:
obtaining a data message barrier of at least one ingress link;
determining a minimum value of the data message barrier;
transmitting the minimum value of the data message barrier to the at least one egress link.
A18: the electronic device of paragraph a17, wherein the data message barrier is included in a data message, the obtaining the data message barrier of the at least one ingress link comprising:
and acquiring the data message barrier from the data messages received by at least one ingress link.
A19: the electronic device of paragraph a17, wherein the data message barrier is included in a first beacon message inserted into a data message stream, the obtaining the data message barrier for the at least one ingress link comprising:
obtaining the data message barrier from the first beacon message received from at least one ingress link.
A20: the electronic device of paragraph a 18, wherein the sending the minimum value of the data message barrier to the at least one egress link comprises:
setting a data message barrier in the data message to a minimum value of the data message barrier, and sending the data message to the at least one egress link.
A21: the electronic device of paragraph a19, wherein the sending the minimum value of the data message barrier to the at least one egress link comprises:
transmitting a first beacon message including a minimum value of the data message barrier to the at least one egress link.
A22: the electronic device of paragraph a17, wherein the actions further comprise:
acquiring a response message barrier of at least one entry link;
determining a minimum value of the reply message barrier;
sending the minimum value of the reply message barrier to at least one egress link.
A23: the electronic device of paragraph a 22, wherein the data message includes a packet loss flag, and the actions further include:
and judging the packet loss state, and if the packet loss state is existed, setting the packet loss identifier in the data message received from at least one inlet link as the packet loss state.
A24: the electronic device according to paragraph a 23, wherein the monitoring of the packet loss status includes:
and when the packet loss event is detected, if the minimum value of the response message barrier is smaller than the minimum value of the data message barrier when the packet loss event occurs, determining that the packet loss state is present.
B1: a switch, comprising:
a message receiving port for receiving a data message and a beacon message from at least one ingress link, and sending the data message to a first processor and the beacon message to a second processor;
the first processor for performing routing control processing on the data message;
the second processor is configured to determine a minimum value of a data message barrier included in the received beacon message, and set the data message barrier in the beacon message to the minimum value of the data message barrier;
a message transmit port for transmitting data messages and the beacon messages to at least one egress link.
B2: the switch of paragraph B1, wherein the second processor is further to:
and determining the minimum value of the response message barriers contained in the received beacon message, and setting the response message barriers in the beacon message to be the minimum value of the response message barriers.
B3: the switch of paragraph B2, wherein the data message includes a packet loss flag, and the first processor is further configured to:
and judging the packet loss state, and if the packet loss state is existed, setting the packet loss identifier in the data message received from at least one inlet link as the packet loss state.
B4: the switch of paragraph B3, wherein the monitoring of the packet loss status includes:
and when the packet loss event is detected, if the minimum value of the response message barrier is smaller than the minimum value of the data message barrier when the packet loss event occurs, determining that the packet loss state is present.
C1: a switch, comprising:
a message receiving port for receiving data messages and beacon messages from at least one ingress link, sending the data messages to a processor, and sending the beacon messages to a proxy host, the beacon messages including a data message barrier,
the processor for performing routing control processing on the data messages;
a message sending port for receiving the beacon message returned by the proxy host, setting the data message barrier in the returned beacon message to the minimum value of the data message barrier, and sending the data messages and the beacon message to at least one egress link.
C2: the switch of paragraph C1, wherein the beacon message further includes an acknowledgement message barrier, and the acknowledgement message barrier in the returned beacon message is set to a minimum value of the acknowledgement message barrier.
C3: the switch of paragraph C2, wherein the data message includes a packet loss flag, the processor is further configured to:
and judging the packet loss state, and if the packet loss state is existed, setting the packet loss identifier in the data message received from at least one inlet link as the packet loss state.
C4: the switch of paragraph C3, wherein the monitoring of the packet loss status includes:
and when the packet loss event is detected, if the minimum value of the response message barrier is smaller than the minimum value of the data message barrier when the packet loss event occurs, determining that the packet loss state is present.
D1: a method, comprising:
sorting the data messages according to timestamps contained in the received data messages;
obtaining a data message barrier of at least one ingress link;
determining a minimum value of the data message barrier;
delivering, in order, the data messages whose timestamps are smaller than the minimum value of the data message barrier.
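The receiver-side method of paragraph D1 may be sketched as follows. This is an illustrative, non-limiting example; the class name and the use of a min-heap as the timestamp-ordered buffer are assumptions introduced for this example.

```python
import heapq

class OrderedReceiver:
    # Illustrative sketch of paragraph D1 (assumed names): buffer data
    # messages in timestamp order, then deliver, in order, those whose
    # timestamp is smaller than the minimum data message barrier
    # observed across the ingress links.
    def __init__(self):
        self._heap = []  # min-heap of (timestamp, payload)

    def receive(self, timestamp, payload):
        heapq.heappush(self._heap, (timestamp, payload))

    def deliver_up_to(self, min_barrier):
        """Pop and return, in order, all messages with timestamp < min_barrier."""
        delivered = []
        while self._heap and self._heap[0][0] < min_barrier:
            delivered.append(heapq.heappop(self._heap))
        return delivered
```

Messages with timestamps at or above the barrier minimum remain buffered until a larger minimum is observed.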
D2: the method of paragraph D1, wherein the data message barrier is included in a data message, the obtaining the data message barrier of the at least one ingress link comprising:
acquiring the data message barrier from the data messages received from at least one ingress link.
D3: the method of paragraph D1, wherein the data message barrier is included in a first beacon message inserted into a data message stream, the obtaining the data message barrier of the at least one ingress link comprising:
obtaining the data message barrier from the first beacon message received from at least one ingress link.
D4: an electronic device, comprising:
a processing unit; and
a memory coupled to the processing unit and containing instructions stored thereon that, when executed by the processing unit, cause the electronic device to perform acts comprising:
sorting the data messages according to timestamps contained in the received data messages;
obtaining a data message barrier of at least one ingress link;
determining a minimum value of the data message barrier;
delivering, in order, the data messages whose timestamps are smaller than the minimum value of the data message barrier.
D5: the electronic device of paragraph D4, wherein the data message barrier is included in a data message, the obtaining the data message barrier of the at least one ingress link comprising:
acquiring the data message barrier from the data messages received from at least one ingress link.
D6: the electronic device of paragraph D4, wherein the data message barrier is included in a first beacon message inserted into a data message stream, the obtaining the data message barrier of the at least one ingress link comprising:
obtaining the data message barrier from the first beacon message received from at least one ingress link.
E1: a method, comprising:
sorting the data messages according to timestamps contained in the received data messages;
acquiring a data message barrier and a reply message barrier of at least one ingress link;
determining the minimum value of the data message barrier and the minimum value of the reply message barrier;
if the packet loss flag contained in the data message indicates a packet loss state, delivering, in order, the data messages whose timestamps are smaller than the minimum value of the reply message barrier; and if the packet loss flag indicates a non-packet-loss state, delivering, in order, the data messages whose timestamps are smaller than the minimum value of the data message barrier.
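The branch in paragraph E1 amounts to selecting which barrier bounds in-order delivery. The following is a hedged, illustrative sketch; the function and parameter names are assumptions introduced for this example only.

```python
def delivery_bound(packet_loss_flag, min_data_barrier, min_reply_barrier):
    # Illustrative reading of paragraph E1: under a packet loss state,
    # only the reply message barrier is a safe bound; otherwise the data
    # message barrier bounds what may be delivered. Buffered messages
    # with timestamps below the returned bound may then be delivered
    # in timestamp order.
    return min_reply_barrier if packet_loss_flag else min_data_barrier
```

Since the reply message barrier can only trail the data message barrier, falling back to it under packet loss is the more conservative choice.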
E2: the method of paragraph E1, wherein obtaining the data message barrier and the reply message barrier of the at least one ingress link comprises:
acquiring the data message barrier and the reply message barrier from the data messages received from at least one ingress link.
E3: The method of paragraph E1, wherein the data message barrier is included in a first beacon message inserted into the data message stream, the reply message barrier is included in a second beacon message inserted into the data message stream, and obtaining the data message barrier and the reply message barrier for the at least one ingress link comprises:
the data message barrier is obtained from the first beacon message received from at least one ingress link, and the reply message barrier is obtained from the second beacon message received from at least one ingress link.
E4: an electronic device, comprising:
a processing unit; and
a memory coupled to the processing unit and containing instructions stored thereon that, when executed by the processing unit, cause the electronic device to perform acts comprising:
sorting the data messages according to timestamps contained in the received data messages;
acquiring a data message barrier and a reply message barrier of at least one ingress link;
determining the minimum value of the data message barrier and the minimum value of the reply message barrier;
if the packet loss flag contained in the data message indicates a packet loss state, delivering, in order, the data messages whose timestamps are smaller than the minimum value of the reply message barrier; and if the packet loss flag indicates a non-packet-loss state, delivering, in order, the data messages whose timestamps are smaller than the minimum value of the data message barrier.
E5: the electronic device of paragraph E4, wherein obtaining the data message barrier and the reply message barrier for the at least one ingress link comprises:
acquiring the data message barrier and the reply message barrier from the data messages received from at least one ingress link.
E6: The electronic device of paragraph E4, wherein the data message barrier is included in a first beacon message inserted into the data message stream and the reply message barrier is included in a second beacon message inserted into the data message stream, the obtaining the data message barrier and the reply message barrier for the at least one ingress link comprising:
the data message barrier is obtained from the first beacon message received from at least one ingress link, and the reply message barrier is obtained from the second beacon message received from at least one ingress link.
F1: a method, comprising:
sending a first time adjustment message to a plurality of hosts, the first time adjustment message including a first timestamp indicating a local physical time at which the first time adjustment message was sent;
receiving second time adjustment messages sent by the plurality of hosts, each second time adjustment message including a second timestamp, the second timestamp being the maximum value among the corresponding first timestamps after alignment with the local physical time at which the first time adjustment messages were received;
performing round-trip delay compensation on the second timestamp;
determining, as a timestamp adjustment value for transmitted messages, the minimum value of first differences between the local physical time at which each second time adjustment message is received and the corresponding compensated second timestamp.
F2: the method of paragraph F1, further comprising:
receiving first time adjustment messages sent by the plurality of hosts;
determining the maximum value among the corresponding first timestamps after alignment with the local physical time at which the first time adjustment messages were received;
including the maximum value as a second timestamp in a second time adjustment message and transmitting the second time adjustment message to the plurality of hosts.
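The two sides of the time adjustment exchange of paragraphs F1 and F2 may be sketched as follows. All names are assumptions, as is the compensation model (half the measured round-trip time); the clauses only require round-trip delay compensation in some form.

```python
def responder_second_timestamp(received, now):
    """F2 side (sketch): `received` is a list of (first_timestamp,
    local_receive_time) pairs, one per host. Align every first timestamp
    to the common local instant `now` by adding the local time elapsed
    since receipt, and report the maximum as the second timestamp."""
    return max(t1 + (now - rx) for t1, rx in received)

def initiator_adjustment(replies):
    """F1 side (sketch): `replies` is a list of (local_receive_time,
    second_timestamp, round_trip_time) tuples. Compensate each second
    timestamp by half the round-trip time, then take the minimum
    difference as the timestamp adjustment value for sent messages."""
    return min(rx - (t2 + rtt / 2.0) for rx, t2, rtt in replies)
```

Taking the maximum on the responder side and the minimum on the initiator side keeps the adjustment conservative with respect to the fastest and slowest hosts, respectively.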
F3: an electronic device, comprising:
a processing unit; and
a memory coupled to the processing unit and containing instructions stored thereon that, when executed by the processing unit, cause the electronic device to perform acts comprising:
sending a first time adjustment message to a plurality of hosts, the first time adjustment message including a first timestamp indicating a local physical time at which the first time adjustment message was sent;
receiving second time adjustment messages sent by the plurality of hosts, each second time adjustment message including a second timestamp, the second timestamp being the maximum value among the corresponding first timestamps after alignment with the local physical time at which the first time adjustment messages were received;
performing round-trip time (RTT) compensation on the second timestamp;
determining, as a timestamp adjustment value for transmitted messages, the minimum value of first differences between the local physical time at which each second time adjustment message is received and the corresponding compensated second timestamp.
F4: the electronic device of paragraph F3, wherein the actions further include:
receiving first time adjustment messages sent by the plurality of hosts;
determining the maximum value among the corresponding first timestamps after alignment with the local physical time at which the first time adjustment messages were received;
including the maximum value as a second timestamp in a second time adjustment message and transmitting the second time adjustment message to the plurality of hosts.
Concluding remarks
In various aspects there is little distinction between hardware and software implementations of the system; the use of hardware or software is generally a design choice that trades off cost against efficiency, although in some contexts the choice between hardware and software can become significant. There are various vehicles by which the processes and/or systems and/or other technologies described herein can be effected (e.g., hardware, software, and/or firmware), and the preferred vehicle will vary with the context in which the processes and/or systems and/or other technologies are deployed. For example, if an implementer determines that speed and accuracy are paramount, the implementer may opt for a mainly hardware and/or firmware vehicle; if flexibility is paramount, the implementer may opt for a mainly software implementation; alternatively, or in addition, the implementer may opt for some combination of hardware, software, and/or firmware.
The foregoing detailed description has set forth various embodiments of the devices and/or processes via the use of block diagrams, flowcharts, and/or examples. Insofar as such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be understood by those within the art that each function and/or operation within such block diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or virtually any combination thereof. In one embodiment, portions of the subject matter described herein may be implemented via an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), or other integrated format. However, those skilled in the art will recognize that some aspects of the embodiments disclosed herein, in whole or in part, can be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computer systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and/or firmware would be well within the skill of one of skill in the art in light of this disclosure. In addition, it should be apparent to those skilled in the art that the mechanisms of the subject matter described herein are capable of being distributed as a program product in a variety of forms, and that an illustrative embodiment of the subject matter described herein applies regardless of the particular type of signal bearing media used to actually carry out the distribution.
Examples of signal bearing media include, but are not limited to: recordable type media such as floppy disks, Hard Disk Drives (HDD), Compact Disks (CD), Digital Versatile Disks (DVD), digital tapes, computer memory, etc.; and a transmission type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
It is common within the art to describe apparatuses and/or processes in the manner set forth herein, and thereafter use engineering practices to integrate such described apparatuses and/or processes into data processing systems. That is, at least a portion of the devices and/or processes described herein may be integrated into a data processing system via a reasonable amount of experimentation. Those skilled in the art will recognize that a typical data processing system generally includes one or more of the following: a system unit housing, a video display device, memory such as volatile and non-volatile memory, processors such as microprocessors and digital signal processors, computing entities such as operating systems, drivers, graphical user interfaces, and application programs, one or more interactive devices such as a touch pad or a touch screen, and/or a control system including a feedback loop and a control motor (e.g., feedback for sensing position and/or velocity; a control motor for moving and/or adjusting components and/or quantities). A typical data processing system may be implemented using any suitable commercially available components, such as those commonly found in data computing/communication and/or network communication/computing systems.
The subject matter described herein sometimes illustrates different components contained within, or connected with, different other components. Those of ordinary skill in the art will appreciate that the architecture so depicted is merely exemplary and that in fact many other architectures can be implemented which achieve the same functionality. In a conceptual sense, any arrangement of components to achieve the same functionality is effectively "associated" such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as "associated with" each other such that the desired functionality is achieved, irrespective of architectures or intermedial components. Likewise, any two components so associated can also be viewed as being "operably connected," or "operably coupled," to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being "operably couplable," to each other to achieve the desired functionality. Specific examples of operably couplable include, but are not limited to, physically mateable and/or physically interacting components and/or wirelessly interactable and/or wirelessly interacting components and/or logically interacting and/or logically interactable components.
With respect to virtually any term used herein, those of skill in the art will recognize that it can be singular and/or plural as appropriate for the context and/or application. Various singular/plural variations are set forth herein for clarity.
It will be understood by those within the art that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims), are generally intended as "open" expressions (e.g., the expression "including" should be interpreted as "including but not limited to," the expression "having" should be interpreted as "having at least," etc.). It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases "at least one" and "one or more" to introduce claim recitations. However, the use of such phrases should not be construed to limit any particular claim containing such an introduced claim recitation to inventions containing only one such recitation, even if the same claim includes the introductory phrases "one or more" or "at least one"; the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such a recitation should typically be interpreted to mean at least the recited number (e.g., the bare recitation of "two," without other modifiers, typically means at least two, or two or more). Also, in those instances where an expression similar to "at least one of A, B, and C, etc." is used, in general such a syntactic structure is intended in the sense one having skill in the art would understand the expression (e.g., "a system having at least one of A, B, and C" would include, but not be limited to, systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
In those instances where a convention analogous to "at least one of A, B, or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, or C" would include but not be limited to systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase "A or B" should be understood to include the possibility of "A," "B," or "A and B."
Reference in this specification to "an implementation," "one implementation," "some implementations," or "other implementations" may mean that a particular feature, structure, or characteristic described in connection with one or more implementations may be included in at least some implementations, but not necessarily in all implementations. Different appearances of "an implementation," "one implementation," or "some implementations" in the foregoing description are not necessarily all referring to the same implementations.
While certain exemplary techniques have been described and shown with various methods and systems, it will be understood by those skilled in the art that various other modifications may be made, and equivalents may be substituted, without departing from claimed subject matter. In addition, many modifications may be made to adapt a particular situation to the teachings of the claimed subject matter without departing from the central concept described herein. Therefore, it is intended that claimed subject matter not be limited to the particular examples disclosed, but that such claimed subject matter may also include all implementations falling within the scope of the appended claims, and equivalents thereof.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as illustrative forms of implementing the claims.
Conditional language (such as "can," "might," or "may") may be understood and used in context generally to mean that a particular example includes, but other examples do not include, particular features, elements and/or steps unless specifically stated otherwise. Thus, such conditional language is not generally intended to imply that features, elements, and/or steps are in any way required for one or more examples or that one or more examples necessarily include logic for deciding, with or without user input or prompting, whether such features, elements, and/or steps are to be included or are to be performed in any particular embodiment.
Unless specifically stated otherwise, it is to be understood that conjunctions (such as the phrase "X, Y or at least one of Z") indicate that the listed items, words, etc. can be either X, Y or Z, or a combination thereof.
Any routine descriptions, elements, or blocks in flow charts described in this disclosure and/or in the figures should be understood as potentially representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or elements in the routine. Alternative examples are included within the scope of the examples described in this disclosure, in which elements or functions may be deleted or executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art.
It should be emphasized that many variations and modifications may be made to the above-described examples, the elements of which are to be understood as being in other acceptable examples. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (20)
1. A method, comprising:
obtaining a data message barrier of at least one ingress link;
determining a minimum value of the data message barrier;
transmitting the minimum value of the data message barrier to at least one egress link.
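For orientation, the three steps of claim 1 can be sketched as a small forwarding component. The class and method names below are illustrative assumptions, not terminology from this application:

```python
# Illustrative sketch of claim 1 (obtain per-ingress barriers, take the
# minimum, forward to egress links); all names are assumptions.

class BarrierForwarder:
    def __init__(self, ingress_links, egress_links):
        self.egress_links = egress_links
        # Latest data message barrier observed on each ingress link.
        self.barriers = {link: 0 for link in ingress_links}

    def on_barrier(self, ingress_link, barrier):
        # Step 1: obtain the data message barrier of an ingress link.
        self.barriers[ingress_link] = barrier
        # Step 2: determine the minimum value over all ingress links.
        minimum = min(self.barriers.values())
        # Step 3: transmit that minimum to every egress link.
        for link in self.egress_links:
            link.send(minimum)
        return minimum
```

Note that the forwarded minimum only advances once every ingress link has reported a barrier at least that large, which is what makes it safe to propagate downstream.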
2. The method of claim 1, wherein the data message barrier is included in a data message, the obtaining the data message barrier for the at least one ingress link comprising:
acquiring the data message barrier from the data messages received by the at least one ingress link.
3. The method of claim 1, wherein the data message barrier is included in a first beacon message inserted into a data message stream, the obtaining the data message barrier for the at least one ingress link comprising:
obtaining the data message barrier from the first beacon message received from at least one ingress link.
4. The method of claim 2, wherein the sending the minimum value of the data message barrier to the at least one egress link comprises:
setting a data message barrier in the data message to a minimum value of the data message barrier, and sending the data message to the at least one egress link.
5. The method of claim 3, wherein the sending the minimum value of the data message barrier to the at least one egress link comprises:
transmitting a first beacon message including a minimum value of the data message barrier to the at least one egress link.
6. The method of claim 1, further comprising:
obtaining a response message barrier of at least one ingress link;
determining a minimum value of the response message barrier;
sending the minimum value of the response message barrier to at least one egress link.
7. The method of claim 6, wherein the obtaining the response message barrier of the at least one ingress link comprises:
acquiring the response message barrier from the data messages received by the at least one ingress link.
8. The method of claim 6, wherein the response message barrier is included in a second beacon message inserted into a data message stream, the obtaining the response message barrier of the at least one ingress link comprising:
obtaining the response message barrier from the second beacon message received from the at least one ingress link.
9. The method of claim 7, wherein the sending the minimum value of the response message barrier to the at least one egress link comprises:
setting a response message barrier in the data message to the minimum value of the response message barrier, and sending the data message to the at least one egress link.
10. The method of claim 8, wherein the sending the minimum value of the response message barrier to the at least one egress link comprises:
transmitting a second beacon message including the minimum value of the response message barrier to the at least one egress link.
11. The method of claim 6, further comprising: determining a packet loss state;
and if in the packet loss state, setting a packet loss identifier in the data message received from the at least one ingress link to the packet loss state.
12. The method of claim 11, wherein the determining the packet loss state comprises:
in response to a packet loss event, determining that the packet loss state exists if the minimum value of the response message barrier is smaller than the minimum value of the data message barrier at the time the packet loss event occurs.
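The test in claims 11 and 12 reduces to a single comparison. The function name and arguments below are hypothetical, not from this application:

```python
# Hypothetical encoding of the packet loss test in claims 11-12.

def in_packet_loss_state(min_response_barrier, min_data_barrier_at_event):
    # A packet loss state exists when, at the moment of a packet loss
    # event, acknowledgements (the response barrier) lag behind the data
    # barrier: some message sent before the event may never be confirmed.
    return min_response_barrier < min_data_barrier_at_event
```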
13. A method, comprising:
sequencing the data messages according to timestamps contained in the received data messages;
obtaining a data message barrier and a response message barrier of at least one ingress link;
determining a minimum value of the data message barrier and a minimum value of the response message barrier;
if a packet loss identifier contained in the data messages is in a packet loss state, delivering in sequence the data messages whose timestamps are smaller than the minimum value of the response message barrier; and if the packet loss identifier is in a non-packet-loss state, delivering in sequence the data messages whose timestamps are smaller than the minimum value of the data message barrier.
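The delivery rule of claim 13 can be sketched as follows, assuming buffered messages wait in a timestamp-ordered heap; all names are illustrative assumptions:

```python
import heapq

# Sketch of the delivery rule in claim 13: messages wait in a
# timestamp-ordered heap and are delivered, in order, up to a limit
# chosen by the packet loss identifier.

def deliver(pending, min_data_barrier, min_response_barrier, packet_loss):
    # Under packet loss, only messages below the response barrier are
    # known to be fully acknowledged; otherwise the data barrier suffices.
    limit = min_response_barrier if packet_loss else min_data_barrier
    delivered = []
    while pending and pending[0][0] < limit:
        delivered.append(heapq.heappop(pending)[1])  # pop lowest timestamp
    return delivered
```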
14. The method of claim 13, wherein the obtaining the data message barrier and the response message barrier of the at least one ingress link comprises:
acquiring the data message barrier and the response message barrier from the data messages received by the at least one ingress link;
or,
the data message barrier is included in a first beacon message inserted into a data message stream, the response message barrier is included in a second beacon message inserted into the data message stream, and the obtaining the data message barrier and the response message barrier of the at least one ingress link comprises:
obtaining the data message barrier from the first beacon message received from the at least one ingress link, and obtaining the response message barrier from the second beacon message received from the at least one ingress link.
15. The method of claim 14, further comprising:
sending a first time adjustment message to a plurality of hosts, the first time adjustment message comprising a first timestamp indicating the local physical time at which the first time adjustment message is sent;
receiving second time adjustment messages sent by the plurality of hosts, each second time adjustment message comprising a second timestamp, the second timestamp being the maximum of the corresponding first timestamps after aligning to the local physical time at which the first time adjustment messages were received;
compensating the second timestamp for round-trip delay, and determining, as a timestamp adjustment value for sent messages, the minimum of first differences between the local physical time at which each second time adjustment message is received and the corresponding compensated second timestamp.
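A hedged sketch of the adjustment computation in claim 15: the claim does not fix the round-trip compensation formula, so half the measured round-trip time is assumed here, and all names are illustrative:

```python
# Sketch of the timestamp adjustment in claim 15; the round-trip
# compensation of rtt/2 is an assumption, not stated in the claim.

def timestamp_adjustment(exchanges):
    """exchanges: one (t_send, t_second, t_recv) tuple per host, where
    t_send is the local time the first time adjustment message was sent,
    t_second is the second timestamp the host returned, and t_recv is
    the local time its reply arrived."""
    diffs = []
    for t_send, t_second, t_recv in exchanges:
        rtt = t_recv - t_send
        compensated = t_second + rtt / 2        # assumed compensation
        diffs.append(t_recv - compensated)      # the "first difference"
    # Claim 15: the minimum first difference is the adjustment value.
    return min(diffs)
```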
16. The method of claim 15, further comprising:
receiving first time adjustment messages sent by the plurality of hosts;
determining a maximum value among the corresponding first timestamps after aligning to the local physical time of receiving the first time adjustment messages;
including the maximum value as a second timestamp in a second time adjustment message and transmitting the second time adjustment message to the plurality of hosts.
17. An electronic device, comprising:
a processing unit; and
a memory coupled to the processing unit and containing instructions stored thereon that, when executed by the processing unit, cause the electronic device to perform acts comprising:
obtaining a data message barrier of at least one ingress link;
determining a minimum value of the data message barrier;
transmitting the minimum value of the data message barrier to at least one egress link.
18. The electronic device of claim 17, wherein the data message barrier is included in a data message, the obtaining the data message barrier for the at least one ingress link comprising:
obtaining the data message barrier from data messages received by at least one ingress link;
or, the data message barrier is included in a first beacon message inserted into a data message stream, and the obtaining the data message barrier of the at least one ingress link comprises:
obtaining the data message barrier from the first beacon message received from the at least one ingress link.
19. The electronic device of claim 18,
the sending the minimum value of the data message barrier to the at least one egress link comprises:
setting a data message barrier in the data message to a minimum value of the data message barrier, and sending the data message to the at least one egress link;
or,
transmitting a first beacon message including a minimum value of the data message barrier to the at least one egress link.
20. The electronic device of claim 17, wherein the actions further comprise:
obtaining a response message barrier of at least one ingress link;
determining a minimum value of the response message barrier;
sending the minimum value of the response message barrier to at least one egress link.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810515834.4A CN110535793A (en) | 2018-05-25 | 2018-05-25 | The message total order mechanism of distributed system |
PCT/US2019/031910 WO2019226367A1 (en) | 2018-05-25 | 2019-05-13 | Total-order message mechanism in a distributed system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810515834.4A CN110535793A (en) | 2018-05-25 | 2018-05-25 | The message total order mechanism of distributed system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110535793A true CN110535793A (en) | 2019-12-03 |
Family
ID=66821370
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810515834.4A Pending CN110535793A (en) | 2018-05-25 | 2018-05-25 | The message total order mechanism of distributed system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110535793A (en) |
WO (1) | WO2019226367A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111158933A (en) * | 2019-12-31 | 2020-05-15 | 易票联支付有限公司 | Distributed transaction processing method and system based on message queue |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1703048A (en) * | 2004-05-27 | 2005-11-30 | 微软公司 | Web service application protocol and SOAP processing model |
CN104615478A (en) * | 2014-12-31 | 2015-05-13 | 电子科技大学 | Simulation advance order error recognizing and correcting method used for distributed simulation |
US20150180788A1 (en) * | 2010-03-15 | 2015-06-25 | Juniper Networks, Inc. | Operations, administration and management fields for packet transport |
CN104881466A (en) * | 2015-05-25 | 2015-09-02 | 百度在线网络技术(北京)有限公司 | Method and device for processing data fragments and deleting garbage files |
CN104917813A (en) * | 2015-04-17 | 2015-09-16 | 小米科技有限责任公司 | Resource request method and device |
CN106681846A (en) * | 2016-12-29 | 2017-05-17 | 北京奇虎科技有限公司 | Log data statistical method, device and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5648970A (en) * | 1996-03-04 | 1997-07-15 | Motorola, Inc. | Method and system for ordering out-of-sequence packets |
2018
- 2018-05-25 CN CN201810515834.4A patent/CN110535793A/en active Pending
2019
- 2019-05-13 WO PCT/US2019/031910 patent/WO2019226367A1/en active Application Filing
Non-Patent Citations (2)
Title |
---|
GEFEI ZUO: "Efficient and Scalable Total-Order Message Scattering in Data Center Networks", 26th Symposium on Operating Systems Principles (SOSP 2017) * |
GEFEI ZUO: "Near-Optimal Total Order Message Scattering in Data Center Networks", 26th Symposium on Operating Systems Principles (SOSP 2017) * |
Also Published As
Publication number | Publication date |
---|---|
WO2019226367A1 (en) | 2019-11-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP2466908B1 (en) | Providing synchronized information to multiple devices | |
US9313800B2 (en) | Method and apparatus for optimizing energy consumption for wireless connectivity | |
US9626985B2 (en) | Audio processing method and apparatus | |
KR101130409B1 (en) | Method for maintaining wireless network response time while saving wireless adapter power | |
EP3817125A1 (en) | Charging method and charging device | |
US20120170496A1 (en) | Keep-alive packet transmission method and apparatus of mobile terminal | |
US9811374B1 (en) | Partial resume for operating system modules | |
WO2021146911A1 (en) | Route establishing method and apparatus, electronic device and computer storage medium | |
EP3198950B1 (en) | Power management in device to device communications | |
US9979609B2 (en) | Cloud process management | |
US9722949B2 (en) | Reducing power utilization by transferring communication sessions from a main processor | |
KR20150137069A (en) | Power save for audio/video transmissions over wired interface | |
WO2022213923A1 (en) | Transmission processing method and apparatus, and communication device | |
JP2019185771A (en) | Method, device for processing data of bluetooth speaker, and bluetooth speaker | |
US8370651B2 (en) | Method and system for optimized power management and efficiency in wireless universal serial bus network | |
CN110535793A (en) | The message total order mechanism of distributed system | |
JP2023523816A (en) | Data retransmission method, device, target node, source node and terminal | |
CN107181674A (en) | Message delivery method and device in Internet of Things | |
CN111817830B (en) | Transmission and reception control method, terminal and network side equipment | |
CN103595510A (en) | Wireless data transmission method, host computer and slave computers | |
US20080240324A1 (en) | Independent Dispatch of Multiple Streaming Queues Via Reserved Time Slots | |
US10003456B2 (en) | Soundwire XL turnaround signaling | |
US20220300058A1 (en) | Terminal power saving method and apparatus | |
WO2023226999A1 (en) | Communication method and apparatus, device, and storage medium | |
WO2021104267A1 (en) | Data transmission method and apparatus, and smart watch device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20191203 |