CN115794731A - Decoupling control method for transmission of multi-channel data link between core particles - Google Patents


Publication number
CN115794731A
Authority
CN
China
Prior art keywords
physical control
control sublayer
data
channel
data packet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310043090.1A
Other languages
Chinese (zh)
Other versions
CN115794731B (en)
Inventor
谷江涛
李超
范靖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Chaomo Technology Co ltd
Original Assignee
Beijing Chaomo Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Chaomo Technology Co ltd
Priority to CN202310043090.1A
Publication of CN115794731A
Application granted
Publication of CN115794731B
Legal status: Active
Anticipated expiration


Abstract

The application relates to a decoupling control method for multi-channel data link transmission between core particles, and belongs to the technical field of core particles. The method obtains transaction layer data packets in the media data access control sublayer and allocates the same data packet among the transaction layer data packets to the same physical control sublayer group, where a physical control sublayer group comprises at least one physical control sublayer channel and one physical control sublayer channel corresponds to one link control logic. This helps solve the problems that data link transmission between core particles is strongly coupled and that data link transmission stops because of channel failure or blockage. Fault tolerance processing and load balancing processing are further applied to transmission over the decoupled links, so that link transmission achieves better reliability, availability and serviceability (RAS) characteristics.

Description

Decoupling control method for transmission of multi-channel data link between core particles
Technical Field
The application belongs to the technical field of core particles, and particularly relates to a decoupling control method for transmission of a multi-channel data link between core particles.
Background
At present, when various high-performance services are implemented on core particles, the requirements on data link transmission between the core particles are usually demanding, so as to avoid interrupting those services. A core particle (chiplet) is a prefabricated die that has a specific function and can be combined and integrated with other dies.
In practice, it has been found that data link transmission between core particles relies on a plurality of parallel channels that are strongly coupled to one another. Consequently, if one channel fails or becomes blocked, the entire data link transmission may stop. Data link transmission between core particles therefore currently suffers from strong coupling and from link stalls caused by channel failure or blockage.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
In view of this, the present application provides a decoupling control method for multi-channel data link transmission between core particles, which helps solve the problems that multi-channel data link transmission between core particles is strongly coupled and that data link transmission stops because of channel failure or blockage.
In order to achieve the purpose, the following technical scheme is adopted in the application:
In a first aspect, the present application provides a decoupling control method for multi-channel data link transmission between core particles, the method comprising:
acquiring a transaction layer data packet in a media data access control sublayer;
distributing the same data packet in the transaction layer data packets to the same physical control sub-layer group to realize the decoupling among the physical control sub-layer groups; the physical control sublayer group comprises at least one physical control sublayer channel, and one physical control sublayer channel corresponds to one link control logic.
Further, assigning the same one of the transaction layer packets to the same physical control sublayer group, including:
and distributing the same data packet in the transaction layer data packets to the same physical control sub-layer group based on preset data fault tolerance operation and/or preset load balancing operation so as to realize data fault tolerance and/or load balancing among the physical control sub-layer groups.
Further, the preset load balancing operation at least comprises the following steps:
acquiring a credit ready state table; wherein the credit ready state table is used for representing the receiving state of the physical control sublayer channel for the transaction layer packet;
performing weighted service quality arbitration on the credit ready state table to obtain an arbitration result; and the arbitration result is used for indicating the service quality of data transmission of each physical control sublayer channel.
Further, based on a preset load balancing operation, assigning the same data packet in the transaction layer data packets to the same physical control sub-layer group, including:
if the physical control sublayer group comprises one physical control sublayer channel, distributing the same data packet in the transaction layer data packets, based on the arbitration result and according to a cross matrix selection mode, to the same physical control sublayer channel whose service quality meets a preset quality condition;
and if the physical control sublayer group comprises two or more physical control sublayer channels, distributing the same data packet in the transaction layer data packet to the two or more physical control sublayer channels with the service quality meeting the preset quality condition in the same physical control sublayer group according to a cross matrix selection mode based on the arbitration result.
Further, allocating the same data packet in the transaction layer data packets to the same physical control sub-layer group based on a preset data fault tolerance operation and a preset load balancing operation, including:
acquiring a fault state vector table; wherein, the failure state vector table is used for representing the failure state of the physical control sublayer channel;
if the physical control sublayer group comprises one physical control sublayer channel, distributing the same data packet in the transaction layer data packets, based on the fault state vector table and the arbitration result and according to a cross matrix selection mode, to the same fault-free physical control sublayer channel whose service quality meets the preset quality condition;
and if the physical control sublayer group comprises two or more physical control sublayer channels, distributing the same data packet in the transaction layer data packet to the two or more non-fault physical control sublayer channels with service quality meeting preset quality conditions in the same physical control sublayer group according to a cross matrix selection mode based on the fault state vector table and the arbitration result.
Further, the method further comprises:
acquiring channel quantity configuration parameters and group quantity configuration parameters of a physical control sublayer;
setting the number of the physical control sublayer channels and the number of the physical control sublayer groups in the physical control sublayer group based on the channel number configuration parameter and the group number configuration parameter.
Further, the link control logic includes at least one of: a physical control sublayer channel flow control first-in first-out queue, physical control sublayer channel transmission allocation logic, a retransmission cache unit, cyclic redundancy check, error check and correction, physical control sublayer channel credit ready state table logic, and physical control sublayer channel fault-free state marking logic; and each physical control sublayer channel shares the data cache space of the retransmission cache unit.
Further, the preset data fault tolerance operation further includes the following steps:
controlling the target control operation of the transaction layer data packet at the media data access control sublayer; the target control operation is at least one of: cyclic redundancy check, dirty data marking, error data packet counting logic, transaction layer data packet overtime counting logic and transaction layer data packet error interruption triggering logic;
if the cyclic redundancy check detects that the physical control sublayer returns the specified error data, determining the error correction category corresponding to the specified error data; wherein the error correction categories include a correctable category or an uncorrectable category;
and if the error correction category is the uncorrectable category, executing the backup data retransmission operation in the retransmission cache unit.
Further, the method further comprises:
acquiring target data after the dirty data marking is executed;
returning the target data to a main processor device or an on-chip interconnection protocol bus so that the main processor device or the on-chip interconnection protocol bus determines the error importance degree corresponding to the target data;
executing a specified data processing operation according to the error importance degree; wherein the specified data processing operation comprises at least one of: retransmission, ignore, discard, interrupt, update; wherein the update comprises an update to the fault state vector table and/or an update to the credit ready state table.
Further, the method further comprises:
setting a transaction layer data packet timeout timer and/or a transaction layer data packet retransmission counter in the media data access control sublayer;
controlling, at the media data access control sublayer, a target control operation on the transaction layer packet, including:
if the transaction layer data packet timeout timer detects timeout and/or the transaction layer data packet retransmission counter detects exceeding of a set number of times, reporting an exception to a main processor device to enable the main processor device to perform fault handling operation, wherein the fault handling operation includes at least one of the following operations: retransmission, ignore, discard, interrupt, reset restart.
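The timeout and retransmission-count check described above can be illustrated with a minimal Python sketch. The function name `check_tlp`, its parameters and the reason strings are hypothetical and are not taken from the application; the sketch only shows that either condition alone, or both together, leads to an exception report to the main processor device.

```python
def check_tlp(elapsed_s: float, retries: int,
              timeout_s: float, max_retries: int) -> list[str]:
    """Return the reasons (if any) for reporting an exception to the host.

    elapsed_s   : time since the TLP was sent (timeout timer value)
    retries     : value of the retransmission counter
    timeout_s   : configured timeout threshold (hypothetical parameter)
    max_retries : configured retransmission limit (hypothetical parameter)
    """
    reasons = []
    if elapsed_s > timeout_s:
        reasons.append("timeout")
    if retries > max_retries:
        reasons.append("retry_limit")
    return reasons


# An empty list means no report; a non-empty list would be reported to the
# main processor device, which then chooses the fault handling operation.
healthy = check_tlp(elapsed_s=0.2, retries=1, timeout_s=1.0, max_retries=3)
stalled = check_tlp(elapsed_s=2.5, retries=4, timeout_s=1.0, max_retries=3)
```

The choice among retransmission, ignore, discard, interrupt and reset restart is left to the main processor device, so the sketch deliberately stops at the report.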
In a second aspect, the present application provides a core particle that performs data link transmission between core particles using the above-described decoupling control method for multi-channel data link transmission between core particles.
By adopting the above technical solutions, the present application has at least the following beneficial effects:
The same data packet among the transaction layer data packets in the media data access control sublayer is allocated to the same physical control sublayer group. Data dependency within one data packet is strong, whereas data dependency between different data packets is weak. Strongly dependent data belonging to the same data packet are therefore allocated to the same physical control sublayer group (channel), while weakly dependent data belonging to different data packets can be allocated to different physical control sublayer groups (channels). This reduces the coupling between physical control sublayer groups (channels), realizes decoupling control of multi-channel data link transmission between core particles, and solves the problems that data link transmission between core particles is strongly coupled and is suspended by channel failure or blockage.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow diagram illustrating a method of decoupled control for inter-core multi-channel data link transmission in accordance with an exemplary embodiment;
FIG. 2 is a flow chart illustrating a method of decoupled control for inter-core multi-channel data link transmission in accordance with another exemplary embodiment;
FIG. 3 is an overall architecture diagram illustrating transmission of a multi-channel inter-core grain data link in accordance with an exemplary embodiment;
FIG. 4 is a diagram illustrating a decoupled control architecture for an inter-core multi-channel data link transmission in accordance with another exemplary embodiment;
FIG. 5 is a diagram illustrating a decoupled control architecture for inter-core multi-channel data link transmission in accordance with another exemplary embodiment;
FIG. 6 is a diagram illustrating a decoupled control architecture for inter-core multi-channel data link transmission in accordance with another exemplary embodiment;
FIG. 7 is a diagram illustrating a mapping between physical control sublayer channels and physical channels, according to an example embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail below. It should be apparent that the described embodiments are only a few embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart illustrating a decoupling control method for inter-core multi-channel data link transmission according to an exemplary embodiment, the decoupling control method for inter-core multi-channel data link transmission including the steps of:
Step S11, acquiring a transaction layer data packet in a media data access control sublayer;
Step S12, distributing the same data packet in the transaction layer data packets to the same physical control sub-layer group to realize the decoupling among the physical control sub-layer groups; the physical control sublayer group comprises at least one physical control sublayer channel, and one physical control sublayer channel corresponds to one link control logic.
In the present embodiment, the execution body may be a core particle (chiplet). Core particles are prefabricated dies that have specific functions and can be combined and integrated. The data links between core particles are usually designed as multiple parallel channels (lanes), so that a failure or blockage of one channel stalls the whole data link. By contrast, the decoupling control method for multi-channel data link transmission between core particles weakens or eliminates the strong coupling between the data carried in parallel on the multiple channels. This solves the problem that a failure or blockage of one channel suspends the whole data link during multi-channel parallel transmission, and gives the high-speed IO bus links between the chips or slots of a high-performance heterogeneous interconnected computing system better RAS (reliability, availability and serviceability) characteristics.
When data transmission is carried out among the core particles, the main processor can firstly send out an application data packet, and the application data packet passes through a protocol application layer, a protocol network layer and a protocol link layer in the on-chip interconnection protocol bus and reaches a link transaction layer in an external interface link. The link transaction layer data packet is transmitted to the data link layer and then reaches the physical layer through the data link layer. The data link layer comprises a plurality of physical control sublayer channels, and the physical layer comprises a plurality of physical channels. The improvement of the application mainly lies in a data link layer and a physical layer, and the strong coupling relation of the parallel transmission data content between physical control sublayer channels is weakened by executing corresponding decoupling control operation on the physical control sublayer channels. And after the decoupling control operation is executed, in order to obtain better RAS characteristics, further control is needed to realize fault tolerance and load balancing among multiple channels.
Specifically, the execution body may first obtain a Transaction Layer Packet (TLP) in the media data access control sublayer (MAC layer), and allocate the same data packet among the transaction layer packets to the same physical control sublayer group, where a physical control sublayer group comprises at least one physical control sublayer channel. When the physical control sublayer group contains only one physical control sublayer channel, the same data packet may be allocated to the same physical control sublayer channel. Because the data within one data packet are tightly coupled, allocating a data packet to a single physical control sublayer channel, rather than spreading it over different channels, reduces the data coupling between physical control sublayer channels. When the physical control sublayer group contains two or more physical control sublayer channels, the same data packet may be allocated to the two or more channels of the same group, which reduces the coupling between physical control sublayer groups; the channels in the group process the data packet together, which improves packet processing efficiency.
In addition, one physical control sublayer channel may correspond to one link control logic. Compared with having a plurality of physical control sublayer channels share one link control logic, controlling each physical control sublayer channel with its own link control logic reduces the coupling between the channels. The link control logic may include control logic for the flow control first-in first-out queue of the physical control sublayer channel, credit control logic for the physical control sublayer channel, and error correction control logic for the physical control sublayer channel.
With the method of the present application, the data of one TLP are allocated to the same PCS lane (within a PCS group); these data are taken from the flow control first-in first-out queue (FC FIFO) output by the MAC (media data access control) layer. This weakens or eliminates the strong coupling between the data carried in parallel on the multiple channels, so that the channels operate independently. When a channel fails or is blocked, the MAC layer can transmit TLPs over the other PCS lane links, which solves the problem that a failure or blockage of one channel suspends the whole data link during multi-channel parallel transmission.
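The allocation rule of steps S11 and S12 can be illustrated with a minimal Python sketch. All names here (`PcsGroup`, `split_across_lanes`, `allocate`) and the byte-interleaved striping are assumptions made for illustration and do not appear in the application; the sketch only shows that a whole packet stays inside one group, so a blocked group does not stall packets sent through another group.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class PcsGroup:
    """A physical control sublayer group holding one or more PCS lanes."""
    lanes: list[int]
    blocked: bool = False
    # Records (packet id, lane, chunk) for everything sent through this group.
    sent: list[tuple[int, int, bytes]] = field(default_factory=list)


def split_across_lanes(payload: bytes, lane_count: int) -> list[bytes]:
    """Stripe the bytes of one packet over the lanes of a single group."""
    return [payload[i::lane_count] for i in range(lane_count)]


def allocate(packet_id: int, payload: bytes, groups: list[PcsGroup]) -> int:
    """Send the whole packet through the first group that is not blocked."""
    for index, group in enumerate(groups):
        if group.blocked:
            continue
        chunks = split_across_lanes(payload, len(group.lanes))
        for lane, chunk in zip(group.lanes, chunks):
            group.sent.append((packet_id, lane, chunk))
        return index
    raise RuntimeError("no usable PCS group")


# Group 0 is blocked, so packet 7 travels entirely through group 1.
groups = [PcsGroup(lanes=[0, 1]), PcsGroup(lanes=[2, 3])]
groups[0].blocked = True
chosen = allocate(7, b"ABCDEF", groups)
```

Picking the first unblocked group is a placeholder policy; the embodiments that follow replace it with weighted quality of service arbitration and fault state information.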
Referring also to fig. 2, fig. 2 is a flowchart illustrating a decoupling control method for inter-core multi-channel data link transmission according to another exemplary embodiment, the decoupling control method for inter-core multi-channel data link transmission including the steps of:
Step S21, acquiring a transaction layer data packet in a media data access control sublayer;
Step S22, distributing the same data packet in the transaction layer data packets to the same physical control sub-layer group based on preset data fault tolerance operation and/or preset load balancing operation so as to realize decoupling, data fault tolerance and/or load balancing among the physical control sub-layer groups.
As an optional implementation manner, the preset load balancing operation at least includes the following steps:
acquiring a credit ready state table; wherein the credit ready state table is used for representing the receiving state of the physical control sublayer channel for the transaction layer data packet;
performing weighted service quality arbitration on the credit ready state table to obtain an arbitration result; and the arbitration result is used for indicating the service quality of data transmission of each physical control sublayer channel.
As an optional embodiment, allocating the same data packet in the transaction layer data packets to the same physical control sublayer group based on a preset load balancing operation includes:
if the physical control sublayer group comprises one physical control sublayer channel, distributing the same data packet in the transaction layer data packets, based on the arbitration result and according to a cross matrix selection mode, to the same physical control sublayer channel whose service quality meets a preset quality condition;
and if the physical control sublayer group comprises two or more physical control sublayer channels, distributing the same data packet in the transaction layer data packet to the two or more physical control sublayer channels with the service quality meeting the preset quality condition in the same physical control sublayer group according to a cross matrix selection mode based on the arbitration result.
In this embodiment, the credit ready state table indicates the receiving state of each physical control sublayer channel for transaction layer data packets. By analyzing the credit ready state table, it can be determined which physical control sublayer channels are able to receive more transaction layer data packets, and the transaction layer data packets awaiting allocation can then be allocated to the channels with the greater receiving capacity. Specifically, after the credit ready state table is obtained, weighted quality of service arbitration may be performed on it. Quality of Service (QoS) refers to the ability of a network to use various underlying techniques to provide better service for specified traffic; it is a mechanism for addressing network delay and congestion. The arbitration result obtained by performing weighted quality of service arbitration on the credit ready state table indicates the service quality of data transmission on each physical control sublayer channel. When a data packet is allocated, the optimal physical control sublayer channel can be selected based on the arbitration result.
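One possible reading of weighted quality of service arbitration over the credit ready state table is sketched below in Python. The application does not specify the scoring formula; the product of a per-lane weight and the number of free credits, the dictionary layout and the name `weighted_qos_arbitrate` are all assumptions made for illustration.

```python
def weighted_qos_arbitrate(credit_ready: dict[int, int],
                           weights: dict[int, float]) -> list[int]:
    """Rank the lanes that can accept a TLP, best first.

    credit_ready maps a lane number to its free credits (0 = not ready).
    weights maps a lane number to a QoS weight (assumed default: 1.0).
    The score weight * credits is a hypothetical choice, not the patent's.
    """
    ready = [lane for lane, credits in credit_ready.items() if credits > 0]
    return sorted(
        ready,
        key=lambda lane: (-weights.get(lane, 1.0) * credit_ready[lane], lane),
    )


# Lane 1 has no credits and is skipped; lane 2 wins because of its weight.
crst = {0: 4, 1: 0, 2: 2, 3: 4}
weights = {0: 1.0, 1: 1.0, 2: 3.0, 3: 1.0}
ranking = weighted_qos_arbitrate(crst, weights)
```

Ties are broken by lane number so that the ranking is deterministic; any other tie-break rule would serve equally well in this sketch.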
Specifically, if the physical control sublayer group includes one physical control sublayer channel, then, based on the arbitration result, a transaction layer data packet input from a plurality of virtual channels in the media data access control sublayer is allocated, by cross matrix selection (Matrix Mux), to the single physical control sublayer channel that currently has the best service quality. If the physical control sublayer group includes two or more physical control sublayer channels, then, based on the arbitration result, the same data packet among the transaction layer data packets input from the plurality of virtual channels in the media data access control sublayer is allocated, by cross matrix selection (Matrix Mux), to the two or more physical control sublayer channels of the physical control sublayer group that currently has the best service quality. Here, the preset quality condition indicates that the QoS parameters of a physical control sublayer channel satisfy corresponding parameter conditions, that is, that its service quality is judged to be at a good level. Setting the parameter conditions makes the determination of service quality more flexible; for example, with stricter parameter conditions, only the best one or more physical control sublayer channels are selected as the channels to which the data packet is allocated.
The MAC layer of the data link layer may perform weighted QoS arbitration based on the Credit Ready Status Table (CRST) of the PCS layer to obtain the arbitration result. Allocating TLPs by weighted QoS also balances the number of data packets transmitted on each physical control sublayer channel, which solves the problem of load balancing among the multiple channels. In addition, all physical control sublayer channels are grouped into physical control sublayer groups: a plurality of adjacent physical control sublayer channels may be treated as a whole and become the target to which the FC FIFO output of the MAC layer is allocated by cross matrix selection (Matrix Mux). The more physical control sublayer channels a group contains, the smaller the round-trip transmission delay of a single TLP in the data link layer and the better the performance, but the stronger the dependency between the data carried in parallel on those channels. The number of physical control sublayer groups and the number of channels in each group can therefore be set by jointly considering the number of physical control sublayer channels whose links have failed and cannot operate normally, the data bandwidth requirement of the MAC layer link, the system transmission performance, the transmission delay and other factors; this embodiment does not limit the specific values. Furthermore, aligning data across the multiple channels effectively eliminates the extra invalid bandwidth that multi-channel data swing would otherwise occupy on the link, which improves the effective transmission bandwidth of the link.
As an alternative embodiment, allocating the same data packet in the transaction layer data packets to the same physical control sublayer group based on a preset data fault tolerance operation and a preset load balancing operation, includes:
acquiring a fault state vector table; wherein, the failure state vector table is used for representing the failure state of the physical control sublayer channel;
if the physical control sublayer group comprises one physical control sublayer channel, distributing the same data packet in the transaction layer data packets, based on the fault state vector table and the arbitration result and according to a cross matrix selection mode, to the same fault-free physical control sublayer channel whose service quality meets the preset quality condition;
and if the physical control sublayer group comprises two or more physical control sublayer channels, distributing the same data packet in the transaction layer data packet to the two or more non-fault physical control sublayer channels with service quality meeting preset quality conditions in the same physical control sublayer group according to a cross matrix selection mode based on the fault state vector table and the arbitration result.
In this embodiment, the fault state vector table represents the fault state of each physical control sublayer channel, where the fault state is either faulty or fault-free. When data packets are allocated, the fault state vector table can be consulted so that the same data packet among the transaction layer data packets is allocated to the same fault-free physical control sublayer channel.
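Combining the fault state vector table with the arbitration result can be sketched as follows in Python. The bitmask encoding of the fault state vector (bit i set means lane i is faulty), the per-lane score dictionary, the threshold `min_score` standing in for the preset quality condition, and the rule of preferring the group with the highest total score are all assumptions for illustration.

```python
def select_lanes(groups: list[list[int]], fault_vector: int,
                 qos_score: dict[int, float], min_score: float) -> list[int]:
    """Pick the lanes of one group that will carry the next packet.

    A lane is usable when its fault bit is clear and its QoS score meets
    min_score. The group with the highest total usable score is chosen;
    an empty list means no group can currently accept the packet.
    """
    best_lanes: list[int] = []
    best_total = 0.0
    for lanes in groups:
        usable = [
            lane for lane in lanes
            if not (fault_vector >> lane) & 1
            and qos_score.get(lane, 0.0) >= min_score
        ]
        total = sum(qos_score.get(lane, 0.0) for lane in usable)
        if usable and total > best_total:
            best_lanes, best_total = usable, total
    return best_lanes


groups = [[0, 1], [2, 3]]
scores = {0: 1.0, 1: 1.5, 2: 9.0, 3: 2.0}
# Lane 2 is faulty, so group 1 keeps only lane 3 and group 0 wins.
with_lane2_faulty = select_lanes(groups, 0b0100, scores, min_score=1.0)
# Lanes 0 and 1 are faulty, so the packet goes to lanes 2 and 3.
with_group0_faulty = select_lanes(groups, 0b0011, scores, min_score=1.0)
```

Because the selection never mixes lanes from different groups, a packet remains confined to one group even when some lanes of that group are excluded as faulty.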
According to the FC FIFO credit status of the PCS channels, the PCS channels that are able to receive a transaction layer packet (TLP) can be marked level by level; the MAC layer then arbitrates, by weighted priority, a protocol layer TLP from a given virtual channel into a suitable PCS channel for data exchange.
As an optional implementation, the method further comprises:
acquiring channel quantity configuration parameters and group quantity configuration parameters of a physical control sublayer;
setting the number of the physical control sublayer channels and the number of the physical control sublayer groups in the physical control sublayer group based on the channel number configuration parameter and the group number configuration parameter.
In this embodiment, both the number of physical control sublayer channels and the number of physical control sublayer groups can be configured. By obtaining the channel number configuration parameter and the group number configuration parameter and setting the corresponding number of physical control sublayer channels and physical control sublayer groups, the degree of decoupling between physical control sublayers can be balanced against data processing efficiency.
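A minimal Python sketch of this configuration step is given below. The function name `build_groups` and its parameter names are hypothetical; grouping adjacent lane numbers follows the description of adjacent physical control sublayer channels being treated as a whole, while the exact numbering scheme is an assumption.

```python
def build_groups(lanes_per_group: int, group_count: int) -> list[list[int]]:
    """Partition consecutively numbered lanes into equally sized groups.

    lanes_per_group : channel number configuration parameter (assumed name)
    group_count     : group number configuration parameter (assumed name)
    """
    if lanes_per_group < 1 or group_count < 1:
        raise ValueError("both configuration parameters must be at least 1")
    return [
        list(range(g * lanes_per_group, (g + 1) * lanes_per_group))
        for g in range(group_count)
    ]


# Eight lanes configured two ways: fully decoupled, or two wider groups.
decoupled = build_groups(lanes_per_group=1, group_count=8)
wide = build_groups(lanes_per_group=4, group_count=2)
```

The first configuration maximizes independence between lanes, while the second spreads each packet over four lanes for lower per-packet latency at the cost of stronger dependency inside each group.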
As an optional implementation, the link control logic includes at least one of: the system comprises a physical control sublayer channel flow control first-in first-out queue, physical control sublayer channel transmission allocation logic, a retransmission buffer unit, cyclic redundancy check, error check and correction, physical control sublayer channel credit ready state table logic and physical control sublayer channel failure-free state marking logic.
In this embodiment, the control logic of the flow control first-in first-out queue of a physical control sublayer channel may control the flow-controlled data transmission in the corresponding channel. The control logic of the retransmission cache unit (Retry buffer) may trigger buffering for retransmission of abnormal data in the channel. The control logic of the cyclic redundancy check (CRC) may detect errors that can occur after data are transmitted or stored in the channel. The control logic of error check and correction (ECC) may trigger checking and correction of erroneous data in the channel. The credit ready state table logic of a physical control sublayer channel may manage and control the credit state of the corresponding channel, and the fault-free state marking logic may mark the physical control sublayer channels that can transmit data packets normally. CRC error detection and ECC error correction can thus be performed on data transmitted on a PCS channel, and link data that cannot be corrected can be retransmitted, which solves the problem of transient data transmission errors in each channel.
As an optional implementation manner, each physical control sublayer channel shares a data buffer space of the retransmission buffer unit.
In this embodiment, each physical control sublayer channel can share the data cache space of the retransmission cache unit, so that the data cache space of the retransmission cache unit can be saved, the size of the retransmission cache unit can be expanded, and the retransmission capability can be improved.
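One way to picture the shared data cache space is a single fixed-capacity pool keyed by lane and sequence number. The sketch below is an assumption made for illustration; the class `SharedRetryBuffer` and its methods are hypothetical and not taken from the embodiment.

```python
class SharedRetryBuffer:
    """A single pool of retry entries shared by all PCS lanes (sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = {}  # (lane, seq) -> buffered data

    def store(self, lane, seq, data):
        """Buffer a copy of the data; return False if the pool is full."""
        key = (lane, seq)
        if key not in self._entries and len(self._entries) >= self.capacity:
            return False
        self._entries[key] = data
        return True

    def ack(self, lane, seq):
        """Release the entry once the data was received correctly."""
        self._entries.pop((lane, seq), None)

    def replay(self, lane, seq):
        """Return the buffered copy for retransmission."""
        return self._entries[(lane, seq)]
```

Because capacity is pooled rather than split per lane, a lane with many outstanding packets can use space that idle lanes do not need.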
As an optional implementation manner, the preset data fault tolerance operation further includes the following steps:
controlling the target control operation of the transaction layer data packet at the media data access control sublayer; the target control operation is at least one of: cyclic redundancy check, dirty data marking, error data packet counting logic, transaction layer data packet overtime counting logic and transaction layer data packet error interruption triggering logic;
if the cyclic redundancy check detects that the physical control sublayer returns the specified error data, determining the error correction category corresponding to the specified error data; wherein the error correction category comprises a correctable category or an uncorrectable category;
and if the error correction category is the uncorrectable category, executing the backup data retransmission operation in the retransmission cache unit.
In this embodiment, if uncorrectable data is returned to the media data access control sublayer, the corresponding data error can be detected by the cyclic redundancy check. Alternatively, the uncorrectable data may be marked with the dirty data flag and returned, together with the flag, to the main processor or the protocol layer interconnection bus, so that the upper layer logic decides, according to the importance degree of the data error, whether to discard the faulty data, raise an exception interrupt for the data marked as dirty, request retransmission of the original data, or the like.
In this embodiment, the media data access control sublayer may perform cyclic redundancy check on the transaction layer packet, and if the cyclic redundancy check detects specified error data, perform error importance determination. The specified error data is data that is not correctable and is returned by the physical control sublayer. And if the error importance degree corresponding to the error importance judgment reaches a set degree, determining that the error correction type is the uncorrectable type, and executing the backup data retransmission operation in the retransmission cache unit to try to correct the specified error data.
As an optional implementation, the method further comprises:
acquiring target data after the dirty data marking is executed;
returning the target data to a main processor device or an on-chip interconnection protocol bus so that the main processor device or the on-chip interconnection protocol bus determines the error importance degree corresponding to the target data;
executing specified data processing operation according to the error importance degree; wherein the specified data processing operation comprises at least one of: retransmission, ignore, discard, interrupt, update; wherein the update comprises an update to a fault vector status table and/or an update to a credit ready status table.
In this embodiment, the execution body may perform, at the media data access control sublayer, dirty data marking on the transaction layer data packet to obtain the target data after dirty data marking. The target data is then returned to the main processor device or the on-chip interconnection protocol bus, which judges the error importance and obtains the error importance degree corresponding to the target data. If the error importance degree reaches the set degree, the target data may be retransmitted, ignored, discarded, interrupted, and the like. If the error importance degree is higher than the threshold value, the error is determined to be serious, and the fault vector state table and the credit ready state table may be updated accordingly.
When uncorrectable data in the PCS layer is returned to the MAC layer, the error can be detected by the CRC check of the MAC layer, and whether to issue a retransmission request is determined according to the importance degree of the data error; the retransmission request is returned to the host processor device or the protocol layer interconnection bus to request retransmission of the erroneous TLP data packet. Alternatively, when uncorrectable data in the PCS layer is returned to the MAC layer, a dirty data flag may be returned to the host processor or the protocol layer interconnection bus along with the data or the response, and the upper layer logic determines whether to ignore or discard the faulty data, raise an exception interrupt for the dirty data, or perform a processing operation such as a data retransmission request.
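The upper-layer decision on dirty-marked data can be sketched as a small policy function. The numeric severity scale, the two thresholds, and the mapping from severity to action are assumptions made for illustration only; the embodiment names the possible operations but does not fix a policy.

```python
from enum import Enum, auto


class Action(Enum):
    IGNORE = auto()
    RETRANSMIT = auto()
    INTERRUPT = auto()


def handle_poisoned(severity, lane, fault_table, credit_ready,
                    retransmit_at=1, serious_at=3):
    """Choose an operation for dirty-marked data coming from `lane`.

    severity      -- hypothetical integer error importance degree
    fault_table   -- per-lane fault flags (True = faulty), updated in place
    credit_ready  -- per-lane credit ready flags, updated in place
    """
    if severity < retransmit_at:
        return Action.IGNORE
    if severity < serious_at:
        return Action.RETRANSMIT
    # Serious error: record the fault and stop scheduling onto this lane.
    fault_table[lane] = True
    credit_ready[lane] = False
    return Action.INTERRUPT
```

Only the serious case touches the state tables, so an isolated transient error does not take a lane out of service.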
As an optional implementation, the method further comprises:
setting a transaction layer data packet timeout timer and/or a transaction layer data packet retransmission counter in the media data access control sublayer;
controlling, at the media data access control sublayer, a target control operation on the transaction layer packet, including:
and if the transaction layer data packet timeout timer detects timeout and/or the transaction layer data packet retransmission counter detects the number of times exceeding the set number, reporting an exception to the main processor device so as to enable the main processor device to perform fault processing.
In this embodiment, by setting the transaction layer data packet timeout timer and the transaction layer data packet retransmission counter in the media data access control sublayer, it is possible to detect when a serious temporary data transmission error, or a storm of high-frequency temporary data transmission errors, occurs in a physical control sublayer group or physical control sublayer channel, such that a transaction layer data packet allocated by the media data access control sublayer to that group or channel receives no response for a long time (a timeout) or is retransmitted more than the set number of times. At this time, reporting of the exception to the main processor device may be triggered, so that the main processor device performs a corresponding processing operation. Here, fault handling may include, but is not limited to, operations such as application process shutdown, restart, data link reset or retraining, and even a reset and restart of the entire system.
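The timer and counter can be sketched together as one monitor that tracks outstanding packets. The class `TlpMonitor`, its method names, and the use of an integer tag per packet are hypothetical; time is passed in explicitly so the sketch stays deterministic.

```python
class TlpMonitor:
    """Tracks outstanding TLPs by tag: last send time and retransmission count."""

    def __init__(self, timeout, max_retries):
        self.timeout = timeout
        self.max_retries = max_retries
        self._pending = {}  # tag -> (last_sent_at, retransmissions)

    def on_send(self, tag, now):
        """Record a first transmission or a retransmission of `tag`."""
        _, retries = self._pending.get(tag, (now, -1))
        self._pending[tag] = (now, retries + 1)

    def on_complete(self, tag):
        """The packet was acknowledged; stop tracking it."""
        self._pending.pop(tag, None)

    def exceptions(self, now):
        """Tags to report to the host: timed out or retransmitted too often."""
        return sorted(
            tag
            for tag, (sent_at, retries) in self._pending.items()
            if now - sent_at > self.timeout or retries > self.max_retries
        )
```

The host-side reaction to a reported tag (process shutdown, link retraining, system reset) is outside the scope of this sketch.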
As an optional implementation manner, when the number of PCS channels differs between PCS groups, or when the number of PCS channels that can actually work normally differs from the originally configured number because of a link failure, the input allocation module of the PCS channels may allocate the TLP data packets in the FC FIFO output by the MAC layer according to the FC FIFO CRST input by the PCS channels in the PCS group and the actual fault mapping of each PCS channel in the PCS group. That is, the MAC layer TLP data entering a PCS group is distributed, according to a given processing principle, across the PCS channels inside the group that can work normally, and is transmitted over them in parallel. Different PCS groups can therefore be configured with different PCS channels, and the FC FIFO CRST input by the PCS channels and the actual fault mapping of each PCS channel in the PCS group can be updated dynamically, in real time, while the data transmission link is operating. A permanent fault in one physical channel of a multi-channel link is thus handled by a reconfigurable, dynamic index remapping method.
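The in-group allocation can be sketched as follows. The text leaves the "processing principle" open, so the round-robin striping and the 4-byte chunk size here are assumptions, as are the function names `usable_lanes` and `stripe`.

```python
def usable_lanes(group, credit_ready, faulty):
    """Lanes of the group that are fault-free and have credit available."""
    return [lane for lane in group if credit_ready[lane] and not faulty[lane]]


def stripe(tlp, lanes, chunk=4):
    """Distribute one TLP's bytes round-robin over the given lanes.

    Returns a dict mapping each lane to the ordered list of chunks it carries.
    """
    if not lanes:
        raise RuntimeError("no usable lane in this PCS group")
    out = {lane: [] for lane in lanes}
    for offset in range(0, len(tlp), chunk):
        lane = lanes[(offset // chunk) % len(lanes)]
        out[lane].append(tlp[offset:offset + chunk])
    return out
```

Since the lane list is recomputed from the current state tables, a lane that fails or runs out of credit drops out of the striping without any change to the other groups.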
Referring to fig. 3, fig. 3 is an overall architecture diagram of multi-channel data link transmission between core particles according to an exemplary embodiment. As shown in fig. 3, when multi-channel data link transmission is performed between core particles, a host processor device (Host CPU) may send an application packet through the protocol transmission layer (Protocol Transaction Layer), the protocol network layer (Protocol Network Layer), and the protocol link layer (Protocol Link Layer) of the protocol interconnection bus (Protocol Interconnect) to reach the link transaction layer (Link Transaction Layer) in the external interface connection link (External Interface Link). The scheme mainly processes the transaction layer data packets (TLPs) from the link transaction layer in the data link layer (Link Data Layer) and the physical layer (Physical Layer) and, in combination with the retransmission (replay), dirty data marking (poison), and interrupt (intr) exception reporting modes, completes the design of a channel fault-tolerance and channel load-balancing scheme in which the data link layer is completely decoupled in multi-channel (Lane) transmission. The link controls of the PCS (Physical Control Sublayer) and the PHY Lanes (physical lanes) are in one-to-one correspondence, a plurality of PCS link controls share a retry buffer to implement link data retransmission, and the physical lanes may implement adaptive fault tolerance by using a redundancy replacement policy. Specifically, it can be seen that the physical layer has a replay PHY Lane (physical channel for failure recovery) set.
Referring also to fig. 4, fig. 4 is a diagram illustrating a decoupling control architecture for multi-channel data link transmission between core particles according to another exemplary embodiment. As shown in fig. 4, the top of fig. 4 is the media data access control sublayer of the link layer (hereinafter referred to as the MAC layer). The MAC layer has input buffers for the TLPs (transaction layer packets) of 2 virtual channels, VC0 and VC1, and the flow control FIFO queue output by the MAC layer operates in credit mode. Logic such as a TLP Timeout Timer, Replay Logic, TLP CRC (Cyclic Redundancy Check), Poison Logic, CRC Check, and a Replay Counter may be added in the MAC layer. The MAC layer may also use a TLP status buffer table and a PCS CRST (Credit Ready status table). Based on the PCS CRST, the MAC layer can perform Quality of Service (QoS) arbitration and distribute the TLP packets input on the multiple virtual channels to the corresponding PCS lanes for transmission in a matrix selection (Matrix Mux) manner. The MAC layer can also report exceptions to the upper layer processing logic using the retransmission (replay), dirty data marking (poison), and interrupt (intr) exception reporting modes.
For the PCS link layer, the alternative shown in fig. 4 includes 8 PCS groups, each of which includes a PCS Lane. Each PCS Lane corresponds to a PCS link control, wherein the PCS link control corresponds to FC FIFO (control flow first-in first-out queue), retry Buffer, CRC (Cyclic Redundancy Check), ECC (Error Correcting Code), and Credit Ready status in fig. 4. Also, the PCS Link layer can support PCS Link Control, PCS Lane Input Allocation, PCS Flow Control, PCS FIFO CRST (credit for PCS FIFO queue), PHY Lanes FSVT (fault state vector table for physical channel), PCS Index IMRT (Index relation mapping table between PCS Lane and PHY Lane).
For the physical layer, as shown in fig. 4, there are 8 conventional physical lanes, 1 redundant physical lane, and one failed physical lane. The PCS link controls and the PHY Lanes are in one-to-one correspondence, a plurality of PCS link controls share a retry buffer to implement link data retransmission, and the physical lanes adopt a redundancy replacement strategy.
Referring also to fig. 5, fig. 5 is a diagram illustrating a decoupling control architecture for multi-channel data link transmission between core particles according to another exemplary embodiment. The main difference between the architecture of fig. 5 and that of fig. 4 is that 2 PCS Groups are set in the PCS link control layer, and each PCS Group includes 4 PCS lanes. Referring also to fig. 6, fig. 6 is a diagram illustrating a decoupling control architecture for multi-channel data link transmission between core particles according to another exemplary embodiment. The main difference between the architecture of fig. 6 and that of fig. 4 is that 1 PCS Group is set in the PCS link control layer, and this PCS Group includes 8 PCS lanes.
Referring to fig. 7, fig. 7 is a diagram illustrating the mapping between physical control sublayer channels and physical channels according to an exemplary embodiment. As shown in fig. 7, taking 2 physical control sublayer groups (PCS Group) as an example, physical control sublayer group 1 and physical control sublayer group 2 each include 4 physical control sublayer channels (PCS lane1, PCS lane2, PCS lane3, and PCS lane4). When data link transmission is performed between core particles, each physical control sublayer channel needs to be mapped to a corresponding physical channel in the physical layer. In fig. 7, the physical layer may include 9 physical lanes: 1 redundant physical lane (PYH _2 Lane reducer) and 8 normal physical lanes. In the example of fig. 7, physical lane PYH _1 Lane _3 may be a failed physical lane. In that case, with the fault-tolerant mapping mode, the failed physical channel can be skipped and, using the preset redundant physical channel, a corresponding fault-free physical channel is allocated to each physical control sublayer channel. In practical applications, the number of redundant physical channels may be set according to actual needs, and this embodiment does not limit this.
The processing in the physical layer introduces redundancy so that a physical channel that has a permanent fault and cannot work can be replaced. The fault states of the conventional physical lanes and the repair lane are marked during the link test phase of the physical layer, so that a complete fault state vector table is available for PCS transmission in the data link layer. Therefore, before the link returns to normal operation, a software or hardware algorithm can, in a flexibly configurable manner, use the fault state vector table to derive a valid index (Index) relation for the PCS lanes and map them to the normal physical lanes.
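The index remapping can be sketched as a function from the fault state vector table to a PCS-lane-to-PHY-lane index table. The function name `remap` and the "take the first fault-free lanes in order" rule are assumptions; a real design could instead substitute only the failed lane with the spare.

```python
def remap(num_pcs_lanes, phy_faulty):
    """Build the PCS lane -> PHY lane index table, skipping faulty PHY lanes.

    phy_faulty -- fault state vector, one flag per physical lane
                  (spare lanes included), True meaning permanently failed.
    Returns a list where entry i is the PHY lane serving PCS lane i.
    """
    healthy = [i for i, bad in enumerate(phy_faulty) if not bad]
    if len(healthy) < num_pcs_lanes:
        raise RuntimeError("not enough fault-free physical lanes")
    return healthy[:num_pcs_lanes]
```

With 8 PCS lanes, 9 physical lanes, and one failed physical lane, every PCS lane still gets a fault-free physical lane; a second permanent fault would exceed the single spare and raise an error.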
The same data packet among the transaction layer data packets in the media data access control sublayer is allocated to the same physical control sublayer group. Data dependency within one data packet is strong, while data dependency between different data packets is weak. Allocating the strongly dependent data of one data packet to the same physical control sublayer group (channel), and allowing the weakly dependent data of different data packets to go to different physical control sublayer groups (channels), therefore reduces the coupling between physical control sublayer groups (channels) and realizes decoupling control of multi-channel data link transmission between core particles. This helps solve the problems that the transmission coupling of the data link between core particles is strong and that data link transmission stalls because of channel failure or blocking. Fault-tolerance processing and load-balancing processing are then applied to transmission over the decoupled multiple links, so that link transmission has better reliability, availability and serviceability (RAS) characteristics.
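The packet-level decoupling can be sketched as an assignment of whole packets to groups. The least-loaded policy below is a stand-in chosen for brevity, not the weighted quality-of-service arbitration over the credit ready state table that the method describes; `assign_groups` is a hypothetical name.

```python
def assign_groups(packet_sizes, num_groups):
    """Assign each packet, as a whole, to one PCS group.

    A packet is never split across groups. The group with the least
    accumulated load is chosen, and ties go to the lowest group index.
    Returns the chosen group index for each packet, in input order.
    """
    load = [0] * num_groups
    assignment = []
    for size in packet_sizes:
        group = min(range(num_groups), key=lambda g: load[g])
        assignment.append(group)
        load[group] += size
    return assignment
```

Because no packet spans two groups, a stalled group delays only the packets assigned to it, while the other groups continue to transmit.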
Further, the present application also provides a core particle, where the core particle performs inter-core-particle data link transmission by the decoupling control method for inter-core-particle multichannel data link transmission.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present application, the meaning of "plurality" means at least two unless otherwise specified.
It will be understood that when an element is referred to as being "fixed" or "disposed" to another element, it can be directly on the other element or intervening elements may also be present; when an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present, and further, as used herein, connected may include wirelessly connected; the term "and/or" is used to include any and all combinations of one or more of the associated listed items.
Any process or method descriptions in flow charts or otherwise described herein may be understood as: represents modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes additional implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware that is related to instructions of a program, and the program may be stored in a computer-readable storage medium, and when executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A decoupling control method for transmission of a multi-channel inter-core data link, the method comprising:
acquiring a transaction layer data packet in a media data access control sublayer;
distributing the same data packet in the transaction layer data packets to the same physical control sub-layer group to realize the decoupling among the physical control sub-layer groups; the physical control sublayer group comprises at least one physical control sublayer channel, and one physical control sublayer channel corresponds to one link control logic.
2. The method of claim 1, wherein assigning the same one of the transaction layer packets to the same group of physical control sublayer comprises:
and distributing the same data packet in the transaction layer data packets to the same physical control sub-layer group based on preset data fault tolerance operation and/or preset load balancing operation so as to realize data fault tolerance and/or load balancing among the physical control sub-layer groups.
3. The method according to claim 2, wherein the predetermined load balancing operation comprises at least the following steps:
acquiring a credit ready state table; wherein the credit ready state table is used for representing the receiving state of the physical control sublayer channel for the transaction layer data packet;
performing weighted service quality arbitration on the credit ready state table to obtain an arbitration result; and the arbitration result is used for indicating the service quality of data transmission of each physical control sublayer channel.
4. The method according to claim 3, wherein assigning the same one of the transaction layer packets to the same group of physical control sublayer based on a preset load balancing operation comprises:
if the physical control sublayer group comprises one physical control sublayer channel, distributing the same data packet in the transaction layer data packets to the physical control sublayer channel with the same service quality meeting a preset quality condition according to a cross matrix selection mode based on the arbitration result;
and if the physical control sublayer group comprises two or more physical control sublayer channels, distributing the same data packet in the transaction layer data packet to the two or more physical control sublayer channels with the service quality meeting the preset quality condition in the same physical control sublayer group according to a cross matrix selection mode based on the arbitration result.
5. The method according to claim 3, wherein assigning the same one of the transaction layer packets to the same group of physical control sub-layers based on a preset data fault tolerance operation and a preset load balancing operation comprises:
acquiring a fault state vector table; the fault state vector table is used for representing the fault state of the physical control sub-layer channel;
if the physical control sublayer group comprises one physical control sublayer channel, distributing the same data packet in the transaction layer data packets to the physical control sublayer channel with the same service quality meeting the preset quality condition and without fault according to a cross matrix selection mode based on the fault state vector table and the arbitration result;
and if the physical control sublayer group comprises two or more physical control sublayer channels, distributing the same data packet in the transaction layer data packet to the two or more non-fault physical control sublayer channels with service quality meeting preset quality conditions in the same physical control sublayer group according to a cross matrix selection mode based on the fault state vector table and the arbitration result.
6. The method of claim 1, further comprising:
acquiring channel quantity configuration parameters and group quantity configuration parameters of a physical control sublayer;
setting the number of the physical control sublayer channels and the number of the physical control sublayer groups in the physical control sublayer group based on the channel number configuration parameter and the group number configuration parameter.
7. The method of claim 1, wherein the link control logic comprises at least one of: a physical control sublayer channel flow control first-in first-out queue, physical control sublayer channel transmission allocation logic, a retransmission cache unit, cyclic redundancy check, error check and correction, physical control sublayer channel credit ready state table logic, and physical control sublayer channel fault-free state marking logic; and each physical control sublayer channel shares the data cache space of the retransmission cache unit.
8. The method of claim 2, wherein the predetermined data fault tolerance operation further comprises the steps of:
controlling the target control operation of the transaction layer data packet at the media data access control sublayer; the target control operation is at least one of: cyclic redundancy check, dirty data marking, error data packet counting logic, transaction layer data packet overtime counting logic and transaction layer data packet error interruption triggering logic;
if the cyclic redundancy check detects that the physical control sublayer returns the specified error data, determining the error correction category corresponding to the specified error data; wherein the error correction categories include a correctable category or an uncorrectable category;
and if the error correction category is the uncorrectable category, executing the backup data retransmission operation in the retransmission cache unit.
9. The method of claim 8, further comprising:
acquiring target data after the dirty data marking is executed;
returning the target data to a main processor device or an on-chip interconnection protocol bus so that the main processor device or the on-chip interconnection protocol bus determines the error importance degree corresponding to the target data;
executing specified data processing operation according to the error importance degree; wherein the specified data processing operation comprises at least one of: retransmission, ignore, discard, interrupt, update; wherein the update comprises an update to a fault vector status table and/or an update to a credit ready status table.
10. The method of claim 8, further comprising:
setting a transaction layer data packet overtime timer and/or a transaction layer data packet retransmission counter on the media data access control sublayer;
controlling, at the media data access control sublayer, a target control operation on the transaction layer packet, including:
if the transaction layer data packet timeout timer detects timeout and/or the transaction layer data packet retransmission counter detects exceeding of a set number of times, reporting an exception to a main processor device so that the main processor device performs fault handling operation, wherein the fault handling operation includes at least one of the following operations: retransmission, ignore, discard, interrupt, reset restart.
CN202310043090.1A 2023-01-29 2023-01-29 Decoupling control method for inter-chip multi-channel data link transmission Active CN115794731B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310043090.1A CN115794731B (en) 2023-01-29 2023-01-29 Decoupling control method for inter-chip multi-channel data link transmission

Publications (2)

Publication Number Publication Date
CN115794731A true CN115794731A (en) 2023-03-14
CN115794731B CN115794731B (en) 2023-07-04

Family

ID=85428986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310043090.1A Active CN115794731B (en) 2023-01-29 2023-01-29 Decoupling control method for inter-chip multi-channel data link transmission

Country Status (1)

Country Link
CN (1) CN115794731B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101547192A (en) * 2008-03-24 2009-09-30 大唐移动通信设备有限公司 Method and device for allocating and transmitting TCP data pockets
CN106850188A (en) * 2017-01-24 2017-06-13 中国航天系统科学与工程研究院 A kind of data transmission system based on multichannel isomery one-way transmission path
US20180337993A1 (en) * 2017-05-22 2018-11-22 Microsoft Technology Licensing, Llc Sharding over multi-link data channels
US20190227972A1 (en) * 2019-04-02 2019-07-25 Intel Corporation Virtualized link states of multiple protocol layer package interconnects
CN115622666A (en) * 2022-12-06 2023-01-17 北京超摩科技有限公司 Fault channel replacement method for transmission of data link between core particles and core particles

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116627894A (en) * 2023-07-20 2023-08-22 之江实验室 Medium access control layer, communication method and system
CN116627894B (en) * 2023-07-20 2023-10-20 之江实验室 Medium access control layer, communication method and system
CN117834755A (en) * 2024-03-04 2024-04-05 中国人民解放军国防科技大学 Interface circuit between protocol layer and adapter layer facing core particle interconnection interface and chip
CN117834755B (en) * 2024-03-04 2024-05-10 中国人民解放军国防科技大学 Interface circuit between protocol layer and adapter layer facing core particle interconnection interface and chip

Also Published As

Publication number Publication date
CN115794731B (en) 2023-07-04

Similar Documents

Publication Publication Date Title
JP4961481B2 (en) Bridging Serial Advanced Technology Attachment (SATA) and Serial Attached Small Computer System Interface (SCSI) (SAS)
US7739432B1 (en) Command switching for multiple initiator access to a SATA drive
US8213294B2 (en) Mechanism for detecting and clearing I/O fabric lockup conditions for error recovery
JP5874879B2 (en) I / O device control method and virtual computer system
US6418488B1 (en) Data transfer state machines
US7584377B2 (en) System, machine, and method for maintenance of mirrored datasets through surrogate writes during storage-area networks transients
US8312187B2 (en) Input/output device including a mechanism for transaction layer packet processing in multiple processor systems
US8560878B2 (en) System and method for failure detection by a storage expander preceding an expander experiencing a failure
US10078543B2 (en) Correctable error filtering for input/output subsystem
CN106021147B (en) Storage device exhibiting direct access under logical drive model
US20120311199A1 (en) Fibre channel input/output data routing including discarding of data transfer requests in response to error detection
US20160077997A1 (en) Apparatus and method for deadlock avoidance
US20080184077A1 (en) Method and system for handling input/output (i/o) errors
US8583989B2 (en) Fibre channel input/output data routing system and method
EP3608791B1 (en) Non-volatile memory switch with host isolation
US8683084B2 (en) Fibre channel input/output data routing system and method
US7962676B2 (en) Debugging multi-port bridge system conforming to serial advanced technology attachment (SATA) or serial attached small computer system interface (SCSI) (SAS) standards using idle/scrambled dwords
WO2007114887A1 (en) Error handling in a data storage system
CN115794731A (en) Decoupling control method for transmission of multi-channel data link between core particles
CN115622666B (en) Fault channel replacement method for data link transmission between core particles, and core particle
US5805791A (en) Method and system for detection of and graceful recovery from a peripheral device fault
US8090789B1 (en) Method of operating a data storage system having plural data pipes
US8402320B2 (en) Input/output device including a mechanism for error handling in multiple processor and multi-function systems
EP1890439B1 (en) Data processing management apparatus, mode management apparatus and mode management method
US8683083B2 (en) Fibre channel input/output data routing system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant