CN111131091B - Inter-chip interconnection method and system for network on chip - Google Patents

Publication number
CN111131091B
Authority
CN
China
Prior art keywords
data
packet
virtual channel
chip
buffer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911374246.4A
Other languages
Chinese (zh)
Other versions
CN111131091A (en
Inventor
廖文康
邓慧鹏
罗毅
肖山林
虞志益
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201911374246.4A priority Critical patent/CN111131091B/en
Publication of CN111131091A publication Critical patent/CN111131091A/en
Application granted granted Critical
Publication of CN111131091B publication Critical patent/CN111131091B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/10Packet switching elements characterised by the switching fabric construction
    • H04L49/109Integrated on microchip, e.g. switch-on-chip

Abstract

The invention discloses a network-on-chip oriented inter-chip interconnection method and system, wherein the method comprises the following steps: acquiring a sending end data packet through a router; processing the information of the data packet and storing the data packet into a virtual channel; arbitrating the processed information, and selecting a target virtual channel to finish data transmission and generate a command packet; updating the state of the virtual channel according to the command packet; coding and scrambling according to the transmitted data and/or command packet, converting the coded and scrambled data and/or command packet into serial data and transmitting the serial data to a receiving end; the method converts useless information in the original data packet into useful inter-chip coding information through a virtual channel and an arbitration mode, can meet the requirements of network priority and multicast on a chip, cannot cause low-priority blockage due to high-priority blockage, reduces data coding overhead, improves effective data transmission efficiency, and can be widely applied to the field of integrated circuit design.

Description

Inter-chip interconnection method and system for network on chip
Technical Field
The invention relates to the field of integrated circuit design, in particular to a network-on-chip oriented inter-chip interconnection method and system.
Background
Network-on-chip (NoC): a new communication method for systems-on-chip (SoC) and a main component of multi-core technology.
Virtual channel: a Virtual Channel (VC) is a communication circuit that can transport ATM cells between two or more endpoints; ATM is a packet-oriented technology, packets of which are called cells, and ATM cells are composed of two parts, a cell header and a Payload (Payload). The cell header contains cell control information, and the payload is used for carrying user data.
State machine: a control center composed of a state register and combinational logic. It transitions between preset states according to control signals, coordinates the related signals, and completes a specific operation.
Bus (Bus): the bus of the computer can be divided into a data bus, an address bus and a control bus according to the type of information transmitted by the computer, and the data bus, the address bus and the control bus are respectively used for transmitting data, data addresses and control signals.
Earlier, communication between modules within a chip or between chips was parallel: because the parallel bit width is large, the data rate at a given frequency is clearly higher than that of serial transmission. However, as interface frequencies increased, problems such as clock skew, data skew, skew between data and clock, noise, and routing constraints limited further increases in data bit width. The clock frequency of source-synchronous interfaces has hit a bottleneck: because of non-ideal channel characteristics, raising the frequency further severely degrades the signal. The demand for higher communication speed has therefore brought the serial bus back into view. A serial bus has fewer pins than a parallel bus, which reduces the difficulty of interconnection and of PCB (printed circuit board) routing. As the required transmission bandwidth increases, a parallel bus must meet length-matching constraints on its signal lines, which imposes stricter requirements on EMI (electromagnetic interference) and crosstalk. In contrast, with the development of clock recovery and equalization techniques, serial bus rates keep increasing. A serial bus can also offer stronger interference immunity and a longer transmission distance, and is suitable for different media (optical fiber, wireless, etc.). The cost of a high-speed serial bus is that high-speed signal transmission places higher requirements on PCB material, signal integrity, and interference immunity. Serial bus communication has two parts: analog circuits and digital circuits. The analog part is usually handled by a SerDes (Serializer/Deserializer), while the digital part is specified by the protocol.
The current mainstream high-speed protocols have the following disadvantages:
a. There is no inter-chip protocol designed for the network on chip. Although the network on chip meets the needs of distributed multi-core interconnection for neural networks well, no inter-chip protocol designed for it currently exists.
b. The data encoding overhead of each layer is very large. Taking a 256-byte data transfer as an example, the packing efficiency is 79% for Ethernet, 92% for PCIe, and 92-94% for RapidIO (4X). (Packing efficiency is the ratio of payload length to total packet length and does not include physical layer coding overhead.) For the large data volumes of a network on chip, this coding overhead can significantly reduce network-on-chip performance.
c. The communication overhead caused by flow control mechanisms and the protocol is too large. In PCIe, for example, flow control and protocol overhead is present in both the data link layer and the physical layer.
Disclosure of Invention
To solve one of the above technical problems, the present invention aims to provide an inter-chip interconnection method and system that is designed for the network on chip, has a simple protocol, and has low overhead.
The first technical scheme adopted by the invention is as follows: an inter-chip interconnection method facing a network on chip comprises the following steps:
acquiring a sending end data packet through a router;
processing the information of the data packet and storing the data packet into a virtual channel;
arbitrating the processed information, and selecting a target virtual channel to finish data transmission and generate a command packet;
updating the state of the virtual channel according to the command packet;
coding and scrambling according to the transmitted data and/or command packet;
and converting the data and/or command packet subjected to coding and scrambling into serial data and transmitting the serial data to a receiving end.
Further, the method also comprises the following steps: acquiring data and/or command packets after coding and scrambling; packing the data and/or command into parallel data; restoring normal data according to the parallel data and completing descrambling and decoding; and when the command contains a backpressure instruction, stopping sending by the sending end.
Further, the step of arbitrating the processed information, selecting the target virtual channel to complete data transmission and generate the command packet further includes: caching data to be transmitted into a buffer and a retransmission buffer at the same time; when the data is normally transmitted, the data is directly read out from the buffer; when the data is retransmitted, the data is read out from the retransmission buffer.
Further, the step of sending the data and command packet with completed coding and scrambling to the receiving end specifically includes: initializing a state machine to complete data alignment and channel binding; when the control field is generated, the state machine generates and transmits the control field according to the priorities of different control fields, data and commands; when the control field is sent, the state machine sends a request signal to the data and command interface to stop receiving and sending data.
Further, the command packet is composed of: a start control field, a type field, a data field, a check code field, and an end control field.
Further, the command packet includes: an acknowledgement command packet, a negative acknowledgement command packet, a retransmission command packet, and an address update command packet.
Further, the step of arbitrating the processed information and selecting the target virtual channel specifically includes: generating a request list and a virtual channel list according to the arbitration request; generating a result list according to the request list and the channel list; and selecting a target virtual channel according to the result list.
The second technical scheme adopted by the invention is as follows: an inter-chip interconnection system facing a network on chip, comprising:
the system comprises a sending end, a router, a serializer, a deserializer, a buffer register, an arbitration module, a virtual channel counting module, a local virtual channel counting module, a buffer, a retransmission buffer, a packaging module, a selector, a command interface, a data interface, an encoder, a scrambler, a decoder, a descrambler, a speed change module and an elastic buffer;
the sending end is used for sending data packets;
the router is used for receiving a data packet sent by a sending end;
the serializer and deserializer are used for converting parallel data into serial data and converting the serial data into parallel data;
the receiving end is used for receiving a data packet and a command packet;
the buffer register is used for processing the information of the data packet and storing the data packet into a virtual channel;
the arbitration module is used for arbitrating the processed information, selecting a target virtual channel to finish data transmission and generate a command packet;
the virtual channel is used for transmitting data packets;
the virtual channel counting module is used for managing the available depth of the virtual channel of the receiving end;
the local virtual channel counting module is used for managing the available depth of the local virtual channel;
the buffer is used for directly reading the data from the buffer when the data are normally transmitted;
the retransmission buffer is used for reading out the data from the retransmission buffer when the data is retransmitted;
the packaging module is used for packaging data of the buffer or the retransmission buffer and transmitting the data to the data interface;
the selector is used for selecting a virtual channel;
the command interface and the data interface are respectively used for receiving a command packet and a data packet;
the encoder is used for encoding the command packet and the data packet;
the scrambler is used for scrambling the command packet and the data packet;
the decoder is used for decoding the encoded data packet and the command packet;
the descrambler is used for descrambling the scrambled data packet and the command packet;
the speed changing module is used for synchronizing the data of the physical layer and the speed of the deserializer;
and the elastic buffer is used for synchronizing clock signals between the chips.
Further, an interface of the buffer register is communicated with the router through a bus to send and receive data; the bus signals include bus input data, a bus output acknowledge signal, and a bus output valid signal.
Further, the bus output response signal is parameterized and is synchronously adjusted according to the priority number, and the interface number is synchronously adjusted according to the scale.
The invention has the beneficial effects that:
the method of the invention not only converts the useless information in the original data packet into useful inter-chip coding information, but also can meet the requirements of network-on-chip priority and multicast by processing the information of the data packet and storing the data packet into the virtual channel and arbitrating, can not cause low-priority blockage due to high-priority blockage, and transmits the data to the receiving end through the processes of coding scrambling and decoding descrambling, thereby reducing the data coding cost and improving the effective data transmission efficiency.
The system removes redundant transactions by means of a custom bus and virtual channels of different priorities, and its lightweight design greatly reduces logic resources. With the redesigned command packets and physical-layer control fields, the logic is simple and efficient and meets the requirements of the network on chip. Integrating the frequently sent command packets guarantees accurate data transmission while reducing the ratio of command packets to data packets, which greatly improves link transmission efficiency.
Drawings
FIG. 1 is a flowchart of a network-on-chip oriented inter-chip interconnection method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a priority virtual channel of an inter-chip interconnection method for a network-on-chip according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an encoding method of an inter-chip interconnection method for a network on chip according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an arbitration flow of a network-on-chip oriented inter-chip interconnection method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the Req_List and the Result_List of a network-on-chip oriented inter-chip interconnection method according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating fields of a command packet of an inter-chip interconnection method for a network-on-chip according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a CRI bus of an inter-chip interconnection system for a network-on-chip according to an embodiment of the present invention;
fig. 8 is a schematic overall structure diagram of an inter-chip interconnection system oriented to a network on chip according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
As shown in fig. 1, which is a flowchart of specific steps of an embodiment of the present invention, a method for interconnecting chips oriented to a network on chip of the embodiment includes the following specific steps:
S01, obtaining a data packet from the sending end through a router. Specifically, the buffer register communicates with the router through a CRI (Ack/Request Interface) bus to send and receive data, and at the same time parses the header flit of the network on chip for the subsequent arbitration. The CRI bus is a new bus provided in this embodiment; its signals are defined in Table 1:
TABLE 1
Signal name         Description
cri_din[129:0]      CRI bus input data
cri_ack_out[3:0]    CRI bus output acknowledge signal
cri_valid_out       CRI bus output valid signal
S02, processing the data packet and storing the processed data packet into a virtual channel. Specifically, the data cri_din transmitted from the router is parsed in the buffer register (Tap Register) to obtain the priority Cls, the multicast type and the packet length, and the address addr is resolved through the routing algorithm. The results are spliced onto the data bus and written to the txb_vc_data bus, in which the lower 130 bits are cri_din and the upper six bits are txb_vc_addr[3:0] and txb_vc_type[1:0]. In this embodiment, the lowest-priority VC0 comprises three virtual channels, as shown in FIG. 2. A packet is preferentially stored in the bottom VC0, then in the middle VC0; if both of these are full, it is stored in the top VC0. This improves the concurrency of sending low-priority data packets: the requests of two packets can be issued at the same time, so a failed request for one packet does not block the data packets behind it.
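The splicing of txb_vc_data and the fill order of the three VC0 channels can be sketched as follows. This is a minimal illustration only: the placement of txb_vc_addr above txb_vc_type within the upper six bits, the function names, and the per-channel depth are assumptions that the text does not specify.

```python
from collections import deque

FLIT_BITS = 130  # width of cri_din in this embodiment

def splice_txb_vc_data(cri_din: int, addr: int, vc_type: int) -> int:
    """Build a 136-bit word {addr[3:0], type[1:0], cri_din[129:0]}.

    The order of addr and type inside the upper six bits is assumed.
    """
    assert 0 <= cri_din < (1 << FLIT_BITS)
    assert 0 <= addr < 16 and 0 <= vc_type < 4
    return (addr << (FLIT_BITS + 2)) | (vc_type << FLIT_BITS) | cri_din

def pick_vc0_channel(channels, depth):
    """Return the index of the first non-full VC0 channel.

    channels is ordered bottom, middle, top; None means all are full.
    """
    for index, channel in enumerate(channels):
        if len(channel) < depth:
            return index
    return None
```

For example, with the bottom channel full and the middle channel half full, pick_vc0_channel returns 1, so the packet goes to the middle VC0.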
The Header flit of the network on chip usually has a reserved field, and taking the 130-bit flit in this embodiment as an example, the Header flit includes a type (2bits, multicast type), a Cls (2bits, priority), a DestAddr (28bits, destination address), a SrcAddr (28bits, source address), a length (4bits, packet length), and in addition to two bits representing a Header, a body, and a tail, a 64-bit reserved field is also provided.
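The field widths listed above add up to exactly 130 bits (2 + 2 + 2 + 28 + 28 + 4 + 64). A small sketch of packing and unpacking such a header flit is given below. The description states elsewhere that the two highest bits are SOP/EOP; the order of the remaining fields is not given, so the layout used here is an assumption, as are the function and field names.

```python
# (name, width) from most significant to least significant bit.
# Only the position of sop_eop is stated in the text; the rest is assumed.
FIELDS = [
    ("sop_eop", 2), ("type", 2), ("cls", 2),
    ("dest_addr", 28), ("src_addr", 28), ("length", 4), ("reserved", 64),
]
TOTAL_BITS = sum(width for _, width in FIELDS)  # 130

def build_header_flit(**values) -> int:
    """Pack the given field values into one 130-bit integer."""
    flit = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        assert 0 <= value < (1 << width), name
        flit = (flit << width) | value
    return flit

def parse_header_flit(flit: int) -> dict:
    """Split a 130-bit header flit into its named fields."""
    fields = {}
    shift = TOTAL_BITS
    for name, width in FIELDS:
        shift -= width
        fields[name] = (flit >> shift) & ((1 << width) - 1)
    return fields
```

The 64-bit reserved field is the part that the scheme replaces with inter-chip encoding information.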
Under the mainstream inter-chip protocols, the header flit is transmitted as payload, i.e. 64 bits of useless information are transmitted along with the header flit. Since inter-chip transmission requires encoding for reliable transfer, the reserved field in the Header Flit can be removed and replaced with the encoding information that has to be added. As shown in FIG. 3, Header Flit, Body Flit and Tail Flit are the header, body and tail flits; Information is the previously mentioned type, Cls, DestAddr, SrcAddr and length; Reserve denotes the reserved, unused field; Start and End are the start and end flags; Sequence is the sequence number of the packet; and CRC (Cyclic Redundancy Check) is the cyclic redundancy check code.
S03, arbitrating the processed information, selecting a target virtual channel to finish data transmission and generate a command packet; specifically, after the analysis is completed, arbitration is performed, and the process mainly includes three signals, as shown in table 2:
TABLE 2
[Table 2 is provided as images in the original publication: Figure GDA0002394303990000051, Figure GDA0002394303990000061]
The arbitration principle is to transmit the packet with the highest priority first and lower-priority packets afterwards. As shown in FIG. 4, in this embodiment, when the arbitration module receives arbitration requests, 4 Req_Lists (request lists) are generated, each of which is a 4×4 two-dimensional array holding the send requests of the 4 priority VCs. The VC Credit of the opposite side (Credit is a flow control mechanism and represents the available depth of the target VC) generates 1 VC_List (virtual channel list) that records the available space of the VCs of the 4 priorities: 1 is written if space is available, otherwise 0. Each Req_List is combined with the VC_List to produce a one-dimensional list that indicates whether each VC in that Req_List can currently be read, and the 4 one-dimensional lists are concatenated into a 4×4 Result_List array. ORing each row of the Result_List yields a one-dimensional list from which the priority to be read is derived. The Result_List values under that priority are then checked: if the Result at the current target pointer is not 1, the pointer moves right and the check is repeated until it reaches a position holding 1, which gives the target VC. Finally the arbitration result is output, and the local Credit is updated afterwards. As shown in FIG. 5, the RXBs are the receiving structures at the destination; each RXB has VCs of 4 priorities and is connected to a router.
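One possible reading of this arbitration flow is sketched below. The text does not fix the array indexing, so the sketch assumes that each of the 4 Req_Lists belongs to one requesting buffer (called a TXB here, a hypothetical name), with rows indexed by priority and columns by destination RXB, and that VC_List is indexed by priority and destination. Advancing the pointer past the granted requester is also an assumption.

```python
N = 4  # requesters, priorities and destination RXBs in the embodiment

def build_result_list(req_lists, vc_list):
    """req_lists[t][p][d] = 1: requester t, priority p, wants destination d.
    vc_list[p][d] = 1: destination d has space in its priority-p VC.
    result[t][p] = 1 when a request exists and all its destinations have space.
    """
    result = [[0] * N for _ in range(N)]
    for t in range(N):
        for p in range(N):
            wanted = [d for d in range(N) if req_lists[t][p][d]]
            if wanted and all(vc_list[p][d] for d in wanted):
                result[t][p] = 1
    return result

def arbitrate(result, pointers):
    """Return (requester, priority) of the grant, or None if nothing is ready.

    A larger priority index wins; within a priority the pointer moves
    right (with wrap-around) until it finds a 1.
    """
    for p in reversed(range(N)):
        if any(result[t][p] for t in range(N)):
            t = pointers[p]
            while not result[t][p]:
                t = (t + 1) % N
            pointers[p] = (t + 1) % N  # assumed round-robin update
            return t, p
    return None
```

Because a request whose destination has no credit never appears in the result list, a blocked high-priority packet does not stop a ready lower-priority packet from being granted.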
In addition, command packets are also generated by the arbitration process, which communicates with the command interface of the physical layer. Since the network on chip only requires inter-chip transmission to guarantee that data packets are delivered correctly, many command packets (including flow control packets and control fields) are unnecessary. Command packets can be divided into two categories according to how often they are sent: those that must be sent frequently or every few cycles, and those that are sent only under specific conditions or when the device is idle, as shown in Table 3:
TABLE 3
[Table 3 is provided as images in the original publication: Figure GDA0002394303990000062, Figure GDA0002394303990000071]
Taking the Ack/Nack (Ack: acknowledgement, Nack: negative acknowledgement) and credit flow control mechanisms of this embodiment as examples, a 64-bit command packet is used. Flow control packets that are sent frequently are integrated together, while those that are sent infrequently remain unchanged. The fields of the command packet are laid out as shown in FIG. 6 and described in Table 4:
TABLE 4
Field           Width (bits)    Description
start           8               Start control field added by the physical layer
cmd type        4               Type of the flow control packet
data&reserve    28              Data carried by the flow control packet
crc             16              CRC check code of the flow control packet
end             8               End control field added by the physical layer
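The five fields of Table 4 total 64 bits (8 + 4 + 28 + 16 + 8). The sketch below packs and checks such a command packet. The values of the start and end control codes, the choice of CRC-16 polynomial (the CCITT variant provided by Python's binascii.crc_hqx), and the coverage of the CRC (cmd type plus data) are all placeholders, since the text does not specify them.

```python
import binascii

START, END = 0xFB, 0xFD  # hypothetical control codes, not from the source
CRC_INIT = 0xFFFF        # assumed initial value for the placeholder CRC-16

def pack_command(cmd_type: int, data: int) -> int:
    """Pack {start, cmd type, data&reserve, crc, end} into a 64-bit word."""
    assert 0 <= cmd_type < 16 and 0 <= data < (1 << 28)
    body = (cmd_type << 28) | data  # 32 bits covered by the CRC
    crc = binascii.crc_hqx(body.to_bytes(4, "big"), CRC_INIT)
    return (START << 56) | (body << 24) | (crc << 8) | END

def unpack_command(word: int):
    """Return (cmd_type, data, ok); ok is False on a framing or CRC error."""
    start, end = word >> 56, word & 0xFF
    body = (word >> 24) & 0xFFFFFFFF
    crc = (word >> 8) & 0xFFFF
    expected = binascii.crc_hqx(body.to_bytes(4, "big"), CRC_INIT)
    ok = start == START and end == END and crc == expected
    return body >> 28, body & ((1 << 28) - 1), ok
```

A receiver would answer a packet with ok equal to False with a Nack, which is what triggers reading from the retransmission buffer on the sending side.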
In this embodiment the Ack & credit and Nack & credit commands are integrated, because in mainstream protocols Ack/Nack and credit are both synchronized frequently. The flow control packets required by the network on chip are added, and the remaining, unnecessary ones are removed.
S04, updating the virtual channel state according to the command packet. Specifically, the sending end keeps a synchronized copy of the VC Credit of the receiving end: the arbitration module subtracts 1 from the corresponding Credit after sending a request Req (for multicast, several Credits are updated) and adds the corresponding value after receiving a Credit increment from the target. In addition, the arbitration module packs the local Credit information and sends it to the receiving end, and the local VC updates its state according to the sending condition of the data packets of the receiving end. The Credit synchronization algorithm uses an increment-threshold mechanism and a timeout mechanism: the Credit is synchronized once its increment reaches a certain amount, or when it has not been updated for a long time. A special mechanism is adopted when the Credit of a high-priority VC at the sending end is nearly full, ensuring fast Credit synchronization. The arbitration module receives the Credit increments sent by the 4 RXBs (receiving-end structures; each RXB contains four VCs and is connected to a router) as one 4×4 two-dimensional array, sums the Credit increments of the RXBs, and sends them when the sum crosses the threshold or when the timeout expires.
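The threshold-or-timeout credit synchronization can be illustrated with a simple cycle-based model. The class names, the threshold and timeout values, and the per-cycle tick interface are hypothetical; the special fast path for high-priority VCs mentioned above is not modeled.

```python
class CreditSync:
    """Receiver side: accumulate freed VC slots and decide when to report."""

    def __init__(self, threshold: int, timeout: int):
        self.threshold = threshold
        self.timeout = timeout
        self.pending = 0      # credit increment not yet reported
        self.idle_cycles = 0  # cycles since the last report

    def tick(self, freed: int = 0) -> int:
        """Advance one cycle; return the increment to send, or 0."""
        self.pending += freed
        self.idle_cycles += 1
        timed_out = self.pending > 0 and self.idle_cycles >= self.timeout
        if self.pending >= self.threshold or timed_out:
            sent, self.pending, self.idle_cycles = self.pending, 0, 0
            return sent
        return 0


class SenderCredit:
    """Sender side: mirror of the remote VC's available depth."""

    def __init__(self, depth: int):
        self.credit = depth

    def can_send(self) -> bool:
        return self.credit > 0

    def on_request(self) -> None:
        self.credit -= 1  # subtract 1 after sending a request Req

    def on_increment(self, amount: int) -> None:
        self.credit += amount  # add the increment reported by the target
```

Batching increments this way is what keeps the ratio of command packets to data packets low, at the price of the sender seeing slightly stale credit values.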
S05, coding and scrambling according to the transmitted data and/or command packet; specifically, when the physical layer receives data and commands from the data link layer, the priority of the commands is higher than that of the data, so when the commands are transmitted, the ready signal interfacing with the data interface of the data link layer needs to be pulled down to prevent data loss. The data and command channels are the same and are uniformly output to the scrambling module and the coding module. The scrambling codes adopt the IEEE 802.3 standard:
G(x) = 1 + x^{39} + x^{58}
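For illustration, a bit-serial model of a self-synchronous scrambler and descrambler using this polynomial is sketched below. The function names, the all-ones initial state, and the bit ordering are assumptions; real hardware would process many bits per clock in parallel.

```python
MASK58 = (1 << 58) - 1  # the scrambler keeps the last 58 scrambled bits

def scramble(bits, state=MASK58):
    """Scramble a list of 0/1 bits with G(x) = 1 + x^39 + x^58."""
    out = []
    for bit in bits:
        scrambled = bit ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        out.append(scrambled)
        state = ((state << 1) | scrambled) & MASK58  # shift in the output bit
    return out, state

def descramble(bits, state=MASK58):
    """Inverse operation; the state is fed from the received bits."""
    out = []
    for scrambled in bits:
        out.append(scrambled ^ ((state >> 38) & 1) ^ ((state >> 57) & 1))
        state = ((state << 1) | scrambled) & MASK58  # shift in the input bit
    return out, state
```

Because the descrambler state is built from received bits only, it recovers the correct data after 58 bits even if its initial state differs from the scrambler's, so no separate seed exchange is needed.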
Encoding is performed according to the data and commands, and the Start and End fields are output. Before data transmission, the physical layer initializes a state machine and sends the control fields shown in Table 5 for data alignment and channel bonding:
TABLE 5
Control character    Description
Channel bonding      For initializing the state machine
Not ready            For initializing the state machine
Idle                 For initializing the state machine and data stitching
Back pressure        Back pressure when the elastic buffer is nearly full
Apart from the three necessary control fields Not ready, Channel bonding and Idle, only one Back pressure code is defined. This control code differs from SKP in PCIe and Clock Compensation (CC) in Aurora. Mainstream protocols transmit SKP/CC at fixed intervals (set according to the ppm, parts per million, which here indicates the accuracy of clock recovery) for inter-chip clock compensation. The Back pressure command is instead sent to tell the sending end to stop sending when the Elastic Buffer is nearly full.
When the control field needs to be generated, the generation and transmission of the control field are carried out according to the priorities of different control fields, data and commands; when the control field needs to be sent, the physical layer sends a request signal to the data and command interface to stop receiving and sending data, so that the command and data are prevented from being lost.
S06, sending the encoded and scrambled data and/or command packet to the receiving end. Specifically, the encoded and scrambled data is rate-matched and then sent to the SerDes (serializer/deserializer) for parallel-to-serial conversion.
As a further preferred embodiment, after receiving the serial data, the receiving end converts it into parallel data, recovers the normal data using the data boundary found when the state machine was initialized, and passes it to the descrambler and decoder. The data parsed by the decoder is written to an Elastic Buffer (in effect an asynchronous FIFO used to synchronize the clocks between chips): the write-side clock is the clock recovered by the SerDes from the serial data stream, and the read-side clock is the local clock. When the Elastic Buffer is nearly full, a back pressure packet must be generated to apply back pressure to the sending end. When the sending end receives the back pressure packet from the receiving end, it pulls the Ready signal of the data and command interfaces low to prevent the Elastic Buffer from overflowing; data already in transmission is not affected, which prevents packet loss.
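The back pressure behavior of the Elastic Buffer can be modeled in a few lines. This is a single-threaded behavioral sketch, not a clock-domain-crossing FIFO; the class name, the depth, and the nearly-full watermark are hypothetical parameters.

```python
from collections import deque

class ElasticBuffer:
    """Behavioral model: writes use the recovered clock, reads the local clock."""

    def __init__(self, depth: int, almost_full: int):
        self.fifo = deque()
        self.depth = depth
        self.almost_full = almost_full  # watermark that triggers back pressure

    def write(self, word) -> bool:
        """Store one word; return True when back pressure should be sent."""
        if len(self.fifo) >= self.depth:
            raise OverflowError("elastic buffer overflow")
        self.fifo.append(word)
        return len(self.fifo) >= self.almost_full

    def read(self):
        """Return the oldest word, or None when the buffer is empty."""
        return self.fifo.popleft() if self.fifo else None
```

The watermark has to sit far enough below the full depth to absorb the words that are still in flight between the moment the back pressure packet is generated and the moment the sending end actually stops.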
In addition, an embodiment of the present invention further provides an embodiment of an inter-chip interconnection system for a network on chip corresponding to the method embodiment, where the embodiment includes:
the sending terminal is used for sending the data packet;
the router is used for receiving the data packet sent by the sending end;
a serializer and deserializer for converting parallel data into serial data and serial data into parallel data;
the receiving end is used for receiving the data packet and the command packet;
the buffer register is used for processing the information of the data packet and storing the data packet into a virtual channel;
the arbitration module is used for arbitrating the processed information, selecting a target virtual channel to finish data transmission and generate a command packet;
the virtual channel is used for transmitting the data packet;
the virtual channel counting module is used for managing the available depth of the virtual channel of the receiving end;
the local virtual channel counting module is used for managing the available depth of the local virtual channel;
the buffer is used for directly reading the data from the buffer when the data are normally transmitted;
the retransmission buffer is used for reading out the data from the retransmission buffer when the data is retransmitted;
the packaging module is used for packaging the data of the buffer or the retransmission buffer and transmitting the data to the data interface;
a selector for selecting a virtual channel;
the command interface and the data interface are respectively used for receiving a command packet and a data packet;
an encoder for encoding the command packet and the data packet;
the scrambler is used for scrambling the command packet and the data packet;
a decoder for decoding the encoded data packet and the command packet;
the descrambler is used for descrambling the scrambled data packet and the command packet;
the speed changing module is used for synchronizing the data of the physical layer with the speed of the deserializer;
and the elastic buffer is used for synchronizing clock signals between the chips.
As a preferred embodiment of this example, as shown in FIG. 7, the buffer register communicates with the router via a bus for data transmission and reception; the bus signals comprise bus input data, a bus output acknowledge signal and a bus output valid signal; wherein R is a router, core is a core mounted on the network on chip, NI is a Network Interface, and Transaction Layer is the transaction layer (including a data link layer and a physical layer).
As a preferred embodiment of this example, the bus output acknowledge signal is parameterized, synchronously adjusted according to the number of priorities, and the number of interfaces synchronously adjusted according to the size.
The specific operation of the system of the present invention is described in detail below with reference to fig. 8:
In this embodiment, the data packet bit width is 130 bits: the highest two bits are SOP/EOP and the remaining 128 bits are valid data. SOP/EOP distinguishes Header flits (10), Body flits (00) and Tail flits (01). In FIG. 8, VC is a virtual channel; partner VC is a counting module that manages the available depth of the VCs at the receiving end; SerDes control is the control module of the physical layer SerDes IP; Buffer and Retry Buffer are used for data buffering; Packager (packing module) selects data from the buffers and transmits it to the physical layer; Cmd interface is the command interface and data interface is the data interface; Encoder is the encoder and Scrambler is the scrambler; Gearbox (speed change module) synchronizes the data of the physical layer with the speed of the deserializer; De-scrambler is the descrambler and Decode is the decoder; Elastic buffer is in effect an asynchronous FIFO (first in, first out buffer); Local VC is the local VC available-depth counting module; MUX is the selector; and R is the router.
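As a small illustration of the SOP/EOP encoding, the following sketch classifies a 130-bit flit and extracts its 128-bit payload. The function names are hypothetical, and the code 11, which the text does not define, is reported as unknown.

```python
FLIT_KINDS = {0b10: "header", 0b00: "body", 0b01: "tail"}
PAYLOAD_MASK = (1 << 128) - 1

def classify_flit(flit: int) -> str:
    """Return the flit kind encoded in bits [129:128]."""
    return FLIT_KINDS.get((flit >> 128) & 0b11, "unknown")

def flit_payload(flit: int) -> int:
    """Return the lower 128 bits of valid data."""
    return flit & PAYLOAD_MASK
```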
The data link layer (Data Link Layer) first communicates with the routers over the CRI bus. Req is the router's request to send data; Data is the data bus; Ack is similar to a Ready signal and indicates whether the receiving end can accept data packets of each priority. According to the data packet format, Data is a 130-bit-wide signal and Ack is a 4-bit-wide signal. After detecting that the receiving end's Ack is high, the transmitting end sends the data packet, driving Data and pulling the corresponding Req (valid) high. After the end of the packet has been transmitted, Req (valid) is pulled low. After the data link layer receives the header flit, the information to be processed includes:
1. the destination address (DestAddr, 28 bits), used by the routing algorithm; for multicast, it is the multicast label;
2. the multicast type (type, 2 bits), used to judge whether the packet is multicast and to look up the multicast table;
3. the priority (Cls, 2 bits), corresponding to 4 priorities; the larger the Cls value, the higher the priority;
4. the length (4 bits), i.e. the number of flits remaining in the packet excluding the header flit.
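The four header-flit fields listed above can be illustrated with a small parser. This is only a sketch: the field widths come from the text, but the bit positions are not given in this passage, so the contiguous, least-significant-bit-first layout below is purely an assumption, as are the names `HeaderInfo`, `parse_header` and `build_header`.

```python
from dataclasses import dataclass

# Widths are from the description. The ORDER and POSITION of the fields inside
# the 128-bit payload are assumptions made only for this illustration.
FIELDS = (("dest_addr", 28), ("mc_type", 2), ("cls", 2), ("length", 4))


@dataclass
class HeaderInfo:
    dest_addr: int  # DestAddr, 28 bits (multicast label when multicast)
    mc_type: int    # multicast type, 2 bits
    cls: int        # priority, 2 bits; larger value = higher priority
    length: int     # flits remaining after the header, 4 bits


def parse_header(payload):
    """Extract the four fields from a header-flit payload (assumed layout)."""
    values, shift = {}, 0
    for name, width in FIELDS:
        values[name] = (payload >> shift) & ((1 << width) - 1)
        shift += width
    return HeaderInfo(**values)


def build_header(info):
    """Inverse of parse_header, using the same assumed layout."""
    payload, shift = 0, 0
    for name, width in FIELDS:
        value = getattr(info, name)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        payload |= value << shift
        shift += width
    return payload
```

Because Cls is two bits wide, it can index the four priority virtual channels directly, which is how the later sketches in this description use it.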
After the Reserve field in the header flit is removed, the Sequence, the CRC, and the addr, priority and multicast type resolved by the routing algorithm are added, and the packet is stored in the virtual channel of the corresponding priority. Each priority VC sends a read request signal, txb_crm_req, to the arbitration module according to the content of its first packet. The read request carries the txb_VC_addr and txb_VC_type from the header of the packet waiting to be read in each of the 4 priority VCs. If the VC of a given priority has no packet waiting to be read, or its packet is already being read, the request signal txb_crm_req becomes 0, which prevents a packet from continuing to issue requests after its request has been accepted. If the arbitration module grants the read request, the VC starts reading the FIFO after receiving the ack signal, which is a pulse, and removes the header information spliced into the data packet once the necessary response has been received. After transmission of the current data packet is finished, a txb_crm_finish signal is sent to the arbitration module.
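The request/grant/finish handshake described above can be sketched as follows. The signal names `txb_crm_req` and `txb_crm_finish` are from the text; the class `VirtualChannel`, the function `arbitrate`, and in particular the strict highest-Cls-wins selection are assumptions made for illustration, since this passage does not spell out the arbitration policy.

```python
from collections import deque


class VirtualChannel:
    """Toy model of one priority VC and its request line (names are hypothetical)."""

    def __init__(self, cls):
        self.cls = cls            # priority of this VC
        self.fifo = deque()       # packets waiting to be read
        self.being_read = False   # True between the ack pulse and finish

    @property
    def txb_crm_req(self):
        # Request only while a packet is waiting and none is being read.
        return int(bool(self.fifo) and not self.being_read)

    def grant(self):
        # Models the ack pulse from the arbitration module.
        self.being_read = True
        return self.fifo[0]

    def finish(self):
        # Models txb_crm_finish: the current packet has been fully sent.
        self.fifo.popleft()
        self.being_read = False


def arbitrate(vcs):
    """Pick a requesting VC; highest Cls first is an assumed policy."""
    requesters = [vc for vc in vcs if vc.txb_crm_req]
    return max(requesters, key=lambda vc: vc.cls) if requesters else None
```

With packets queued in the VCs of priority 1 and 3, `arbitrate` returns the priority-3 VC; once it has been granted, its request drops to 0 and a second call returns the priority-1 VC.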
The VC that wins arbitration transfers its data to the buffer connected to the physical layer (Physical Layer) data interface and, at the same time, buffers the data in the Retry Buffer. The Packager handles SerDes busy conditions and rate matching. The Packager has two input ports, one from the TX Buffer (normal transmission buffer) and one from the Retry Buffer, and is controlled by Packager_source_sel: for normal transmission, sel (the selection signal) is 0 and data is read from the TX Buffer; for retransmission, sel is 1 and data is read from the Retry Buffer.
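The two-source selection performed by the Packager can be modeled in a few lines. This is a sketch under stated assumptions: the class `Packager`, its method names, and the `release` step (dropping a retained copy once it is no longer needed) are hypothetical; the text only states that data is written to both buffers and that sel chooses which one is read.

```python
from collections import deque


class Packager:
    """Toy model: each flit goes to both buffers; sel picks the read source."""

    def __init__(self):
        self.tx_buffer = deque()     # normal transmission buffer
        self.retry_buffer = deque()  # copies kept for possible retransmission

    def write(self, flit):
        # Data is cached in the TX Buffer and the Retry Buffer at the same time.
        self.tx_buffer.append(flit)
        self.retry_buffer.append(flit)

    def read(self, packager_source_sel):
        if packager_source_sel == 0:
            # Normal transmission: consume from the TX Buffer.
            return self.tx_buffer.popleft() if self.tx_buffer else None
        # Retransmission: re-read the oldest retained copy without removing it.
        return self.retry_buffer[0] if self.retry_buffer else None

    def release(self):
        # Assumed step: discard the oldest retained copy when it is acknowledged.
        if self.retry_buffer:
            self.retry_buffer.popleft()
```

Keeping the retransmission copy in place until `release` is called is a design choice of this sketch, so that the same data can be resent more than once.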
The arbitration module also generates command packets and therefore communicates with the command interface of the physical layer. Flow control uses a Credit synchronization mechanism to prevent the data sent by the transmitting end from overflowing the receiving end. To this end, the transmitting end keeps a synchronized copy of the VC Credit of the receiving end. The arbitration module must track the receiver's Credit value: it subtracts 1 from the corresponding Credit after sending a Req (if the Req is multicast, several Credits are updated) and adds the corresponding amount after receiving a Credit increment from the target. In addition, the arbitration module packs the local Credit information and sends it to the receiving end.
The local VC control end needs to update the VC state according to how data packets are sent on from the receiving end. The Credit synchronization algorithm adopts an increment threshold-crossing mechanism and a timeout mechanism: the Credit is synchronized once the Credit increment reaches a certain amount, or on timeout if the Credit has not been updated for a long time. A special mechanism is adopted when the sending-end Credit of a high-priority VC is nearly full, to ensure fast Credit synchronization. The arbitration module receives the Credit increments sent by 4 RXBs (receiving-end structures; one RXB contains four VCs and is connected to a router), forming one 4x4 two-dimensional array; the Credit increments of the RXBs are summed and sent when the sum crosses the threshold, or sent when the timeout expires.
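The Credit bookkeeping described above can be sketched as two cooperating counters. The class names `SenderCredits` and `CreditReporter`, the threshold and timeout values, and the simplification to a single list of four VCs (rather than the 4x4 array over four RXBs) are assumptions for illustration only; the special fast path for high-priority VCs is not modeled.

```python
class SenderCredits:
    """Sender-side copy of the receiver's free VC depth (hypothetical names)."""

    def __init__(self, depth_per_vc, num_vcs=4):
        self.credits = [depth_per_vc] * num_vcs

    def can_send(self, vc):
        return self.credits[vc] > 0

    def on_send(self, vc):
        # Subtract 1 from the corresponding Credit after sending.
        if self.credits[vc] == 0:
            raise RuntimeError("no credit left for this VC")
        self.credits[vc] -= 1

    def on_credit_update(self, increments):
        # Add the increments carried by a received Credit update.
        for vc, inc in enumerate(increments):
            self.credits[vc] += inc


class CreditReporter:
    """Receiver side: accumulate freed slots, report on threshold or timeout."""

    def __init__(self, threshold, timeout_cycles, num_vcs=4):
        self.threshold = threshold
        self.timeout_cycles = timeout_cycles
        self.pending = [0] * num_vcs
        self.cycles_since_report = 0

    def on_slot_freed(self, vc):
        self.pending[vc] += 1

    def tick(self):
        """Advance one cycle; return the increments to send, or None."""
        self.cycles_since_report += 1
        total = sum(self.pending)
        timed_out = total > 0 and self.cycles_since_report >= self.timeout_cycles
        if total < self.threshold and not timed_out:
            return None
        report, self.pending = self.pending, [0] * len(self.pending)
        self.cycles_since_report = 0
        return report
```

Batching increments behind a threshold keeps the number of Credit updates low, while the timeout bounds how long a small increment can remain unreported.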
The physical layer receives data and commands from the data link layer. Because a data packet is 130 bits, it is received in two transfers; a command packet is 64 bits and can therefore be received in one. Since commands have higher priority than data here, the ready signal of the interface to the data link layer's data interface must be pulled low while a command is being transmitted, to prevent data loss. Data and commands share the same channel and are output uniformly to the scrambling module and the encoding module.
Encoding is performed according to whether the content is data or a command. Before the physical layer can transmit data normally, a state machine must be initialized, and the control fields in Table 5 are sent to perform data alignment and channel bonding. When a control field needs to be generated, it is generated and transmitted according to the priorities of the different control fields, data and commands; when a control field needs to be sent, a request signal is sent to the data and command interfaces to stop them from accepting data, preventing loss of commands and data. Once encoding and scrambling are complete, the data is sent through the Gearbox to the SerDes for parallel-to-serial conversion. The Gearbox matches the rate and bit width of the PCS (physical coding sublayer) and the PMA (physical medium attachment sublayer) to prevent data loss.
After receiving the serial data, the receiving end converts it into parallel data, restores normal data using the data boundary found when the state machine was initialized, and passes the data to the descrambler and the decoder. The data parsed by the decoder is transmitted to the Elastic Buffer. The Elastic Buffer is in practice an asynchronous FIFO whose purpose is to synchronize the clocks between the chips: the write clock is the clock recovered by the SerDes from the serial data stream, and the read clock is the local clock. When the Elastic Buffer is nearly full, a back pressure packet must be generated to apply back pressure to the data at the transmitting end. When the sending end receives the back pressure packet from the receiving end, it pulls down the Ready signal of the data and command interfaces to prevent the Elastic Buffer from overflowing; data already in transmission is not affected, so no data packets are lost. Any receiving-end behaviour not described above is the reverse of the corresponding sending-end procedure.
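The nearly-full back pressure condition of the Elastic Buffer can be illustrated with a simple watermark model. This sketch does not model the two clock domains of an asynchronous FIFO; the class `ElasticBuffer` and the parameters `depth` and `almost_full` are hypothetical, and the text does not state the actual depth or watermark.

```python
from collections import deque


class ElasticBuffer:
    """Watermark-only model of the Elastic Buffer (clock crossing not modeled)."""

    def __init__(self, depth, almost_full):
        if not 0 < almost_full <= depth:
            raise ValueError("almost_full must be between 1 and depth")
        self.depth = depth
        self.almost_full = almost_full
        self.fifo = deque()

    def write(self, word):
        # Write side: in hardware this runs on the clock recovered by the SerDes.
        if len(self.fifo) >= self.depth:
            raise OverflowError("elastic buffer overflow")
        self.fifo.append(word)

    def read(self):
        # Read side: in hardware this runs on the local clock.
        return self.fifo.popleft() if self.fifo else None

    @property
    def backpressure(self):
        # True when a back pressure packet should be sent to the transmitter.
        return len(self.fifo) >= self.almost_full
```

Setting the watermark below the full depth leaves room for the data that is still in flight when the back pressure packet is generated.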
The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
Compared with the prior art, the network-on-chip oriented inter-chip interconnection method and system of the invention have the following advantages:
1) The scheme of the invention removes redundant transactions and buses and adds the CRI bus, the priority design, and so on. Apart from transmitting and receiving router data, there are no redundant functions or transactions. The lightweight design greatly reduces logic resources, and the virtual channels and arbitration scheme meet the network-on-chip requirements for priority and multicast. At the same time, this design does not cause low-priority blocking as a result of high-priority blocking. Arbitration is efficient: for the various arbitration requirements it takes only 2-3 cycles, so it does not increase latency.
2) For the data format of the network-on-chip header flit, the inter-chip transmission codes are embedded in the header flit: the Reserve field in the header flit is deleted and replaced with the coding information required for transmission, and the original content is restored at the receiving end, so that otherwise useless information is converted into useful inter-chip coding information. If the header flit is regarded as payload, the coding overhead is only that of the physical layer: with 8b/10b coding the efficiency is 80%, and with 64b/66b coding the efficiency is 97%. If the header flit is regarded as coding information used in transmission, then, taking 256 bytes of data as an example, the coding overhead is only 1/(16+1) ≈ 0.06 multiplied by the physical-layer coding overhead (0.2 or 0.03). This coding scheme greatly reduces the data coding overhead and improves the effective data transmission efficiency;
3) The scheme of the invention redesigns the command packet and the physical layer control fields, with simple logic and high efficiency. Compared with mainstream protocols, the flow control packets and control fields are greatly reduced and the design and logic are simple, while still meeting the requirements of the network on chip. At the same time, merging frequent flow control packets ensures accurate data transmission, reduces the ratio of command packets to data packets, and greatly improves link transmission efficiency. The redesign of the physical layer control fields also means that the physical layer does not need to send a control field every time and can transmit valid data instead. Physical layer overhead is likewise reduced compared with mainstream protocols.
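The figures quoted in item 2) above can be checked with a short calculation. The variable and function names are illustrative; the sketch only reproduces the stated numbers (80%, 97%, and 1/(16+1) ≈ 0.06 for 256 bytes of data carried in 128-bit flits) and does not claim a particular overall efficiency.

```python
def line_code_efficiency(payload_bits, coded_bits):
    """Fraction of transmitted bits that carry payload for an n-bit/m-bit code."""
    return payload_bits / coded_bits


eff_8b10b = line_code_efficiency(8, 10)     # 0.8, i.e. 80%
eff_64b66b = line_code_efficiency(64, 66)   # about 0.97, i.e. 97%

# 256 bytes of data occupy 16 flits of 128 bits; one header flit is added.
data_flits = (256 * 8) // 128
header_share = 1 / (data_flits + 1)         # 1/17, about 0.06

# Physical-layer coding overheads quoted in the text (0.2 and about 0.03).
overhead_8b10b = 1 - eff_8b10b
overhead_64b66b = 1 - eff_64b66b
```

Rounded to two decimals, `header_share` is 0.06 and `overhead_64b66b` is 0.03, matching the values quoted in the text.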
The step numbers in the above method embodiments are set for convenience of illustration only, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. A network-on-chip oriented inter-chip interconnection method, characterized by comprising the following steps:
acquiring a sending end data packet through a router;
processing the information of the data packet and storing the data packet into a virtual channel;
arbitrating the processed information, and selecting a target virtual channel to finish data transmission and generate a command packet;
updating the state of the virtual channel according to the command packet;
coding and scrambling according to the transmitted data and/or command packet;
converting the data and/or command packet which completes the coding and scrambling into serial data and sending the serial data to a receiving end;
the step of arbitrating the processed information, selecting the target virtual channel to complete data transmission and generate the command packet further comprises:
caching data to be transmitted into a buffer and a retransmission buffer at the same time;
when the data is normally transmitted, the data is directly read out from the buffer;
when the data is retransmitted, the data is read out from the retransmission buffer.
2. The method according to claim 1, further comprising the following steps:
acquiring data and/or command packets after coding and scrambling;
converting the data and/or command packets into parallel data;
restoring normal data according to the parallel data and completing descrambling and decoding;
and when the command contains a backpressure instruction, stopping the sending operation of the sending end.
3. The method according to claim 1, wherein the step of sending the coded and scrambled data and command packet to a receiving end specifically comprises:
initializing a state machine to complete data alignment and channel binding;
when the control field is generated, the state machine generates and transmits the control field according to the priorities of different control fields, data and commands;
when the control field is sent, the state machine sends a request signal to the data and command interface to stop receiving and sending data.
4. The network-on-chip oriented inter-chip interconnection method according to any one of claims 1 to 3, wherein the command packet is configured to include: a start control field, a type field, a data field, a check code field, and an end control field.
5. The network-on-chip oriented inter-chip interconnection method according to claim 4, wherein the command packet comprises: an acknowledgement command packet, a negative acknowledgement command packet, a retransmission command packet, and an address update command packet.
6. The method according to claim 1, wherein the step of arbitrating the processed information and selecting the target virtual channel specifically comprises:
generating a request list and a virtual channel list according to the arbitration request;
generating a result list according to the request list and the channel list;
and selecting a target virtual channel according to the result list.
7. An inter-chip interconnection system for a network on chip, comprising: the system comprises a sending end, a router, a serializer, a deserializer, a buffer register, an arbitration module, a virtual channel counting module, a local virtual channel counting module, a buffer, a retransmission buffer, a packaging module, a selector, a command interface, a data interface, an encoder, a scrambler, a decoder, a descrambler, a variable speed module, an elastic buffer and a receiving end;
the sending end is used for sending data packets;
the router is used for receiving a data packet sent by a sending end;
the serializer and deserializer are used for converting parallel data into serial data and converting the serial data into parallel data;
the receiving end is used for receiving a data packet and a command packet;
the buffer register is used for processing the information of the data packet and storing the data packet into a virtual channel;
the arbitration module is used for arbitrating the processed information, selecting a target virtual channel to finish data transmission and generate a command packet; the step of arbitrating the processed information, selecting the target virtual channel to complete data transmission and generate the command packet further comprises: caching data to be transmitted into a buffer and a retransmission buffer at the same time; when the data is normally transmitted, the data is directly read out from the buffer; when the data is retransmitted, reading the data from the retransmission buffer;
the virtual channel is used for transmitting data packets;
the virtual channel counting module is used for managing the available depth of the virtual channel of the receiving end;
the local virtual channel counting module is used for managing the available depth of the local virtual channel;
the buffer is used for directly reading the data from the buffer when the data are normally transmitted;
the retransmission buffer is used for reading out the data from the retransmission buffer when the data is retransmitted;
the packaging module is used for packaging data of the buffer or the retransmission buffer and transmitting the data to the data interface;
the selector is used for selecting a virtual channel;
the command interface and the data interface are respectively used for receiving a command packet and a data packet;
the encoder is used for encoding the command packet and the data packet;
the scrambler is used for scrambling the command packet and the data packet;
the decoder is used for decoding the encoded data packet and the command packet;
the descrambler is used for descrambling the scrambled data packet and the command packet;
the speed changing module is used for synchronizing the data of the physical layer and the speed of the deserializer;
and the elastic buffer is used for synchronizing clock signals between the chips.
8. The network-on-chip oriented inter-chip interconnection system of claim 7, wherein the interface of the buffer register communicates with a router via a bus for data transmission and reception; the bus signals include bus input data, a bus output acknowledge signal, and a bus output valid signal.
9. The network-on-chip oriented inter-chip interconnect system of claim 8, wherein the bus output acknowledge signal is parameterized and synchronously adjusted according to a priority number, and the number of bus interfaces is synchronously adjusted according to a size.
CN201911374246.4A 2019-12-25 2019-12-25 Inter-chip interconnection method and system for network on chip Active CN111131091B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911374246.4A CN111131091B (en) 2019-12-25 2019-12-25 Inter-chip interconnection method and system for network on chip

Publications (2)

Publication Number Publication Date
CN111131091A CN111131091A (en) 2020-05-08
CN111131091B true CN111131091B (en) 2021-05-11


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant