CN112486871B - Routing method and system for on-chip bus - Google Patents

Info

Publication number: CN112486871B
Authority: CN (China)
Prior art keywords: transmission; predicted; destination; port; transmission bandwidth
Legal status: Active (as listed by Google Patents; an assumption, not a legal conclusion)
Application number: CN202011342617.3A
Other languages: Chinese (zh)
Other versions: CN112486871A (en)
Inventor: 余德君
Current Assignee: Haiguang Information Technology Co Ltd (as listed by Google Patents; may be inaccurate)
Original Assignee: Haiguang Information Technology Co Ltd

Events:
    • Application CN202011342617.3A filed by Haiguang Information Technology Co Ltd (priority to CN202011342617.3A)
    • Publication of CN112486871A
    • Application granted
    • Publication of CN112486871B
    • Legal status: Active; anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668 Details of memory controller
    • G06F13/1678 Details of memory controller using bus width
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3027 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a bus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668 Details of memory controller
    • G06F13/1684 Details of memory controller using multiple buses
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

Embodiments of the present application provide a routing method and system for an on-chip bus. The routing method comprises: acquiring the actual transmission bandwidth occupancy of each of at least one destination port on a bus; acquiring the predicted transmission bandwidth occupancy of each of the at least one destination port over a future prediction period; and determining a transmission policy for data to be transmitted from a source port according to the actual and predicted transmission bandwidth occupancy. By monitoring the actual and predicted bandwidth occupancy of each destination port during bus transmission, the embodiments provide a suitable target outlet for the data to be transmitted; when data is transmitted upstream, it is sent through the target outlet to the host (or on-chip storage unit) connected to that port.

Description

Routing method and system for on-chip bus
Technical Field
The present application relates to the field of computer I/O, and in particular to a routing method and system for an on-chip bus.
Background
Computer I/O technology has long been a key technology in the development of high-performance computing. Its technical characteristics determine the I/O processing capability of a computer and, in turn, the overall performance of the computer and the application environments it can support. Fundamentally, I/O technology constrains the application and development of computer technology, now and in the future, especially in high-end computing.
Computer I/O peripherals mainly include devices connected via PCIe, USB, SATA and Ethernet. Server applications involve memory accesses by multiple I/O devices as well as P2P (Peer-to-Peer) accesses between I/O devices. These accesses must meet bandwidth requirements and need unified, flexible routing management. In a system-on-chip (SoC), this routing management is implemented by a dedicated I/O routing module, commonly called the IOHUB. The IOHUB connects the various I/O peripherals to the CPU memory and routes accesses between local and remote I/O devices and memory. As servers demand more I/O devices, server SoCs must support more of them to meet larger bandwidth requirements.
Therefore, allowing more peripherals to access CPU memory simultaneously and improving the routing capability of the IOHUB is an urgent technical problem.
Disclosure of Invention
An objective of the embodiments of the present application is to provide a routing method and system for an on-chip bus that effectively improves at least the bandwidth and transmission efficiency of the IOHUB. For example, in some embodiments, increasing the number of destination ports allows more source ports to be supported, and an improved bandwidth monitoring mechanism combined with flexible routing improves transmission efficiency.
In a first aspect, some embodiments of the present application provide a routing method for an on-chip bus, the routing method comprising: acquiring the actual transmission bandwidth occupancy of each of at least one destination port on a bus; acquiring the predicted transmission bandwidth occupancy of each of the at least one destination port over a future prediction period; and determining a transmission policy for data to be transmitted from a source port according to the actual and predicted transmission bandwidth occupancy.
By monitoring the actual and predicted bandwidth occupancy of each destination port during bus transmission, these embodiments provide a suitable target outlet for the data to be transmitted; when data is transmitted upstream, it is sent through the target outlet to the host (or on-chip storage unit) connected to that port.
In some embodiments, the destination ports are ports connected to a host or an on-chip memory, and there are a plurality of them. Determining the transmission policy for the data to be transmitted from the source port according to the actual and predicted transmission bandwidth occupancy includes: determining, according to the actual and predicted transmission bandwidth occupancy, at least one target outlet among the plurality of destination ports for the data to be transmitted, the target outlet being used to deliver the data to the host or the on-chip memory.
Some embodiments of the present application provide more destination ports for data from more I/O devices by increasing the number of bus interfaces connected to the host or on-chip memory. When I/O routing uses only one destination port connected to the host or on-chip memory, that port becomes a bandwidth bottleneck as the number of I/O devices, I/O interface speeds and application bandwidth demands grow; providing a plurality of destination ports solves this problem.
In some embodiments, determining at least one target outlet among the plurality of destination ports according to the actual and predicted transmission bandwidth occupancy includes: determining the at least one target outlet according to a transmission bandwidth threshold and a predicted bandwidth threshold set separately for each of the plurality of destination ports.
Some embodiments of the present application determine whether a destination port can carry the data from the source port by setting comparison thresholds and comparing the monitored bandwidth consumption (e.g., the actual transmission bandwidth occupancy and the predicted bandwidth occupancy) against them.
In some embodiments, determining at least one target outlet among the plurality of destination ports according to the transmission bandwidth threshold and predicted bandwidth threshold set for each port includes: confirming that a first actual transmission bandwidth occupancy of a first destination port is less than the first transmission bandwidth threshold of that port, and confirming that a first predicted transmission bandwidth occupancy of the first destination port is less than the first predicted bandwidth threshold of that port; and selecting the first destination port as the target outlet.
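The two-threshold check above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented circuit: the `PortStats` structure, its field names and the function name are hypothetical, and occupancies and thresholds are treated as plain numbers in the same unit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PortStats:
    """Hypothetical snapshot of one destination port's monitored state."""
    name: str
    actual_bw: float            # actual transmission bandwidth occupancy
    predicted_bw: float         # predicted transmission bandwidth occupancy
    actual_threshold: float     # transmission bandwidth threshold of this port
    predicted_threshold: float  # predicted bandwidth threshold of this port


def eligible_outlets(ports: List[PortStats]) -> List[str]:
    """Return ports whose actual AND predicted occupancy are below their own thresholds."""
    return [
        p.name
        for p in ports
        if p.actual_bw < p.actual_threshold and p.predicted_bw < p.predicted_threshold
    ]
```

For example, a port that is lightly loaded now but has heavy requests queued (high predicted occupancy) is rejected, as is a port that is busy now; only ports passing both checks are candidates.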
Some embodiments select, as the target outlet for the current data, a destination port that has sufficient bandwidth headroom at the current moment and also more headroom over a predictable period extending a set duration from the current moment. The data is thus output through a target outlet with adequate capacity to the host or memory connected to it, which ensures smooth transmission and improves the utilization of destination port resources.
In some embodiments, determining at least one target outlet among the plurality of destination ports according to the transmission bandwidth threshold and predicted bandwidth threshold set for each port includes: determining the target outlet using at least one of the transmission bandwidth threshold and the predicted bandwidth threshold according to an attribute of the data to be transmitted, the attribute characterizing the data volume of the data to be transmitted.
Some embodiments select a more suitable target outlet according to the volume of the data to be transmitted, making full use of each destination port's bandwidth while ensuring smooth transmission.
In some embodiments, determining the target outlet using at least one of the transmission bandwidth threshold and the predicted bandwidth threshold according to the attribute of the data to be transmitted includes: confirming that the data volume of the data to be transmitted is greater than a first set threshold, confirming that a second predicted transmission bandwidth occupancy of a second destination port is less than the second predicted bandwidth threshold of that port, and selecting the second destination port as the target outlet; or, confirming that the data volume of the data to be transmitted is less than a second set threshold, confirming that a second actual transmission bandwidth occupancy of the second destination port is less than the second transmission bandwidth threshold of that port, and selecting the second destination port as the target outlet; where the first set threshold is greater than the second set threshold.
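A sketch of the size-dependent rule follows, under stated assumptions. The function and field names are hypothetical, ports are scanned in list order and the first match wins (an assumption), and the source does not say what happens when the data volume lies between the two set thresholds; here that case falls back to requiring both checks, which is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PortStats:
    """Hypothetical snapshot of one destination port's monitored state."""
    name: str
    actual_bw: float
    predicted_bw: float
    actual_threshold: float
    predicted_threshold: float


def select_outlet_by_size(ports: List[PortStats], data_len: int,
                          first_set_threshold: int,
                          second_set_threshold: int) -> Optional[str]:
    """Large transfers check predicted headroom; small transfers check current headroom."""
    if first_set_threshold <= second_set_threshold:
        raise ValueError("first set threshold must exceed the second set threshold")
    for p in ports:  # assumption: first eligible port in list order is chosen
        if data_len > first_set_threshold:
            ok = p.predicted_bw < p.predicted_threshold
        elif data_len < second_set_threshold:
            ok = p.actual_bw < p.actual_threshold
        else:
            # Assumption: middle-sized data must pass both checks.
            ok = p.actual_bw < p.actual_threshold and p.predicted_bw < p.predicted_threshold
        if ok:
            return p.name
    return None
```

With one port that is busy now but idle later and another that is idle now but heavily booked, a large transfer goes to the first and a small transfer to the second, which is the intent described in the next paragraph.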
In some embodiments of the present application, when the volume of data to be transmitted is large, a destination port with sufficient predicted bandwidth headroom is preferred (i.e., whether a destination port is selected as the target outlet is decided by its predicted transmission bandwidth occupancy); when the volume is small, the port's current remaining bandwidth may be used preferentially (i.e., the decision is based on its actual transmission bandwidth occupancy).
In some embodiments, the actual transmission bandwidth occupancy is determined from the bandwidth of the transmissions actually carried out within the transmission window.
Some embodiments determine the actual transmission bandwidth occupancy of each destination port by counting the transmitted data within the transmission window, which improves the accuracy and objectivity of the bandwidth occupancy estimate for each port.
In some embodiments, the actual transmission bandwidth occupancy is calculated as follows:
$$\text{Actual transmission bandwidth occupancy} = \frac{\text{TRANS COUNT}}{\text{CLK COUNT}}$$
where CLK COUNT is the number of system clock cycles counted within the set transmission window, and TRANS COUNT is the number of data bytes of all valid transmissions counted up to the current time.
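A worked example, assuming the formula is the ratio TRANS COUNT / CLK COUNT (bytes moved per clock cycle within the window): with a hypothetical 256-bit (32-byte) channel and a 1000-cycle window in which 600 valid beats occur, TRANS COUNT = 600 × 32 = 19200 bytes, so the occupancy is 19.2 bytes per cycle, or 60% of the channel's 32-byte-per-cycle peak. The `utilization` helper is an illustrative extra step, not part of the source formula.

```python
def actual_bandwidth_occupancy(trans_count: int, clk_count: int) -> float:
    """TRANS COUNT / CLK COUNT: bytes transferred per clock cycle (assumed form)."""
    if clk_count <= 0:
        raise ValueError("CLK COUNT must be a positive number of cycles")
    return trans_count / clk_count


def utilization(trans_count: int, clk_count: int, bytes_per_cycle: int) -> float:
    """Hypothetical helper: occupancy as a fraction of the channel's peak bytes per cycle."""
    if bytes_per_cycle <= 0:
        raise ValueError("bytes_per_cycle must be positive")
    return actual_bandwidth_occupancy(trans_count, clk_count) / bytes_per_cycle
```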
With this formula, the actual transmission bandwidth occupancy of each destination port can be determined, making the bandwidth occupancy estimate more objective.
In some embodiments, the value of TRANS COUNT is obtained by monitoring the transmission valid signal in the channel.
Some embodiments determine and record the actual transmission bandwidth occupancy by detecting the valid signals of data transfers on each channel of the bus, which improves the accuracy of the estimate.
In some embodiments, the predicted transmission bandwidth occupancy is determined from the bandwidth of the expected transmissions within a predicted transmission window, where the predicted transmission window is a total number of system clock cycles determined from the transmission length and transmission bit width of those expected transmissions.
Some embodiments determine the predicted transmission bandwidth occupancy of each destination port by counting its expected transmission bandwidth within the predicted transmission window, which improves the accuracy and objectivity of the bandwidth occupancy estimate for each port.
In some embodiments, the predicted transmission bandwidth occupancy is calculated as follows:
$$\text{Predicted transmission bandwidth occupancy} = \frac{\text{EST}}{\text{CLK COUNT1}}$$
where CLK COUNT1 is the system clock count within the predicted transmission window, and EST is the count obtained by accumulating the transmission lengths carried by the current transmission requests, which characterizes the amount of data expected to be transferred.
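A minimal sketch, assuming the formula is EST / CLK COUNT1 and that each request's transmission length is already expressed in bytes. How CLK COUNT1 is derived from transmission length and bit width is not modeled here; it is simply passed in. For example, pending requests of 256, 128 and 64 bytes give EST = 448, and with a hypothetical CLK COUNT1 of 100 cycles the predicted occupancy is 4.48 bytes per cycle.

```python
from typing import Iterable


def est_from_requests(request_lengths: Iterable[int]) -> int:
    """EST: total transmission length (bytes) carried by the counted requests."""
    total = 0
    for length in request_lengths:
        if length < 0:
            raise ValueError("transmission length cannot be negative")
        total += length
    return total


def predicted_bandwidth_occupancy(est: int, clk_count1: int) -> float:
    """EST / CLK COUNT1 (assumed form of the formula above)."""
    if clk_count1 <= 0:
        raise ValueError("CLK COUNT1 must be a positive number of cycles")
    return est / clk_count1
```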
With this formula, the predicted transmission bandwidth occupancy of each destination port can be determined, making the bandwidth occupancy estimate more objective.
In some embodiments, the value of EST is obtained by extracting the information carried by the transmission request signal in the channel.
Some embodiments determine the predicted transmission bandwidth occupancy by detecting the information carried by the transmission request signals of all channels on the bus, which improves the accuracy of the prediction.
In a second aspect, some embodiments of the present application provide a routing device for an on-chip bus, the routing device comprising: a bandwidth occupancy calculation module configured to acquire the current actual transmission bandwidth occupancy of each of at least one destination port on a bus, and to acquire the predicted transmission bandwidth occupancy of each of the at least one destination port over a preset future period; and an arbitration module configured to determine a transmission policy for the data to be transmitted from the source port according to the actual and predicted transmission bandwidth occupancy.
In a third aspect, some embodiments of the present application provide a routing system for a system-on-chip, the routing system comprising: a system clock counting module configured to add 1 for every transmission clock cycle once monitoring starts, clear when the set transmission window value is reached, and restart clock counting in the next monitoring transmission window; an actual transmission amount counting module configured to count, according to the transmission valid signal, the actual transmission amount of valid transmissions on the channel; a predicted transmission amount counting module configured to count, according to the transmission request signal, the predicted transmission amount on the channel over a future prediction period; a bandwidth calculation module configured to determine the actual transmission bandwidth occupancy of a destination port from the set transmission window size and the actual transmission amounts corresponding to that port, and to determine the predicted transmission bandwidth occupancy from the predicted transmission window size and the predicted transmission amounts corresponding to that port; an arbiter configured to compare the actual transmission bandwidth occupancy of the destination port with a set transmission bandwidth threshold to obtain a first comparison result, compare the predicted transmission bandwidth occupancy of the destination port with a set predicted transmission bandwidth threshold to obtain a second comparison result, and generate a target outlet selection signal from the first and second comparison results, the target outlet being the outlet selected from the plurality of destination ports for transmitting the data to be transmitted; and a router configured to receive the target outlet selection signal and send the data to be transmitted to the target outlet.
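Two pieces of this system can be modeled in software: the windowed system clock counter and the arbiter's per-port select signal. This is a behavioral sketch only; the class and function names are hypothetical, and combining the two comparison results with a logical AND is an assumption consistent with the first-aspect embodiment above.

```python
from typing import List


class WindowClockCounter:
    """System clock counting module: adds 1 per clock cycle, clears at the window value."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("transmission window must be a positive number of cycles")
        self.window = window
        self.count = 0

    def tick(self) -> bool:
        """Advance one clock cycle; return True on the cycle that completes the window."""
        self.count += 1
        if self.count == self.window:
            self.count = 0  # clear, then restart counting in the next monitoring window
            return True
        return False


def outlet_select_signal(actual: List[float], predicted: List[float],
                         actual_thr: List[float], predicted_thr: List[float]) -> List[bool]:
    """Per-port select bits; assumes a port is selectable only if both comparisons pass."""
    if not (len(actual) == len(predicted) == len(actual_thr) == len(predicted_thr)):
        raise ValueError("all per-port lists must have the same length")
    first = [a < t for a, t in zip(actual, actual_thr)]        # first comparison result
    second = [p < t for p, t in zip(predicted, predicted_thr)]  # second comparison result
    return [f and s for f, s in zip(first, second)]
```

In hardware the window-complete pulse would typically trigger the bandwidth calculation and clear the transmission counters; the sketch only exposes that pulse as the return value of `tick`.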
In some embodiments, dedicated counting modules count the actual transmission amount, the predicted transmission amount and the system clock cycles, and the actual and predicted transmission bandwidth occupancy are determined from these counts, which improves the accuracy of the bandwidth occupancy estimate.
In some embodiments, the actual transmission amount counting module counts once each time the transmission valid signal is confirmed to be at a first level, where the first level is either high or low.
Some embodiments trigger counting of the actual transmission amount with the transmission valid signal, which improves the accuracy of the actual transmission amount count.
In some embodiments, the step of each such count is determined by the transmission bit width of the channel.
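The valid-triggered counter can be sketched as follows, assuming the valid signal is sampled once per clock cycle and the channel bit width is a multiple of 8. The class name is hypothetical; `active_high` selects whether the "first level" is high or low.

```python
class ActualTransmissionCounter:
    """Hypothetical model: counts bytes of valid transfers on one channel."""

    def __init__(self, bit_width: int, active_high: bool = True) -> None:
        if bit_width <= 0 or bit_width % 8:
            raise ValueError("bit width must be a positive multiple of 8")
        self.step = bit_width // 8       # count step: bytes moved per valid beat
        self.active_level = active_high  # the "first level": high (True) or low (False)
        self.trans_count = 0             # TRANS COUNT

    def sample(self, valid: bool) -> None:
        """Sample the transmission valid signal once per clock cycle."""
        if valid == self.active_level:
            self.trans_count += self.step

    def clear(self) -> None:
        """Reset at the end of a transmission window."""
        self.trans_count = 0
```

For a 256-bit channel, each cycle with valid asserted adds 32 bytes, so three valid beats yield TRANS COUNT = 96.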
In some embodiments of the present application, using a uniform step value makes the bandwidth comparison results of multiple destination ports comparable, which makes the resulting transmission policy more reasonable.
In some embodiments, the predicted transmission amount counting module counts once each time the transmission request signal is confirmed to be at a second level, where the second level is either high or low.
Some embodiments trigger counting of the predicted transmission amount with the transmission request signal, which improves the accuracy of the predicted transmission amount count.
In some embodiments, the step of each such count is determined by the transmission length, carried by the transmission request signal, of the data to be transmitted in that request.
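The request-triggered counter differs from the valid-triggered one only in its step: it adds the length carried by each request rather than a fixed bit width. The sketch below is hypothetical and assumes the length is already decoded into bytes (for an AXI-style bus it would first have to be derived from the burst length and size fields).

```python
class PredictedTransmissionCounter:
    """Hypothetical model: accumulates EST from transmission requests on one channel."""

    def __init__(self, active_high: bool = True) -> None:
        self.active_level = active_high  # the "second level": high (True) or low (False)
        self.est = 0                     # EST: bytes announced by counted requests

    def sample(self, request: bool, length_bytes: int = 0) -> None:
        """Sample the request signal; when asserted, add the length the request carries."""
        if request == self.active_level:
            if length_bytes < 0:
                raise ValueError("transmission length cannot be negative")
            self.est += length_bytes

    def clear(self) -> None:
        """Reset at the end of a predicted transmission window."""
        self.est = 0
```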
In some embodiments of the present application, using a uniform step value makes the bandwidth comparison results of multiple destination ports comparable, which makes the resulting transmission policy more reasonable.
In a fourth aspect, some embodiments of the present application provide a data transmission method for a system-on-chip, the method comprising: before valid transmission begins, setting a transmission window value for each of a plurality of destination ports and setting bandwidth comparison thresholds for each of the plurality of destination ports, the bandwidth comparison thresholds including a transmission bandwidth comparison threshold and a predicted transmission bandwidth threshold; once valid transmission starts, continuously updating the actual and predicted transmission bandwidth occupancy of each of the plurality of destination ports, where the actual transmission bandwidth occupancy is calculated from the transmission window value and the counted actual transmission amount, and the predicted transmission bandwidth occupancy is calculated from the predicted transmission window value and the counted predicted transmission amount; and, during arbitration, dynamically adjusting the output routing result according to the bandwidth comparison thresholds of each destination port, so as to route the data to be transmitted from the source port to the adjusted target outlet.
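The configure, update and re-route cycle of this method can be sketched per transmission window. Everything here is a hypothetical illustration: port names such as `busif0` are invented, choosing the eligible port with the lowest actual occupancy is an assumed tie-break, and routing to a fixed fallback port when no port passes both thresholds is an assumption the source does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PortConfig:
    """Thresholds set for one destination port before valid transmission begins."""
    actual_threshold: float
    predicted_threshold: float


def route_per_window(config: Dict[str, PortConfig],
                     samples: List[Dict[str, Tuple[float, float]]],
                     fallback: str) -> List[str]:
    """For each window's (actual, predicted) occupancy per port, pick the target outlet."""
    routes = []
    for window in samples:
        eligible = [
            name for name, (act, pred) in window.items()
            if act < config[name].actual_threshold and pred < config[name].predicted_threshold
        ]
        if eligible:
            # Assumed tie-break: least currently loaded eligible port.
            routes.append(min(eligible, key=lambda n: window[n][0]))
        else:
            routes.append(fallback)  # assumption: no port qualifies
    return routes
```

When `busif0` becomes congested in the second window, the route moves to `busif1`, which is the dynamic adjustment the method describes.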
In a fifth aspect, some embodiments of the present application provide a system-on-chip comprising a processor, a memory, at least one I/O device, and the routing system according to any embodiment of the third aspect, where the at least one I/O device is interconnected with the processor, or with the memory, through the routing system.
Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. The following drawings show only some embodiments of the present application and should not be regarded as limiting its scope; a person skilled in the art may derive other related drawings from them without inventive effort.
FIG. 1 is a diagram of the connections between the IO router, multiple I/O devices and a host in the related art;
FIG. 2 is an internal block diagram of IO routing in the related art;
FIG. 3 is a schematic diagram, provided in an embodiment of the present application, illustrating the blocking caused in the related art by having few destination ports for upstream data transmission;
FIG. 4 is a flowchart of a routing method for an on-chip bus according to an embodiment of the present application;
FIG. 5 is a schematic diagram of transmission window movement according to an embodiment of the present application;
FIG. 6 is a diagram of the connections between IO routing with multiple bus interfaces, multiple I/O devices and a host according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the connection relationship between source ports and destination ports according to an embodiment of the present application;
FIG. 8 is a schematic routing diagram of upstream data transmission with two destination ports according to an embodiment of the present application;
FIG. 9 is a block diagram of a routing device for an on-chip bus according to an embodiment of the present application;
FIG. 10 is a block diagram of a routing system for an on-chip bus according to an embodiment of the present application;
FIG. 11 is an internal block diagram of IO routing according to an embodiment of the present application;
FIG. 12 is another flowchart of a routing method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that like reference numerals and letters denote like items in the following figures; once an item is defined in one figure, it need not be further defined or explained in subsequent figures. In the description of the present application, the terms "first", "second" and the like are used only to distinguish descriptions and are not to be construed as indicating or implying relative importance.
The abbreviations used in this application have the following meanings.
SoC: System-on-a-Chip, a dedicated integrated circuit that contains an entire system, including all of its embedded software.
P2P: Peer-to-Peer, end-to-end transmission between PCIe devices.
Host: the object with which the I/O devices are interconnected for data transfer, such as a Data Fabric.
Monitor: a functional module that monitors transmission bandwidth.
Source port: the client port of the IO router that initiates a transmission request.
Target port: the client port of the IO router that receives and outputs a transmission request (also called the destination port).
The related-art IOHUB, together with its technical problems and shortcomings, is described below by way of example with reference to FIG. 1 and FIG. 2.
The related-art IOHUB design interconnects I/O devices with a host through bus interfaces; FIG. 1 shows the interconnect structure for five I/O devices. In the upstream direction of FIG. 1, the I/O devices 300 send data to the host 100: the five I/O devices are connected to the IOHUB (the IO router 200 of FIG. 1) through bus interfaces BUSIF (Bus Interface), and the IO router 200 forwards their requests and data to the host 100 through the bus interface BUSIF connected to the host. The downstream data stream of FIG. 1 flows from the host 100 to the I/O devices 300: requests and data from the host 100 are routed to the bus interfaces BUSIF connected to the I/O devices, giving access to all five I/O devices. In FIG. 1, the five I/O devices share the host 100, and P2P access between I/O devices can also be achieved through routing. For the IOHUB, every port is a client port, and each client port can act as a source port or a destination port depending on the direction of data transfer.
The related-art IOHUB design of FIG. 2 differs from FIG. 1 in that the host 100 is replaced with the memory 110. The memory 110 of FIG. 2 can receive and store data from the I/O devices over the bus and can also send data to the I/O devices connected to the bus. FIG. 2 therefore shows two bus interface units connected to the memory 110: an upstream interface (acting as a destination port) and a downstream interface (acting as a source port). In some embodiments of the present application, these two bus interfaces are represented as the single bus interface shown in FIG. 1. It should be noted that, in some embodiments of the present application, increasing the number of bus interfaces connected to the host or on-chip memory increases the number of bus interfaces serving as upstream interfaces.
FIG. 2 shows the main functional modules of the related-art IOHUB: the bus interface BUSIF, the decoder 201, the bus buffer 202 (BUF), the multiplexer 203 (MUX) and the arbiter 204. The core function of each module is described below.
The bus interface BUSIF is the interface through which the I/O devices and the memory on the bus access the IOHUB. It implements the bus protocol and handles the command, data and response channels.
The decoder 201 decodes the address of each access request. In the upstream direction, the decoder 201 checks which address range the request address falls into to determine whether the target is memory, another I/O device, or an unmapped (null) address space. If the access targets memory or a remote I/O device, the destination port is the port connected to the host or memory; if it targets an I/O device on the same IOHUB, the destination port is the port connected to that I/O device; if the address is invalid, an invalid response is returned. In the downstream direction, the host port is the source port, and the decoder 201 determines which software-configured I/O device address space contains the access address; the destination port (target port) is the port of that I/O device.
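The decoding rules above can be sketched as address-range lookups. The address map, the `AddrRange` type, the port names and `HOST_PORT` are all hypothetical; real decoders use software-programmed base/limit registers. Returning `None` stands for the invalid-response case in the upstream direction; the source does not say what happens when a downstream address matches no I/O device, so returning `None` there is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

HOST_PORT = "host"  # hypothetical name of the port connected to the host or memory


@dataclass(frozen=True)
class AddrRange:
    """Hypothetical address window owned by one destination port."""
    base: int
    size: int
    port: str

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def decode_upstream(addr: int, memory: List[AddrRange], remote_io: List[AddrRange],
                    local_io: List[AddrRange]) -> Optional[str]:
    """Memory or remote I/O goes to the host port; local I/O goes to that device's port."""
    if any(r.contains(addr) for r in memory + remote_io):
        return HOST_PORT
    for r in local_io:
        if r.contains(addr):
            return r.port
    return None  # null address space: the caller returns an invalid response


def decode_downstream(addr: int, io_ranges: List[AddrRange]) -> Optional[str]:
    """Host is the source; find the I/O device whose configured space holds the address."""
    for r in io_ranges:
        if r.contains(addr):
            return r.port
    return None
```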
The bus buffer 202 (e.g., FIFO including different lanes of the bus) is used to buffer data or buffer access requests. For example, before individual access requests and data are arbitrated and output, the FIFO buffers the access requests and data, ensuring that access is not interrupted, where the depth and bandwidth of the FIFO is related to the number of Client ports.
The arbiter 204 arbitrates when different source ports access the same destination port (target port), so that only one source port is routed to that target port at a time. Arbitration uses a round-robin scheduling scheme: requests from the client ports are granted access to the target port in turn.
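The round-robin policy can be sketched behaviorally as follows. This is a minimal illustration under stated assumptions, not the related-art RTL: the class and method names are hypothetical, and one call to `arbitrate` is assumed to correspond to one arbitration decision for a single target port.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Grants one requesting source port at a time, rotating priority after each grant."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        # Start just "before" port 0 so that port 0 has the highest priority initially.
        self.last_grant = num_ports - 1

    def arbitrate(self, requests: List[bool]) -> Optional[int]:
        """Return the granted source port index, or None if no port is requesting."""
        for offset in range(1, self.num_ports + 1):
            port = (self.last_grant + offset) % self.num_ports
            if requests[port]:
                self.last_grant = port  # the winner becomes lowest priority next round
                return port
        return None
```

With three ports all requesting, successive grants go 0, 1, 2, 0. Nothing in the sketch looks at transfer length or bandwidth: once a port is granted, it holds the destination port until its transfer completes, which is the limitation of the related art discussed for fig. 3.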
The multiplexer 203 selects which source port is output to the destination port (target port) according to the results of the decoder 201 and the arbiter 204.
It should be noted that in the upstream direction of fig. 2 there is only one destination port (carrying two data streams), so as the number of I/O devices grows, the bandwidth of this destination port can become a bottleneck for communication between the I/O devices and the memory 110.
The technical drawbacks of the related art of fig. 1 and 2 are exemplarily described below in connection with fig. 3.
In the related art (e.g., fig. 1 and 2), multiple I/O devices can access the host or memory only through a single destination port, and the arbitration module on the IO router uses round-robin polling. As the number of I/O devices and their bandwidth requirements grow, two problems arise. First, bandwidth is insufficient: as I/O devices are added, the number of client ports mounted on the IOHUB increases, and as I/O interface rates and application bandwidth requirements rise, the single destination port (target port) becomes a bandwidth bottleneck. As shown in fig. 3, path (1) and path (2) each have a bandwidth of 32 GB/s, but because the bus interface BUSIF connected to the host 100 has only one egress, the 32 GB/s of that BUSIF limits the total transmission bandwidth. For upstream transmission, the egress bandwidth is calculated as follows:
$$\mathrm{BW}_{\mathrm{target}} = \sum_{i} \mathrm{BW}_{\mathrm{source}_i}$$
where the left-hand side is the bandwidth of the (single) destination port connected to the host, and the right-hand side is the sum of the bandwidths of the source ports connected to the I/O devices. The more source ports transmit simultaneously, the greater the bandwidth pressure on the destination port.
Second, the arbitration mechanism of the related art does not consider bandwidth. As described above, the arbiter 204 uses round-robin polling without regard to bandwidth: if the previous client issues a very long transfer and keeps the destination port occupied, the next client cannot use the destination port until that transfer finishes, which delays critical data. For example, in fig. 3, if path (1) is transmitting and path (2) needs to obtain the transmission right of the BUSIF connected to the host 100 as soon as possible, path (2) cannot obtain it because the transfer on path (1) has not finished and cannot be interrupted, so the real-time transmission of path (2) is interrupted. In another case, path (1) always issues bulk transfers (a large amount of data per transfer), while path (2) issues small transfers at high frequency. Path (2) releases the destination port quickly each time, whereas path (1) holds it for a long time on every grant, so the average bandwidth of path (2) cannot be guaranteed overall.
To address these problems, some embodiments of the present application add a bandwidth monitoring mechanism to the related-art IOHUB, improve the arbitration mechanism based on bandwidth, and increase the number of destination ports, thereby raising the egress bandwidth toward the host or the on-chip memory. Further, in some embodiments of the present application, the bandwidth of the data to be transmitted on each of the multiple destination ports can be adjusted flexibly according to the bandwidth required by the source ports and the monitored bandwidth of the destination ports.
For example, some embodiments of the present application increase the number of bus interfaces interconnected with the host or memory (one port per bus interface), thereby increasing the number of destination ports available to an I/O device acting as the data sender. Some embodiments also add a bandwidth monitoring mechanism to the ports, improve the arbitration and routing of the existing transmission strategy based on the monitoring results, and make routing more flexible.
The routing method for the on-chip bus performed by the IOHUB of the embodiments of the application is exemplarily described below in conjunction with FIG. 4.
Referring to fig. 4, a routing method for an on-chip bus according to an embodiment of the present application includes: S101, acquiring the actual transmission bandwidth occupation amount of each of at least one destination port on the bus; S102, acquiring the predicted transmission bandwidth occupation amount of each of the at least one destination port over a future prediction period; S103, determining a transmission policy for data to be transmitted from a source port according to the actual and predicted transmission bandwidth occupation amounts.
It should be noted that for upstream data transmission, the destination port is a port connected to the host or the on-chip memory; for downstream data transmission, the destination port is a port through which the on-chip bus connects to an I/O device. When the destination port is a port connected to an I/O device, there is one destination port, and the transmission policy determined from the bandwidth occupation (the actual and predicted transmission bandwidth occupation amounts) decides how to arbitrate when multiple hosts transmit data to that destination port at the same time. When the destination port is connected to the host or the on-chip memory, some embodiments of the present application increase the number of such destination ports to several, and the transmission policy determined from the bandwidth occupation selects one or more egresses among these destination ports for the data to be transmitted, through which the data is sent to the host or the on-chip memory.
The steps of fig. 4 are illustrated below using upstream data transmission as an example.
In some embodiments of the present application, the actual transmission bandwidth occupation amount in S101 is determined from the bandwidth of the current transmission within the transmission window of a given destination port. For example, the actual transmission bandwidth occupation amount of any destination port is calculated as:
$$\mathrm{BW\_TRANS} = \frac{\mathrm{TRANS\ COUNT}}{\mathrm{CLK\ COUNT}}$$
where BW_TRANS is the bandwidth of the current transmission within the transmission window, CLK COUNT is the number of system clock cycles counted within the set transmission window, and TRANS COUNT is the number of bytes counted for all valid transmissions of the port within that window. BW_TRANS is thus expressed in bytes per clock cycle; multiplying by the clock frequency gives bytes per second (at 1 GHz, 32 bytes per cycle corresponds to 32 GB/s).
CLK COUNT is the number of clock cycles counted within the transmission window. The size of the transmission window (the range of cycles over which counting is done) can be chosen from empirical values and is preset through a register.
The calculation window used when computing the actual transmission bandwidth occupation amount is shown in fig. 5. Once software sets the transmission window size, the size stays fixed as the window moves, and the computed bandwidth changes as the window moves. As shown in fig. 5, the actual transmission bandwidth occupation amount in the first transmission window cal_window is larger than that in the second transmission window cal_window.
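A minimal sketch of the BW_TRANS calculation over consecutive windows follows. It assumes, as in the routing system described with reference to fig. 10, that the counters are cleared when the window value is reached, so windows do not overlap; the function name and the per-cycle trace format are hypothetical.

```python
from typing import List


def window_bandwidths(valid_bytes_per_cycle: List[int], window_cycles: int,
                      clk_hz: float) -> List[float]:
    """Return BW_TRANS in bytes/s for each complete, non-overlapping window.

    valid_bytes_per_cycle[i] is the number of bytes validly transferred in cycle i.
    """
    results = []
    trans_count = 0  # TRANS COUNT: bytes of valid transfers in the current window
    clk_count = 0    # CLK COUNT: system clock cycles elapsed in the current window
    for nbytes in valid_bytes_per_cycle:
        clk_count += 1
        trans_count += nbytes
        if clk_count == window_cycles:
            # BW_TRANS = TRANS COUNT / CLK COUNT (bytes/cycle), scaled by cycles per second.
            results.append(trans_count / clk_count * clk_hz)
            trans_count = 0  # clear both counters and start the next window
            clk_count = 0
    return results
```

For example, `window_bandwidths([32] * 4 + [32, 0, 32, 0], 4, 1e9)` returns `[32e9, 16e9]`: the first window is fully used and the second only half, mirroring the first-versus-second window comparison of fig. 5. An incomplete trailing window produces no result.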
In some embodiments, the predicted transmission bandwidth occupation amount in S102 is determined from the bandwidth of the predicted (expected) transmission of a destination port within a predicted transmission window, where the predicted transmission window is the total number of system clock cycles calculated from the transmission length and the transmission bit width of that predicted transmission. That is, starting from the start of the transmission, one bus-width beat of data is assumed to be transmitted per system clock cycle until the transmission length is exhausted, and the resulting number of cycles is the predicted transmission window. For example, the predicted transmission bandwidth occupation amount is calculated as:
$$\mathrm{BW\_EST} = \frac{\mathrm{EST}}{\mathrm{CLK\ COUNT1}}$$
where CLK COUNT1 is the system clock count within the predicted transmission window, and EST is the count obtained by accumulating the transmission length (in bytes) carried by the current transmission request.
Note that the expected transmission is the amount of data given by the transmission length: if the transmission length in the request is 256 bytes, the expected transmission is 256 bytes. The expected transmission window is the number of transmission cycles calculated from the transmission length and the transmission bit width: for a transmission length of 256 bytes and a bus bit width of 32 bytes, the expected transmission window is 256 / 32 = 8 cycles. Specific values of these parameters are given in the example below.
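The predicted bandwidth of a single request can be sketched as below. The function name is hypothetical, and rounding the cycle count up when the transmission length is not a multiple of the bus width is an assumption; it is, however, consistent with the 6-cycle figure used for a 256-byte transfer on the 48-byte host bus in the fig. 8 example.

```python
import math


def predicted_bandwidth(transfer_len_bytes: int, bus_width_bytes: int,
                        clk_hz: float) -> float:
    """Return BW_EST = EST / CLK COUNT1 in bytes/s for a single transfer request."""
    est = transfer_len_bytes  # EST: bytes announced by the request's transmission length
    # CLK COUNT1: cycles needed if one bus-width beat is transferred every cycle;
    # a partial last beat still costs a full cycle (assumption).
    clk_count1 = math.ceil(transfer_len_bytes / bus_width_bytes)
    return est / clk_count1 * clk_hz
```

A 256-byte request on a 32-byte bus takes 8 cycles and predicts 32 GB/s at 1 GHz; on a 48-byte bus it takes 6 cycles and predicts about 42.7 GB/s, the "43 GB/s" figure used in the example.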
S103 is illustrated below taking a bus interface connected to the host or the on-chip memory as the destination port (i.e., upstream transmission).
In the related art, only one port connects to the host or on-chip memory, so data to be transmitted from I/O devices cannot always be sent to the host or memory immediately. To solve this, in some embodiments of the present application, the host or on-chip memory is connected to the IO router through multiple bus interfaces, which correspond to multiple ports during data transmission.
As shown in fig. 6, unlike fig. 1 and 2, the host 100 is connected to the IO router through four bus interfaces. Data from the I/O devices can therefore be transferred to the host 100 through one or more of these four bus interfaces (i.e., four destination ports), which significantly increases the total bandwidth available for simultaneous transfers to the host or memory.
As shown in fig. 6, when the I/O devices act as source ports, the number of destination ports increases to four, so the output bandwidth becomes four times that of the original scheme and more I/O devices can transmit simultaneously. As shown in fig. 7, any source port can be routed to any destination port (target port). When the source port is a bus interface connected to the host or on-chip memory, the corresponding target port is a port connected to an I/O device. It should be noted that, in some embodiments of the present application, when the data flows downstream, the route from the source port to the destination port is determined by the address space accessed by the source port; when the source port is an interface connected to an I/O device and the destination port is a bus interface connected to the host or memory (i.e., the data flows upstream), the route is determined by the bandwidth monitored in some embodiments of the present application.
As an example, when the total bandwidth of the source ports is lower than that of the destination ports, a source port can be routed to any target port, the bandwidth requirements of all source ports are satisfied, and unused target ports remain IDLE. When the total bandwidth of the source ports is equal to or higher than that of the destination ports, the routes from source ports to target ports must be adjusted dynamically according to the bandwidth of the target ports. For example, suppose the first source port source_0 issues a transmission request while the first destination port target_0 and the second destination port target_1 are both transmitting and the third destination port target_2 is IDLE; the data to be transmitted from source_0 is then routed to target_2. If all four destination ports are occupied, the data from source_0 is routed to the destination port with the lowest expected bandwidth, i.e., the lowest calculated predicted transmission bandwidth occupation amount.
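The source_0 example can be sketched as a small selection function. This is an illustrative sketch only: the function name and the dictionary representation of port state are hypothetical, and taking the first IDLE port in port order is an assumption, since the text only says the data goes to an idle port.

```python
from typing import Dict


def route_upstream(port_busy: Dict[str, bool], predicted_bw: Dict[str, float]) -> str:
    """Pick a destination port for a new upstream transfer from a source port."""
    # Prefer any destination port that is currently IDLE (first one in port order).
    for port, busy in port_busy.items():
        if not busy:
            return port
    # All ports are transmitting: choose the lowest predicted transmission bandwidth occupation.
    return min(predicted_bw, key=predicted_bw.__getitem__)
```

With target_0, target_1 and target_3 busy and target_2 IDLE, the function returns `"target_2"`; if all four are busy, it returns the port with the smallest predicted value.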
An exemplary strategy for routing according to bandwidth is set forth below in connection with the examples.
When I/O devices transmit data to the host or memory through multiple destination ports, the traffic must be routed sensibly. In some embodiments of the present application, to keep each destination port as close to fully loaded as possible and thereby increase the amount of data transmitted to the host or memory, S103 includes: determining, according to the actual and predicted transmission bandwidth occupation amounts, at least one target outlet among the multiple destination ports for the data to be transmitted, where the target outlet provides the data to the host or the on-chip memory. For example, S103 may determine the at least one target outlet according to a transmission bandwidth threshold and a predicted bandwidth threshold set separately for each of the multiple destination ports.
When both the actual and the predicted transmission bandwidth occupation amounts of a destination port are below their respective thresholds, S103 determines that the destination port can serve as a target outlet for the data to be transmitted. In some embodiments of the present application, S103 includes: confirming that a first actual transmission bandwidth occupation amount of a first destination port (any one of the multiple destination ports) is smaller than a first transmission bandwidth threshold of that port, and that a first predicted transmission bandwidth occupation amount of the first destination port is smaller than a first predicted bandwidth threshold of that port; and selecting the first destination port as the target outlet.
That is, when the actual transmission bandwidth of a destination port is below the transmission bandwidth threshold and its predicted transmission bandwidth is below the predicted bandwidth threshold, the port can serve as a target outlet for the data to be transmitted. If the amount of data to be transmitted from a source port exceeds the bandwidth that a single destination port can provide, the data can be distributed over several target outlets, in which case this destination port is only one of them.
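The dual-threshold test of S103 can be sketched as a filter over per-port status records. The `PortStatus` record and the function name are hypothetical; bandwidth units are whatever the thresholds use (GB/s below).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PortStatus:
    name: str
    actual_bw: float        # actual transmission bandwidth occupation amount
    predicted_bw: float     # predicted transmission bandwidth occupation amount
    trans_threshold: float  # transmission bandwidth threshold of this port
    pred_threshold: float   # predicted bandwidth threshold of this port


def target_outlets(ports: List[PortStatus]) -> List[str]:
    """Return every destination port whose actual and predicted occupation are both below threshold."""
    return [p.name for p in ports
            if p.actual_bw < p.trans_threshold and p.predicted_bw < p.pred_threshold]
```

Against thresholds of 38 and 48 GB/s, ports with (actual, predicted) of (10, 20) and (5, 5) qualify, whereas (40, 20) fails the transmission threshold and (10, 50) fails the predicted threshold. When several ports qualify, a large transfer could be distributed over them, as described above.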
As an example, consider fig. 8, in which three I/O devices access the DDR in the host. Before the transmission policy is described, assume that the devices and links in fig. 8 have the following initial parameters:
First, the 3 devices (I/O devices) in fig. 8 each initiate write transfer requests, and the information carried by the requests indicates an average bandwidth of 32 GB/s for the data to be transferred.
Second, the 3 devices in fig. 8 are connected to the host through two bus interfaces, so this transmission uses two destination ports, each with a full bandwidth of 48 GB/s.
Third, the initial default routes in fig. 8 are S1→T1, S2→T2, and S3→T2.
Fourth, the Device bus bit width is 32 bytes, i.e., one system clock cycle carries 32 bytes of valid data; the Host bus bit width is 48 bytes, i.e., one system clock cycle carries 48 bytes of valid data.
Fifth, the operating frequency is 1 GHz, so the ideal bandwidth of the Device bus is 32 GB/s (and that of the Host bus is 48 GB/s).
Sixth, each Device initiates write transfers with a length of 256 bytes.
Based on these six assumptions, the workflow of fig. 8 is as follows:
S201, set the transmission parameters of the Devices and the Host: the Devices perform write transfers with a transmission length of 256 bytes.
S202, set the transmission bandwidth threshold of T1 to 38 GB/s (80% of the ideal bandwidth) and the predicted bandwidth threshold of T1 to 48 GB/s (the ideal bandwidth).
S203, set the transmission bandwidth threshold of T2 to 38 GB/s (80% of the ideal bandwidth) and the predicted bandwidth threshold of T2 to 48 GB/s (the ideal bandwidth).
S204, set the transmission window size of the T1 monitor to 1024 cycles, and likewise set the window size of the T2 monitor to 1024 cycles.
S205, start transmission and start the monitor of each destination port.
S206, when the first transmission window ends, perform the following steps:
S206-1, the actual transmission amount TRANS COUNT of T1 is 1000 × 32 bytes and CLK COUNT is 1000, so by the BW_TRANS formula the actual transmission bandwidth occupation amount is 32 GB/s. The predicted transmission amount EST COUNT of T1 is 256 bytes, and completing a 256-byte transmission takes 8 system clock cycles, so by the BW_EST formula the predicted transmission bandwidth occupation amount is 32 GB/s. Both values are below the preset thresholds, i.e., T1 is not saturated and can accept transmissions from other source ports.
S206-2, the actual transmission amount TRANS COUNT of T2 is 1000 × 48 bytes and CLK COUNT is 1000, so by the BW_TRANS formula the actual transmission bandwidth occupation amount is 48 GB/s.
S206-3, the predicted transmission amount EST COUNT of T2 from S2 is 256 bytes, and completing a 256-byte transmission takes 6 cycles, so the predicted bandwidth BW_EST of T2 due to S2 is about 43 GB/s; likewise, the predicted transmission amount EST COUNT of T2 from S3 is 256 bytes over 6 cycles, so BW_EST of T2 due to S3 is about 43 GB/s. The total predicted bandwidth requirement on the T2 port is therefore about 86 GB/s.
S206-4, both the actual and the predicted transmission bandwidth occupation amounts on T2 therefore exceed their thresholds.
S206-5, the software changes the route of S2 to T1.
S207, at the end of the second transmission window, the bandwidth results for T1 and T2 are the reverse of those in S206, so the software changes the route of S2 back to T2.
On average, the bandwidth of S2 in fig. 8 is split evenly, with 16 GB/s on destination port T1 and 16 GB/s on destination port T2.
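The S206 arithmetic can be re-checked with a short script. It hard-codes the six assumptions and the thresholds of S202/S203; the helper names are hypothetical, and the 8-cycle window for T1 follows the text's use of the 32-byte Device bus for S1's request.

```python
import math

CLK_HZ = 1e9             # assumption 5: 1 GHz system clock
DEVICE_BUS_BYTES = 32    # assumption 4: Device bus width
HOST_BUS_BYTES = 48      # assumption 4: Host bus width
TRANS_THRESHOLD = 38.0   # S202/S203: transmission bandwidth threshold, GB/s
PRED_THRESHOLD = 48.0    # S202/S203: predicted bandwidth threshold, GB/s


def to_gbps(byte_count: float, clk_count: int) -> float:
    """Convert bytes counted over clk_count cycles into GB/s at CLK_HZ."""
    return byte_count / clk_count * CLK_HZ / 1e9


def can_accept(actual: float, predicted: float) -> bool:
    """A port may take more traffic only if both occupations are below threshold."""
    return actual < TRANS_THRESHOLD and predicted < PRED_THRESHOLD


# S206-1: T1 carries S1, which delivers one 32-byte beat per cycle.
t1_actual = to_gbps(1000 * 32, 1000)
t1_pred = to_gbps(256, math.ceil(256 / DEVICE_BUS_BYTES))     # 8 cycles
# S206-2 / S206-3: T2 carries S2 and S3 on the 48-byte host bus.
t2_actual = to_gbps(1000 * 48, 1000)
t2_pred = 2 * to_gbps(256, math.ceil(256 / HOST_BUS_BYTES))   # 6 cycles per request

# S206-4 / S206-5: T2 is over its thresholds and T1 is not, so move S2 to T1.
reroute_s2_to_t1 = can_accept(t1_actual, t1_pred) and not can_accept(t2_actual, t2_pred)
print(t1_actual, t1_pred, t2_actual, round(t2_pred, 1), reroute_s2_to_t1)
```

The script prints `32.0 32.0 48.0 85.3 True`. The unrounded T2 prediction is about 85.3 GB/s; the 86 GB/s quoted in S206-3 comes from rounding each 42.7 GB/s term up to 43 before adding.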
It should be noted that some embodiments of the present application determine the transmission policy according to the characteristics of the data to be transmitted from a source port. For example, if a source port has a large amount of data to transmit, the predicted bandwidth threshold should be consulted to decide whether a destination port can serve as the target outlet; if the amount is small, the transmission bandwidth threshold can be used instead.
As an example, when one of the actual and predicted transmission bandwidth occupation amounts of a destination port is below its threshold, S103 may include: determining the target outlet using at least one of the transmission bandwidth threshold and the predicted bandwidth threshold according to an attribute of the data to be transmitted, the attribute characterizing its data amount. For example: confirming that the data amount of the data to be transmitted is larger than a first set threshold, confirming that a second predicted transmission bandwidth occupation amount of a second destination port is smaller than a second predicted bandwidth threshold of that port, and selecting the second destination port as the target outlet; or confirming that the data amount is smaller than a second set threshold, confirming that a second actual transmission bandwidth occupation amount of the second destination port is smaller than a second transmission bandwidth threshold of that port, and selecting the second destination port as the target outlet; where the first set threshold is greater than the second set threshold, and the second destination port is any one of the multiple destination ports.
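The size-dependent choice of threshold can be sketched as follows. The function and parameter names are hypothetical. The text does not state what happens when the data amount lies between the two set thresholds; requiring both checks in that band is an assumption of this sketch.

```python
def is_target_outlet(data_bytes: int, actual_bw: float, predicted_bw: float,
                     trans_threshold: float, pred_threshold: float,
                     large_data_threshold: int, small_data_threshold: int) -> bool:
    """Decide whether a destination port can be the target outlet for this data.

    large_data_threshold plays the role of the "first set threshold" and
    small_data_threshold the "second set threshold" (first > second).
    """
    if large_data_threshold <= small_data_threshold:
        raise ValueError("first set threshold must exceed the second set threshold")
    if data_bytes > large_data_threshold:
        # Large transfer: judge the port by its predicted occupation.
        return predicted_bw < pred_threshold
    if data_bytes < small_data_threshold:
        # Small transfer: judge the port by its actual occupation.
        return actual_bw < trans_threshold
    # In between, the text fixes no rule; this sketch requires both checks (assumption).
    return actual_bw < trans_threshold and predicted_bw < pred_threshold
```

For instance, with set thresholds of 1024 and 128 bytes, a 4096-byte transfer can use a port whose actual occupation is over its threshold as long as the predicted occupation is not, while a 64-byte transfer is judged only on the actual occupation.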
Referring to fig. 9, fig. 9 shows a routing apparatus for a bus according to an embodiment of the present application. The apparatus corresponds to the method embodiment of fig. 4 and can perform the steps of that embodiment; for its specific functions, refer to the description above, which is not repeated here to avoid redundancy. The apparatus includes at least one software functional module that can be stored in memory as software or firmware or built into the operating system of the apparatus. The routing apparatus includes: a bandwidth occupation amount calculation module 801, configured to acquire the actual transmission bandwidth occupation amount of each of at least one destination port on the bus and to acquire the predicted transmission bandwidth occupation amount of each of the at least one destination port over a future preset period; and an arbitration module 802, configured to determine a transmission policy for data to be transmitted from a source port according to the actual and predicted transmission bandwidth occupation amounts.
It will be clear to those skilled in the art that, for convenience and brevity of description, reference may be made to the corresponding procedure in the method of fig. 4 for the specific working procedure of the above-described apparatus, and thus, the description is not repeated here.
A routing system for an on-chip bus according to an embodiment of the present application is exemplarily described below with reference to fig. 10.
Some embodiments of the present application provide a routing system for an on-chip bus, comprising: a system clock counting module 210, configured to increment by 1 every clock cycle once monitoring starts, to clear when the count reaches the set transmission window value, and to restart counting in the next monitoring window; an actual transmission amount counting module 220, configured to count, according to the transmission valid signal, the actual amount of valid transmissions (TRANS COUNT above); a predicted transmission amount counting module 230, configured to count, according to the transmission request signal, the predicted transmission amount on the channel over a future prediction period (EST above); a bandwidth calculation module 240, configured to determine the actual transmission bandwidth occupation amount of a destination port (bw_trans in fig. 10) from the set transmission window size (reg_window in fig. 10) and the actual transmission amounts of that port, and to determine the predicted transmission bandwidth occupation amount (bw_trans_est in fig. 10) from the predicted transmission window size and the predicted transmission amounts of that port; an arbiter 250, configured to compare the actual transmission bandwidth occupation amount of the destination port with a set transmission bandwidth threshold (reg_threshold in fig. 10) to obtain a first comparison result, to compare the predicted transmission bandwidth occupation amount of the destination port with a set predicted bandwidth threshold to obtain a second comparison result, and to generate a target outlet selection signal (trans_block in fig. 10) from the first and second comparison results, the target outlet being the outlet, selected from the multiple destination ports, used to transmit the data to be transmitted; and a router 260, configured to receive the target outlet selection signal and send the data to be transmitted to the target outlet.
In some embodiments of the present application, when the transmission valid signal is at a first level, the actual transmission amount counting module counts once, where the first level is either a high level or a low level. For example, the step size of each count is determined by the transmission bit width of the channel.
As an example, the transmission valid signal in fig. 10 is Valid_transfer, which denotes one valid bus transfer. Valid_transfer is evaluated separately for each channel on the bus: it is asserted when that channel's transmission valid signal is valid, and the counter of the corresponding actual transmission amount counting module 220 is incremented once per such transfer. On the data channel, each valid transfer increments the counter by the number of bytes transferred; for example, on a 256-bit-wide data channel with all byte enables valid, the counter is incremented by 32.
In some embodiments, when the transmission request signal is at a second level, the predicted transmission amount counting module counts once, where the second level is either a high level or a low level. For example, the step size of each count is the transmission length, carried by the transmission request signal, of the data to be transmitted.
As shown in fig. 10, the transmission request signal is Request, which is used to predict the bandwidth that an impending transmission will occupy and belongs to the transmission command channel of the bus. On the command channel, each time Request is valid the counter is incremented by the transmission length in bytes; for a transmission length of 256 bytes, the counter is incremented by 256, and the prediction is that the corresponding data channel will carry these 256 bytes in the subsequent transmission cycles. The predicted transmission amount counting module 230 thus predicts the bandwidth of the data channel, but it does not account for blocked transmissions, so the actual bandwidth is expected to be lower than the predicted bandwidth.
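The per-port counters of modules 210, 220 and 230, together with the window-end calculation of module 240, can be modeled cycle by cycle as below. This is a behavioral sketch, not the patented circuit: the class is hypothetical, active-high Valid_transfer and Request are assumed, each window result is recorded as (bytes per cycle, predicted bytes), and clearing the predicted counter together with the window is an assumption the text does not state.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PortMonitor:
    """Hypothetical per-port monitor; field names loosely follow fig. 10."""
    reg_window: int                # transmission window size in clock cycles
    clk_count: int = 0             # system clock counting module 210
    trans_count: int = 0           # actual transmission amount counting module 220 (bytes)
    est_count: int = 0             # predicted transmission amount counting module 230 (bytes)
    results: List[Tuple[float, int]] = field(default_factory=list)

    def tick(self, valid_transfer: bool, byte_enable: int, request: bool,
             request_len: int = 0) -> None:
        """Advance one system clock cycle."""
        self.clk_count += 1
        if valid_transfer:
            # Step = number of bytes whose enable bit is set (32 for a fully enabled 256-bit beat).
            self.trans_count += bin(byte_enable).count("1")
        if request:
            # Step = transmission length carried by the request on the command channel.
            self.est_count += request_len
        if self.clk_count == self.reg_window:
            # Window end: record bw_trans in bytes/cycle and the predicted byte count.
            self.results.append((self.trans_count / self.clk_count, self.est_count))
            self.clk_count = self.trans_count = self.est_count = 0  # start the next window
```

Driving a 4-cycle window with four fully enabled 256-bit beats (byte enable `(1 << 32) - 1`) and one 256-byte Request in the first cycle records `(32.0, 256)`, i.e., 32 bytes per cycle, or 32 GB/s at 1 GHz.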
It will be clear to those skilled in the art that, for convenience and brevity of description, reference may be made to the corresponding procedure in the method of fig. 4 for the specific working procedure of the above-described apparatus, and thus, the description is not repeated here.
Fig. 11 shows the interconnection between the IOHUB, the host (or memory), and the I/O devices according to an embodiment of the present application. Unlike fig. 2, fig. 11 has four bus interfaces between the host or memory and the IOHUB (one up signal and one down signal in fig. 11 correspond to one bus interface), and a monitor can be placed in each bus interface module. The monitor's output signals (for example, the actual transmission bandwidth occupation amount bw_trans and the predicted transmission bandwidth occupation amount bw_trans_est of fig. 10) are buffered in the buffer BUF and then fed to the arbiter for route arbitration, which selects a target outlet with suitable bandwidth for the data from the source port. Elements identical to those in fig. 2 are not described again.
The bandwidth monitoring apparatus (monitor) of fig. 11 includes the system clock counting module 210, the actual transmission amount counting module 220, the predicted transmission amount counting module 230, and the bandwidth calculation module 240 of fig. 10. The functions of these modules are described above and are not repeated here.
Some embodiments of the present application provide a data transmission method for a system on chip, including: before valid transmission begins, setting a transmission window value for each of multiple destination ports and setting a bandwidth comparison threshold for each destination port, the bandwidth comparison threshold comprising a transmission bandwidth threshold and a predicted bandwidth threshold; once valid transmission starts, continuously updating the actual and predicted transmission bandwidth occupation amounts of each destination port, the actual amount being calculated from the transmission window value and the counted actual transmission amount, and the predicted amount being calculated from the predicted transmission window value and the counted predicted transmission amount; and during arbitration, dynamically adjusting the output routing result according to the bandwidth comparison thresholds of the destination ports, so that data to be transmitted from a source port is routed to the adjusted target outlet.
The data transmission method of the system on chip according to this embodiment is shown in fig. 12. Before transmission starts, software sets, through registers, the transmission window size and the bandwidth comparison thresholds used in the bandwidth calculation. When transmission starts, the bandwidths are calculated and updated (that is, the actual transmission bandwidth occupation amount and the predicted transmission bandwidth occupation amount are calculated). During arbitration, the output routing result is dynamically adjusted according to the bandwidth thresholds of each destination port, and the source port is routed to the adjusted destination port according to the updated routing result. When a transmission finishes, the flow ends if the source port has no subsequent transmission; otherwise, the bandwidth calculation and route adjustment are repeated. For arbitration routing, software sets reg_threshold for each destination port (target port), that is, the bandwidth thresholds of the port (both the transmission bandwidth threshold and the predicted bandwidth threshold). When the actual transmission bandwidth of a destination port is greater than or equal to the set threshold, no other source port is routed to that destination port; otherwise, other source ports can still be accepted and routed to it.
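The register set-up and the reg_threshold rule above can be sketched as follows. Everything here is illustrative: the `PortConfig` fields stand in for the window and reg_threshold registers, bandwidth values are plain numbers in an unspecified unit, and picking the least-loaded admissible port is our own tie-breaking assumption, not something the text specifies.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PortConfig:
    """Per-destination-port values software writes before transmission (hypothetical)."""
    window_cycles: int   # transmission window size
    thr_trans: float     # reg_threshold: transmission bandwidth threshold
    thr_pred: float      # reg_threshold: predicted bandwidth threshold (unused by this rule)


def admissible_ports(cfg: Dict[str, PortConfig], bw_trans: Dict[str, float]) -> List[str]:
    """A port stops accepting new source ports once bw_trans >= its threshold."""
    return [port for port, c in cfg.items() if bw_trans[port] < c.thr_trans]


def pick_destination(cfg: Dict[str, PortConfig], bw_trans: Dict[str, float]) -> Optional[str]:
    """Re-evaluated for every new transmission; None means every port is saturated."""
    ports = admissible_ports(cfg, bw_trans)
    # Assumed tie-break: the least-loaded admissible port.
    return min(ports, key=lambda p: bw_trans[p]) if ports else None


cfg = {"mem0": PortConfig(1024, 16.0, 24.0), "host0": PortConfig(1024, 16.0, 24.0)}
print(pick_destination(cfg, {"mem0": 16.0, "host0": 5.0}))   # host0
print(pick_destination(cfg, {"mem0": 16.0, "host0": 20.0}))  # None
```

Because `pick_destination` is called again for each new transmission with freshly updated bandwidths, it mirrors the "repeat bandwidth calculation and route adjustment" loop of fig. 12.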
Some embodiments of the present application provide a system on chip comprising a processor, a memory, at least one I/O device, and the routing system or routing apparatus described above, wherein the at least one I/O device is interconnected with the processor through the routing system or routing apparatus, or the at least one I/O device is interconnected with the memory through the routing system or routing apparatus.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may also be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatuses, methods, and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
If the functions are implemented in the form of software functional modules and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or any other medium capable of storing program code.
The foregoing describes merely exemplary embodiments of the present application and is not intended to limit its scope; various modifications and variations will occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall fall within its protection scope. It should be noted that like reference numerals and letters denote like items in the figures; once an item is defined in one figure, it need not be further defined or explained in subsequent figures.
The foregoing is merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It should be noted that relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

Claims (19)

1. A routing method for an on-chip bus, applied to an IO routing module, characterized by comprising the following steps:
acquiring the actual transmission bandwidth occupation amount of each destination port in a plurality of destination ports on a bus, wherein the destination ports are ports connected with an on-chip memory or a host;
acquiring predicted transmission bandwidth occupation amount of each destination port in the plurality of destination ports in a future predicted time period;
comparing at least one of the actual transmission bandwidth occupation amount and the predicted transmission bandwidth occupation amount with a corresponding set threshold value to obtain a comparison result;
and selecting at least one target outlet from the plurality of destination ports at least based on the comparison result, wherein the target outlet is used for providing data to be transmitted from a source port to the host or the on-chip memory, and the source port is a port connected with various I/O devices.
2. The routing method of claim 1, wherein the selecting at least one target outlet from the plurality of destination ports at least based on the comparison result comprises:
if the comparison result shows that a first actual transmission bandwidth occupation amount corresponding to a first destination port is smaller than a first transmission bandwidth threshold corresponding to the first destination port, and a first predicted transmission bandwidth occupation amount corresponding to the first destination port is smaller than a first predicted bandwidth threshold corresponding to the first destination port, selecting the first destination port as the target outlet.
3. The routing method of claim 1, wherein the selecting at least one target outlet from the plurality of destination ports at least based on the comparison result comprises:
selecting at least one target outlet from the plurality of destination ports according to an attribute characteristic of the data to be transmitted and the comparison result, wherein the attribute characteristic represents the data volume of the data to be transmitted.
4. The routing method of claim 3, wherein the selecting at least one target outlet from the plurality of destination ports according to the attribute characteristic of the data to be transmitted and the comparison result comprises:
confirming that the data volume of the data to be transmitted is greater than a first set threshold; and
if the comparison result shows that a second predicted transmission bandwidth occupation amount corresponding to a second destination port is smaller than a second predicted bandwidth threshold corresponding to the second destination port, selecting the second destination port as the target outlet.
5. The routing method of claim 3, wherein the selecting at least one target outlet from the plurality of destination ports according to the attribute characteristic of the data to be transmitted and the comparison result comprises:
confirming that the data volume of the data to be transmitted is smaller than a second set threshold; and
if the comparison result shows that a second actual transmission bandwidth occupation amount corresponding to a second destination port is smaller than a second transmission bandwidth threshold corresponding to the second destination port, selecting the second destination port as the target outlet.
6. The routing method according to any one of claims 1-5, wherein the actual transmission bandwidth occupation amount is determined based on the transmission bandwidth of the current transmission within the transmission window.
7. The routing method of claim 6, wherein the actual transmission bandwidth occupation amount is calculated as follows:
[Formula image QLYQS_1 not reproduced: the actual transmission bandwidth occupation amount expressed in terms of COUNT and CLK COUNT]
wherein CLK COUNT is the number of system clock cycles counted in the set transmission window, and
COUNT represents the number of bytes corresponding to all valid transmissions up to the current time.
8. The routing method of claim 7, wherein the value of COUNT is obtained by monitoring the transmission valid signal statistics parameter trancount.
9. The routing method according to any one of claims 1-5, wherein the predicted transmission bandwidth occupation amount is determined based on the transmission bandwidth of an expected transmission within an expected transmission window, the expected transmission window being a total number of system clock cycles determined from the transmission length and the transmission bit width corresponding to the expected transmission.
10. The routing method of claim 9, wherein the predicted transmission bandwidth occupation amount is calculated as follows:
[Formula image QLYQS_2 not reproduced: the predicted transmission bandwidth occupation amount expressed in terms of EST and CLK COUNT1]
wherein CLK COUNT1 is the system clock count value within the expected transmission window, and EST is a count value, obtained by counting the transmission length carried by the current transmission request, that characterizes the amount of data expected to be transmitted.
11. The routing method of claim 10, wherein the value of EST is obtained by extracting information carried by a transmission request signal in a channel.
12. A routing device for an on-chip bus, the routing device comprising:
a bandwidth occupancy calculation module configured to:
acquire the actual transmission bandwidth occupation amount of each of a plurality of destination ports on a bus, wherein the destination ports are ports connected with an on-chip memory or a host; and
acquire the predicted transmission bandwidth occupation amount of each of the plurality of destination ports in a future predicted time period; and
an arbitration module configured to:
compare at least one of the actual transmission bandwidth occupation amount and the predicted transmission bandwidth occupation amount with a corresponding set threshold to obtain a comparison result; and
select at least one target outlet from the plurality of destination ports at least based on the comparison result, wherein the target outlet is used for providing data to be transmitted from a source port to the host or the on-chip memory, and the source port is a port connected with various I/O devices.
13. A routing system for a system-on-chip, the routing system comprising:
a system clock counting module configured to increment by 1 every transmission clock cycle once monitoring starts, clear when the set transmission window value is reached, and restart clock counting in the next monitoring transmission window;
an actual transmission amount counting module configured to count the actual transmission amount corresponding to valid transmissions according to the transmission valid signal;
a predicted transmission amount counting module configured to count the predicted transmission amount in a future predicted time period according to the transmission request signal;
a bandwidth calculation module configured to:
acquire the actual transmission bandwidth occupation amount of each of a plurality of destination ports on a bus, wherein the actual transmission bandwidth occupation amount of a destination port is determined according to the set transmission window size and the plurality of actual transmission amounts corresponding to that destination port; and
acquire the predicted transmission bandwidth occupation amount of each of the plurality of destination ports in a future predicted time period, wherein the predicted transmission bandwidth occupation amount is determined according to a predicted transmission window size and the plurality of predicted transmission amounts corresponding to the destination port;
an arbiter configured to:
compare the actual transmission bandwidth occupation amount of the destination port with a set transmission bandwidth threshold to obtain a first comparison result;
compare the predicted transmission bandwidth occupation amount with a set predicted transmission bandwidth threshold to obtain a second comparison result; and
generate a target outlet selection signal according to the first comparison result and the second comparison result, and select at least one target outlet from the plurality of destination ports based on the selection signal, wherein the target outlet is the outlet, selected from the plurality of destination ports, through which the data to be transmitted is sent, the target outlet is used for providing the data to be transmitted from a source port to a host or an on-chip memory, the source port is a port connected with various I/O devices, and the destination port is a port connected with the on-chip memory or the host; and
a router configured to receive the target outlet selection signal and send the data to be transmitted to the target outlet.
14. The routing system for a system on chip of claim 13, wherein the actual transmission amount counting module performs one count each time the transmission valid signal is confirmed to be at a first level, the first level being a high level or a low level.
15. The routing system for a system on chip of claim 14, wherein the step size of each count is determined based on the transmission bit width of the channel.
16. The routing system for a system on chip of claim 13, wherein the predicted transmission amount counting module performs one count each time the transmission request signal is confirmed to be at a second level, the second level being a high level or a low level.
17. The routing system for a system on chip of claim 16, wherein the step size of each count is determined based on the transmission length, carried by the transmission request signal, of the data to be transmitted.
18. A data transmission method of a system on chip, the data transmission method comprising:
Before a valid transmission begins:
setting a value of a transmission window corresponding to each destination port in a plurality of destination ports, and setting a bandwidth comparison threshold corresponding to each destination port in the plurality of destination ports, wherein the bandwidth comparison threshold comprises a transmission bandwidth comparison threshold and a predicted transmission bandwidth threshold, and the destination port is a port connected with an on-chip memory or a host;
acquiring the actual transmission bandwidth occupation amount of each destination port in the plurality of destination ports on a bus;
acquiring predicted transmission bandwidth occupation amount of each destination port in the plurality of destination ports in a future predicted time period;
when valid transmission starts, continuously updating the actual transmission bandwidth occupation amount and the predicted transmission bandwidth occupation amount of each of the plurality of destination ports, wherein the actual transmission bandwidth occupation amount is calculated according to the transmission window value and the counted actual transmission amount, and the predicted transmission bandwidth occupation amount is calculated according to the predicted transmission window value and the counted predicted transmission amount; and
during arbitration, comparing at least one of the actual transmission bandwidth occupation amount and the predicted transmission bandwidth occupation amount with a corresponding set threshold to obtain a comparison result, and dynamically adjusting the output routing result at least based on the comparison result, so as to route data to be transmitted from a source port to an adjusted target outlet, wherein at least one target outlet is selected from the plurality of destination ports according to the actual transmission bandwidth occupation amount and the predicted transmission bandwidth occupation amount, the target outlet is used for providing the data to be transmitted from the source port to the host or the on-chip memory, and the source port is a port connected with various I/O devices.
19. A system on chip, comprising a processor, a memory, at least one I/O device, and the routing system according to any one of claims 13-17;
wherein the at least one I/O device is interconnected with the processor through the routing system; or
the at least one I/O device is interconnected with the memory through the routing system.
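Claims 3 to 5 make the outlet choice depend on the data volume of the transfer: above a first set threshold only the predicted occupancy is checked, and below a second set threshold only the actual occupancy is checked. A sketch of that decision for a single destination port follows. The parameter names are hypothetical, the rationale in the comments is one plausible reading rather than the applicant's stated reasoning, and applying both checks between the two size thresholds (as in claim 2) is our assumption.

```python
def port_eligible(size_bytes: int, bw_trans: float, bw_est: float,
                  thr_trans: float, thr_pred: float,
                  large_thr: int, small_thr: int) -> bool:
    """Size-aware eligibility check for one destination port (illustrative)."""
    if size_bytes > large_thr:
        # Claim 4: a large transfer will occupy the port into the future,
        # so it is judged by the predicted occupancy.
        return bw_est < thr_pred
    if size_bytes < small_thr:
        # Claim 5: a small transfer finishes quickly, so the current
        # (actual) occupancy is what matters.
        return bw_trans < thr_trans
    # Sizes in between are not covered by claims 4-5; assume both checks (as in claim 2).
    return bw_trans < thr_trans and bw_est < thr_pred


# A port that is busy now (20 >= 16) but lightly booked ahead (10 < 24):
args = dict(bw_trans=20.0, bw_est=10.0, thr_trans=16.0, thr_pred=24.0,
            large_thr=1024, small_thr=256)
print(port_eligible(4096, **args), port_eligible(64, **args), port_eligible(512, **args))
# True False False
```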
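Claims 10 and 11 obtain EST from the transmission length carried by the transmission request signal, but the text does not name a bus protocol. If, purely as an assumption, the request channel carried AXI4-style burst fields (AxLEN = number of beats minus 1, AxSIZE = log2 of bytes per beat), the length could be extracted as below. The class names are hypothetical, and the byte count is an upper bound because unaligned start addresses are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxiLikeRequest:
    """Hypothetical request-channel snapshot using AXI4-style burst fields."""
    addr: int
    axlen: int   # beats - 1, 0..255 in AXI4
    axsize: int  # log2(bytes per beat), 0..7


def request_bytes(req: AxiLikeRequest) -> int:
    """Transmission length carried by the request: (AxLEN + 1) * 2**AxSIZE."""
    if not (0 <= req.axlen <= 255 and 0 <= req.axsize <= 7):
        raise ValueError("field out of AXI4 range")
    return (req.axlen + 1) << req.axsize


class EstCounter:
    """Predicted transmission amount counter: adds the length of each request seen."""

    def __init__(self) -> None:
        self.est = 0

    def on_request(self, req: AxiLikeRequest) -> None:
        self.est += request_bytes(req)


c = EstCounter()
c.on_request(AxiLikeRequest(addr=0x1000, axlen=15, axsize=5))  # 16 beats x 32 B = 512 B
c.on_request(AxiLikeRequest(addr=0x2000, axlen=3, axsize=3))   # 4 beats x 8 B = 32 B
print(c.est)  # 544
```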
CN202011342617.3A 2020-11-25 2020-11-25 Routing method and system for on-chip bus Active CN112486871B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011342617.3A CN112486871B (en) 2020-11-25 2020-11-25 Routing method and system for on-chip bus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011342617.3A CN112486871B (en) 2020-11-25 2020-11-25 Routing method and system for on-chip bus

Publications (2)

Publication Number Publication Date
CN112486871A CN112486871A (en) 2021-03-12
CN112486871B true CN112486871B (en) 2023-06-13

Family

ID=74934870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011342617.3A Active CN112486871B (en) 2020-11-25 2020-11-25 Routing method and system for on-chip bus

Country Status (1)

Country Link
CN (1) CN112486871B (en)


Also Published As

Publication number Publication date
CN112486871A (en) 2021-03-12


Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information

Address after: Industrial incubation-3-8, North 2-204, No. 18, Haitai West Road, Huayuan Industrial Zone, Binhai New Area, Tianjin 300450

Applicant after: Haiguang Information Technology Co.,Ltd.

Address before: 100082 industrial incubation-3-8, North 2-204, 18 Haitai West Road, Huayuan Industrial Zone, Haidian District, Beijing

Applicant before: Haiguang Information Technology Co.,Ltd.

SE01 Entry into force of request for substantive examination
GR01 Patent grant