CN112084027A

CN112084027A - Network-on-chip data transmission method, device, network-on-chip, equipment and medium

Info

Publication number: CN112084027A
Application number: CN202010921630.8A
Authority: CN
Inventors: 王封; 陈贺
Original assignee: Beijing Lynxi Technology Co Ltd
Current assignee: Beijing Lynxi Technology Co Ltd
Priority date: 2020-09-04
Filing date: 2020-09-04
Publication date: 2020-12-15
Anticipated expiration: 2040-09-04
Also published as: CN112084027B

Abstract

The embodiments of the invention disclose a network-on-chip data transmission method and apparatus, a network on chip, a device and a medium. The method comprises: acquiring load data of each node in the network on chip; determining, according to the load data of each node, the number of virtual channels matched with that node; and configuring virtual channels for each node according to its number of virtual channels, and instructing each node to transmit data using the configured virtual channels. The embodiments of the invention can allocate transmission resources in the network on chip rationally and improve transmission efficiency.

Description

Network-on-chip data transmission method, device, network-on-chip, equipment and medium

Technical Field

The embodiments of the invention relate to fields such as artificial intelligence, general-purpose processors and high-performance computing, and in particular to a network-on-chip data transmission method and apparatus, a network on chip, a device and a medium.

Background

In recent years, with the rapid development of artificial intelligence (AI) applications and technologies, the requirements for computing power and power efficiency have kept increasing, and running AI algorithms on dedicated AI chips has become a new trend.

AI algorithms are typically run on many-core processors to increase their computation speed. A many-core processor is a set of cores connected in a predetermined manner that provides high-performance parallel processing; the cores are numerous (potentially thousands in the future) and of various types.

Currently, many-core processors use a network on chip (NOC) to realize communication between cores. In a conventional network on chip, the communication path between two cores consists of those two cores and the cores between them. As a result, multiple communication paths may pass through the same core, so that this core has to forward too much data and communication efficiency is low.

Disclosure of Invention

Embodiments of the present invention provide a method, an apparatus, a network on chip, a device, and a medium for transmitting network on chip data, which can reasonably configure transmission resources in the network on chip and improve transmission efficiency.

In a first aspect, an embodiment of the present invention provides a many-core network-on-chip data transmission method, including:

acquiring load data of each node in the network on chip;

determining the number of virtual channels matched with each node according to the load data of each node;

and configuring virtual channels for each node according to its number of virtual channels, and instructing each node to transmit data using the configured virtual channels.

In a second aspect, an embodiment of the present invention further provides a many-core network-on-chip data transmission apparatus, including:

the load data acquisition module is used for acquiring load data of each node in the network on chip;

the virtual channel number determining module is used for determining the number of the virtual channels matched with each node according to the load data of each node;

and the data transmission module is used for configuring virtual channels for each node according to its number of virtual channels, and instructing each node to transmit data using the configured virtual channels.

In a third aspect, an embodiment of the present invention further provides a network on chip, including a plurality of nodes, where each node is configured with a virtual channel for transmitting data; the number of the virtual channels of the nodes is determined according to the load data of the nodes, and the number of the virtual channels of at least two nodes is different.

In a fourth aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the many-core network-on-chip data transmission method according to any of the embodiments of the present invention.

In a fifth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the many-core network-on-chip data transmission method according to any of the embodiments of the present invention.

By configuring the number of virtual channels of each node according to its load data, each node receives a reasonable number of virtual channels, and these virtual channels multiplex the same physical channel in a time-sharing manner. This reduces cases in which blocked data prevents other data from being transmitted, and addresses the data blocking and low communication efficiency caused by multiple input ports sending to the same output port. As a result, network throughput and physical-channel utilization are improved, waste of transmission resources is reduced, transmission resources are allocated rationally, communication efficiency is improved, and both chip area and data transmission power consumption are reduced.

Drawings

Fig. 1a is a flowchart of a network-on-chip data transmission method for many cores according to a first embodiment of the present invention;

Fig. 1b is a schematic diagram of a two-dimensional mesh network-on-chip according to a first embodiment of the present invention;

Fig. 2a is a flowchart of a network-on-chip data transmission method for many cores according to a second embodiment of the present invention;

Fig. 2b is a schematic diagram of an on-chip network area according to a second embodiment of the present invention;

Fig. 2c is a schematic diagram of a transmission line according to a second embodiment of the present invention;

Fig. 2d is a schematic diagram of an on-chip network area unit according to a second embodiment of the present invention;

Fig. 2e is a schematic diagram of a congestion node structure according to a second embodiment of the present invention;

Fig. 2f is a schematic diagram of an application scenario of data transmission of a congestion node according to a second embodiment of the present invention;

Fig. 2g is a schematic diagram of a congestion node structure according to a second embodiment of the present invention;

Fig. 2h is a schematic diagram of an application scenario of data transmission of a congestion node according to a second embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a many-core network-on-chip data transmission apparatus according to a third embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a computer device in a seventh embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Example one

Fig. 1a is a flowchart of a many-core network-on-chip data transmission method according to the first embodiment of the present invention. This embodiment is applicable to data transmission over a network on chip. The method may be executed by the many-core network-on-chip data transmission apparatus provided by the embodiments of the present invention; the apparatus may be implemented in software and/or hardware and may generally be integrated into a computer device. As shown in Fig. 1a, the method of this embodiment specifically includes:

S110, acquiring load data of each node in the network on chip.

The network on chip is composed of a plurality of nodes. Each node is connected to a corresponding core. Data received by the node may be sent to the core for processing. Each node is connected to at least one node, and the nodes can transmit data to the connected at least one node.

The load data describes how prone a node is to congestion, for example the amount of data sent and received by the node. Generally, the more data a node transmits per unit time, the greater its throughput and the greater its load; the less data it transmits per unit time, the smaller its throughput and its load. The load data of a node may include position data and/or throughput data, among others; that is, the load data is related to, or matched with, the position of the node on the network on chip and/or the throughput of the node. The position data refers to the position of the node on the network on chip and is used to evaluate the distance between the node and the central position. The central position may be the geometric center of the network-on-chip topology; for example, if the network on chip is a two-dimensional grid, the central position is where the diagonals intersect. The network on chip may be placed in a coordinate system so that each node has coordinates, and the distance between a node's coordinates and the coordinates of the node at the central position is taken as the distance between that node and the central position. The throughput data may be the throughput of the node counted over a historical period, for example the number of bits transmitted per second. The load data of a node may also include other contents, which the embodiments of the present invention do not specifically limit.
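The position-based load data can be sketched in Python. The text does not fix the distance metric, so this minimal sketch assumes the Manhattan (hop-count) distance on a two-dimensional mesh whose nodes sit at integer coordinates starting from (0, 0); the function names `center_node` and `distance_to_center` are hypothetical.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]


def center_node(width: int, height: int) -> Coord:
    """Return the node nearest the diagonal intersection of a width x height mesh.

    When a dimension is even, two nodes are equally close along that axis;
    this sketch picks the lower-indexed one (an assumption).
    """
    return ((width - 1) // 2, (height - 1) // 2)


def distance_to_center(node: Coord, width: int, height: int) -> int:
    """Manhattan (hop-count) distance from `node` to the centre node (assumed metric)."""
    cx, cy = center_node(width, height)
    x, y = node
    return abs(x - cx) + abs(y - cy)


# Position-based load data for every node of a 5 x 5 mesh, keyed by (x, y).
load: Dict[Coord, int] = {
    (x, y): distance_to_center((x, y), 5, 5) for x in range(5) for y in range(5)
}
```

In a 5 x 5 mesh the centre node is (2, 2) and a corner node such as (0, 0) is 4 hops away. A Euclidean distance would serve equally well for ranking nodes; hop count is used here only because it matches how packets move in a mesh.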

The topology of the network on chip is the arrangement of network nodes and link channels, and it determines the port structure of the nodes in the network on chip. For example, a two-dimensional topology may be a ring, a mesh, a tree, or the like. Optionally, the network on chip is a two-dimensional grid (mesh) composed of a plurality of nodes, as shown in Fig. 1b.

Configuring the network-on-chip topology as a two-dimensional grid simplifies the structure of the network on chip, facilitates mapping by the compiler, simplifies the algorithm for computing communication paths, and reduces the chip area of the network on chip, thereby reducing the cost of the many-core processor.

In the two-dimensional grid topology, nodes and link channels are arranged in a grid along the X direction and the Y direction of a two-dimensional plane, and each node is connected to its adjacent nodes through link channels in the X and Y directions. A two-dimensional coordinate system can be established on the grid, so that the address of a node is a two-dimensional coordinate comprising a first-direction coordinate x and a second-direction coordinate y. The absolute values of x and y (each greater than or equal to 0) represent the propagation distance, and their signs represent the propagation direction; as a data packet is transmitted, the absolute values of x and y decrease or remain unchanged (a coordinate that is already 0 stays 0). x and y may be set arbitrarily according to relative positions in the network on chip, which is not limited in the embodiments of the present invention.

The main data-sending (source) node adds flag-bit information to the header of a data packet to be sent, the flag bits indicating the specific destination or destinations of the packet. When the source node sends the packet to an intermediate node connected to it, the intermediate node forwards the packet to the destination either directly or via other intermediate nodes. Alternatively, the source node sends the packet through intermediate nodes to a multicast node. On receiving the packet, the multicast node looks up the address of each multicast destination in its storage area according to the flag bits in the packet header and determines the number of destinations. It then copies the data in the packet once per destination, packages each copy with one of the found destination addresses, and multicasts the packaged packets to the corresponding destinations in parallel; each packaged packet may be sent to its destination directly or through intermediate nodes.
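The multicast step can be sketched in Python. This is a minimal illustration, not the patented circuit: the `Packet` fields, the dictionary `dest_table` standing in for the multicast node's storage area, and the function name `multicast` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]


@dataclass
class Packet:
    flag: int               # header flag bit(s) naming a destination group
    dest: Optional[Coord]   # destination address; None until the multicast node fills it in
    payload: bytes


def multicast(packet: Packet, dest_table: Dict[int, List[Coord]]) -> List[Packet]:
    """Copy the payload once per destination found for the packet's flag.

    `dest_table` stands in for the multicast node's storage area (hypothetical layout).
    """
    destinations = dest_table.get(packet.flag, [])
    # The number of destinations decides the number of copies; each copy is
    # packaged with exactly one of the looked-up destination addresses.
    return [Packet(packet.flag, d, bytes(packet.payload)) for d in destinations]


table = {0b01: [(1, 0), (0, 2), (-1, 1)]}
copies = multicast(Packet(0b01, None, b"weights"), table)
```

Each returned packet would then be routed independently, directly or through intermediate nodes; an unknown flag yields no copies in this sketch.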

For example, suppose x = 3 and y = 2, so the destination address in the data packet is (3, 2). If the propagation direction is the first direction and then the second direction, the node changes the destination address in the packet to (2, 2) and sends the packet to the adjacent node in the x direction; if the propagation direction is the second direction and then the first direction, the node changes the destination address to (3, 1) and sends the packet to the adjacent node in the y direction. The propagation direction may be set arbitrarily and is not limited in the embodiments of the present invention.
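The hop-by-hop update of the relative destination address can be sketched in Python as dimension-order routing. The direction labels ("+x", "-y", "local") and the function names are hypothetical; the sketch only reproduces the rule that each hop moves one coordinate one step toward 0 along the chosen first direction.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def _toward_zero(v: int) -> int:
    """Move a signed offset one hop closer to 0."""
    return v - 1 if v > 0 else v + 1


def route_step(dest: Coord, x_first: bool = True) -> Tuple[str, Coord]:
    """Return the output direction and the rewritten destination for one hop."""
    x, y = dest
    order = ("x", "y") if x_first else ("y", "x")
    for axis in order:
        if axis == "x" and x != 0:
            return ("+x" if x > 0 else "-x", (_toward_zero(x), y))
        if axis == "y" and y != 0:
            return ("+y" if y > 0 else "-y", (x, _toward_zero(y)))
    return ("local", dest)  # (0, 0): deliver to the local core


def route(dest: Coord, x_first: bool = True) -> List[str]:
    """List the directions taken until the packet reaches its destination."""
    hops: List[str] = []
    direction, dest = route_step(dest, x_first)
    while direction != "local":
        hops.append(direction)
        direction, dest = route_step(dest, x_first)
    return hops
```

With destination (3, 2), `route_step((3, 2))` returns `("+x", (2, 2))` and `route_step((3, 2), x_first=False)` returns `("+y", (3, 1))`, matching the two cases in the example above.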

S120, determining the number of virtual channels matched with each node according to the load data of each node.

As in the previous example, the link channels of the network on chip are physical channels, and an output node is connected to an input node through a physical channel. Because the data received by an output node comes from multiple data sources, while the physical channel is transmitting data from one data source, data from the other sources cannot obtain the right to use the channel; if that data source is blocked, data from the other sources can be output only after waiting for a set time. In the embodiments of the present invention, data is transmitted over virtual channels. For example, on the output-node side a physical channel corresponds to a plurality of virtual channels, and on the input-node side the physical channel likewise corresponds to a plurality of virtual channels. During transmission, the physical channel provides the function of multiple channels: for example, data from a first data source is transmitted in a first time slice and data from a second data source in a second time slice. One physical channel thus supports multiple virtual channels that multiplex it, each performing its own transmission function. When one group of data to be transmitted is blocked, the other groups can still obtain the right to use the physical channel, which improves physical-channel utilization and network throughput and avoids network deadlock.

For example, in a network on chip, each node is configured with five physical channels, namely east, south, west, north and local (core), and each physical channel may be configured with at least one virtual channel. A virtual channel of the physical channel between two adjacent nodes corresponds to a FIFO queue in one node and a FIFO queue in the other; that is, the virtual channel is the communication channel between an output FIFO queue of the sending node and an input FIFO queue of the receiving node, and different virtual channels are in fact different combinations of output and input FIFO queues. Processing time may be divided into at least one time slice by time-division multiplexing, and the time slices are allocated to the corresponding virtual channels. Virtual channels correspond one-to-one with queues, so the number of virtual channels equals the number of queues, and each queue stores the data to be transmitted over its virtual channel. During transmission, the virtual channels of one physical channel multiplex that physical channel in a time-sharing manner.
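The time-shared multiplexing of one physical channel by several virtual channels can be sketched in Python. This is a minimal model under stated assumptions, not the patented router: each virtual channel is a `deque` FIFO, time slice t goes to virtual channel t % n (a fixed round-robin TDM schedule), and the `blocked` set is a hypothetical stand-in for a congested downstream queue.

```python
from collections import deque
from typing import Deque, List, Optional, Set

PORTS = ("east", "south", "west", "north", "local")


class PhysicalChannel:
    """One output physical channel multiplexed by `num_vcs` virtual channels.

    Each virtual channel owns one FIFO queue; time slice t is assigned to
    virtual channel t % num_vcs (a simple fixed TDM schedule, assumed here).
    """

    def __init__(self, num_vcs: int) -> None:
        self.fifos: List[Deque[str]] = [deque() for _ in range(num_vcs)]
        self.blocked: Set[int] = set()  # VCs whose downstream queue is full (hypothetical flag)

    def send_in_slot(self, slot: int) -> Optional[str]:
        vc = slot % len(self.fifos)
        if vc in self.blocked or not self.fifos[vc]:
            return None  # this slice stays idle; other VCs keep their own slices
        return self.fifos[vc].popleft()


# A node with five physical channels, each given two virtual channels.
node = {port: PhysicalChannel(2) for port in PORTS}
east = node["east"]
east.fifos[0].extend(["a1", "a2"])
east.fifos[1].extend(["b1", "b2"])
east.blocked.add(0)  # data on VC 0 is congested downstream
sent = [east.send_in_slot(t) for t in range(4)]
```

Here VC 0 is blocked, yet VC 1 still sends b1 and b2 in its own slices. Because the schedule is fixed, VC 0's slices stay idle; a work-conserving arbiter would instead hand those idle slices to another virtual channel.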

Data packet transmission may proceed as follows: a data packet received at an input port is placed in the queue of the corresponding virtual channel and then sent to the designated output port according to a precomputed path. When several input ports send data packets to the same output port, contention arises; the node serves the highest-priority data packet according to a preset arbitration algorithm, while unserved data stays in its queue and requests again to be output on the virtual channel corresponding to that output port.
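The contention case can be sketched in Python. The text only says that a preset arbitration algorithm serves the highest-priority packet; this sketch assumes fixed numeric priorities with ties broken by a fixed port order (`PORT_ORDER`, a hypothetical choice), and the function name `arbitrate` is illustrative.

```python
from typing import Dict, List, Tuple

# Fixed tie-break order for equal priorities (an assumption; the text only says
# a preset arbitration algorithm is used).
PORT_ORDER = ("local", "east", "south", "west", "north")


def arbitrate(requests: Dict[str, int]) -> Tuple[str, List[str]]:
    """Pick which input port may use a contended output port in this round.

    `requests` maps each requesting input port to its packet priority
    (larger value = higher priority). Returns the winner and the ports whose
    packets stay queued and will request again.
    """
    if not requests:
        raise ValueError("no requests to arbitrate")
    winner = max(requests, key=lambda p: (requests[p], -PORT_ORDER.index(p)))
    losers = [p for p in PORT_ORDER if p in requests and p != winner]
    return winner, losers
```

For example, `arbitrate({"east": 1, "north": 3, "local": 2})` grants the north input and leaves the local and east packets waiting in their virtual-channel queues. Round-robin or age-based arbiters are common alternatives when strict priority could starve low-priority ports.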

In fact, when congestion occurs in the network, all messages to be sent from one port may enter a plurality of queues and are processed according to the priority of each queue. The queues may include at least one of: first-in first-out queues (FIFO), priority queues (PQ), custom queues (CQ), weighted fair queues (WFQ), class-based weighted fair queues (CBWFQ), and so on. Optionally, in the embodiments of the present invention, the queue corresponding to a virtual channel is a FIFO queue, and the number of virtual channels equals the number of corresponding FIFO queues.

A FIFO queue does not classify messages: messages enter the queue in the order in which they arrive at the interface and are forwarded on a best-effort basis. Thus, when the data sent by a node is congested, data packets are still sent exactly in the order in which they were received, which provides congestion management.

The data throughput of a node, and hence its tendency to congestion, may be determined from the load data. The number of virtual channels for each physical channel of the node is then configured according to the node's congestion degree; the number of virtual channels of a node can be understood as the number of virtual channels per physical channel of that node.

It can be understood that a mismatch between the amount of data to be transmitted and the number of virtual channels either wastes resources or lowers efficiency. When a node is lightly congested but has many virtual channels, for example when there is little data to transmit, some virtual channels have no data, and the time slices allocated to them pass idle, wasting transmission resources. When a node is heavily congested but has few virtual channels, for example when there is much data to transmit, the data on some virtual channels may remain congested, and other data multiplexing those virtual channels cannot obtain the right to use the physical channel and cannot be transmitted, so transmission efficiency is low. Therefore, matching the number of virtual channels to the congestion degree of each node configures the number of virtual channels reasonably, improves network throughput and physical-channel utilization, and reduces waste of transmission resources.

For example, the number of virtual channels may be looked up from the value of the load data using a preset correspondence between load values and numbers of virtual channels. The same number of virtual channels may be configured for every physical channel of a node, or a different number may be configured for each physical channel according to that channel's load data. The embodiments of the present invention do not specifically limit this.

For example, the load data is the distance between the node and the central position: if the distance is in [0, 3], the number of virtual channels is 8; if it is in (3, 10), the number is 4; and if it is in (10, 100), the number is 2.
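The distance-to-VC lookup can be sketched in Python as a small threshold table. The table values come from the example above; treating each upper bound as inclusive is an assumption, because the intervals (3, 10) and (10, 100) as written leave a distance of exactly 10 unassigned. `VC_TABLE` and `vc_count` are hypothetical names.

```python
from typing import Sequence, Tuple

# (upper bound of distance, number of VCs), innermost band first. Upper bounds
# are treated as inclusive, which is an assumption (see the lead-in).
VC_TABLE: Sequence[Tuple[float, int]] = ((3, 8), (10, 4), (100, 2))


def vc_count(distance: float, table: Sequence[Tuple[float, int]] = VC_TABLE) -> int:
    """Look up the number of virtual channels for a node's distance to the centre."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    for upper, num_vcs in table:
        if distance <= upper:
            return num_vcs
    raise ValueError("distance outside the configured table")
```

A node at distance 0 to 3 gets 8 virtual channels, one at 3.5 or 10 gets 4, and one at 11 to 100 gets 2; the same function accepts a different table when another correspondence is preset.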

The numbers of virtual channels of the nodes may be the same or different. The network on chip may be divided into several areas according to the load data of the nodes and the number of virtual channels determined per area, or the number of virtual channels may be determined separately for each node from its own load data; the present disclosure does not limit this.

S130, configuring virtual channels for the nodes according to the number of the virtual channels of the nodes, and indicating the nodes to transmit data by adopting the configured virtual channels.

Configuring the virtual channels may include: configuring, for each physical channel of a node, as many virtual channels as the node's virtual channel number, and configuring information such as the data transmitted on each virtual channel and the time slices allocated to each virtual channel. Because virtual channels multiplex a small number of physical channels, fewer physical channels are needed for data transmission, which reduces chip area. Moreover, for the same chip area, configuring every node with the same full number of virtual channels consumes more power than configuring a larger number of virtual channels only for the few nodes that are prone to congestion.
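The per-node configuration step and its effect on buffer count can be sketched in Python. This is an illustration under assumptions: each virtual channel is represented only by a FIFO index, the number of FIFO queues is used as a rough proxy for buffer area, all five ports are configured even for edge nodes (a simplification), and the 3 x 3 mesh with 8 versus 2 VCs is a hypothetical example. `configure_vcs` and `total_fifos` are illustrative names.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
PORTS = ("east", "south", "west", "north", "local")


def configure_vcs(vc_counts: Dict[Coord, int]) -> Dict[Coord, Dict[str, List[int]]]:
    """Give every physical channel of a node that node's number of virtual channels.

    Each VC is represented only by an index (its FIFO queue id); all five ports are
    configured even for edge nodes, which is a simplification.
    """
    return {node: {port: list(range(n)) for port in PORTS} for node, n in vc_counts.items()}


def total_fifos(config: Dict[Coord, Dict[str, List[int]]]) -> int:
    """Count VC FIFO queues, used here as a rough proxy for buffer area."""
    return sum(len(vcs) for ports in config.values() for vcs in ports.values())


# Hypothetical 3 x 3 mesh: the centre node gets 8 VCs, the others 2.
mesh = [(x, y) for x in range(3) for y in range(3)]
load_aware = configure_vcs({n: 8 if n == (1, 1) else 2 for n in mesh})
uniform = configure_vcs({n: 8 for n in mesh})
```

In this toy case the load-aware configuration needs 5 x (8 + 8 x 2) = 120 FIFO queues, against 9 x 5 x 8 = 360 when every node gets 8 virtual channels.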

Optionally, instructing each node to transmit data using the configured virtual channels includes: instructing each node to transmit data over the configured virtual channels by time-division multiplexing.

With time-division multiplexing, several virtual channels share the same physical channel, which achieves multiplexed transmission and raises physical-channel utilization; at the same time, the data in each queue can be processed asynchronously, which prevents deadlock.

While a node transmits data, in the time slice matched with a virtual channel, the data in that virtual channel's queue is sent to the corresponding output port, and the data output from the port is sent to the connected node over the physical channel. The time slices may be divided according to the number of virtual channels, with each time slice matched with at least one virtual channel.
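The mapping of time slices to virtual channels can be sketched in Python. The sketch assumes that virtual channel v is served in slice v % num_slices, so every slice is matched with at least one virtual channel whenever the number of slices does not exceed the number of virtual channels; how several virtual channels share one slice is not specified in the text and is left out. `build_schedule` is a hypothetical name.

```python
from typing import List


def build_schedule(num_slices: int, num_vcs: int) -> List[List[int]]:
    """Map each time slice to the virtual channels served in it.

    Virtual channel v is served in slice v % num_slices, so every slice gets
    at least one VC as long as num_slices <= num_vcs.
    """
    if not 1 <= num_slices <= num_vcs:
        raise ValueError("need 1 <= num_slices <= num_vcs")
    schedule: List[List[int]] = [[] for _ in range(num_slices)]
    for vc in range(num_vcs):
        schedule[vc % num_slices].append(vc)
    return schedule
```

With 8 virtual channels and 4 slices, each slice serves two virtual channels ([0, 4], [1, 5], [2, 6], [3, 7]); with equal counts, the mapping is one-to-one.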

Example two

Fig. 2a is a flowchart of a many-core network-on-chip data transmission method according to the second embodiment of the present invention, which refines the foregoing embodiment. The method of this embodiment may include:

S210, acquiring load data of each node in the network on chip.

For details not described in this embodiment, refer to the foregoing embodiment.

S220, determining the priority of each node according to the position of each node on the network on chip, wherein the load data is related to the position of each node on the network on chip.

The load data of a node is related to its position on the network on chip and/or its throughput. In some optional embodiments, the load data of each node is related to its position on the network on chip. For example, to reduce the number of transmission steps between nodes, when transmission paths are compiled, more paths pass through nodes at the center of the network on chip and fewer pass through nodes at its edge, so central nodes are more congested and edge nodes less so. Because the load data of nodes in different areas of the network on chip differ and depend on node position, the priority of each node can be determined from its position on the network on chip.

For example, the location of the node on the network-on-chip may be determined according to the topology of the network-on-chip. According to the position of the node on the network on chip, the distance between the position of the node and the central position can be calculated, and the priority of the node is determined according to the distance.

For example, if the topology of the network on chip is a two-dimensional grid, the position of a node may be represented by its coordinates in a coordinate system established on the grid, and the central position may be the position of the node nearest the diagonal intersection. As another example, if the topology is a tree, the parent-child hierarchy of the nodes may represent their positions, and the central position may lie at half the total number of layers. The specific configuration may be set according to actual conditions and is not specifically limited in the embodiments of the present invention. The priority of a node describes its tendency to congestion and is positively correlated with it: the higher the priority, the higher the congestion tendency and the more virtual channels; the lower the priority, the lower the congestion tendency and the fewer virtual channels.

Optionally, the topology of the network on chip is a two-dimensional grid, the position of a node on the network on chip is its coordinates in the grid, and the priority of the node is determined according to the distance between its position and the central position.

A correspondence between priority and distance can be configured in advance; the distance between a node and the central position is then calculated to determine the node's priority.

Optionally, determining the priority of each node according to its position on the network on chip includes: dividing the network on chip into at least two areas from the network center position outward to the network periphery; and determining the priority of the nodes in each area according to the distance between that area and the network center position, where nodes in the area at the network center position have the highest priority and nodes in the outermost area of the network have the lowest priority.

The network center position may refer to the geometric center of the network. For example, if the topology of the network on chip is a two-dimensional grid, the network center position is the position closest to the intersection of the grid's diagonals. Areas are divided according to the distance between each node and the network center position; for example, nodes whose distances fall within the same preset range are assigned to the same area.

Nodes in the same area are given the same priority, and the priority of the nodes in each area is determined by the positional relation between the area and the network center position. The rule is that the priority of an area's nodes is inversely related to the area's distance from the network center position: the farther the area is from the network center, the lower its priority; the closer it is, the higher its priority.
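The concentric area division and its priority rule can be sketched in Python. Assumptions to note: the Chebyshev distance max(|dx|, |dy|) is used so that areas form square rings like those in Fig. 2b, the center is the node nearest the diagonal intersection (lower index for even sizes), and `bounds` lists the preset inclusive distance limits of the inner areas. `region_priorities` is a hypothetical name.

```python
from typing import Dict, Sequence, Tuple

Coord = Tuple[int, int]


def region_priorities(width: int, height: int, bounds: Sequence[int]) -> Dict[Coord, int]:
    """Assign each mesh node the priority of the concentric area it falls in.

    `bounds` holds increasing, inclusive distance limits of the inner areas;
    nodes beyond the last limit form the outermost area. The innermost area
    gets priority len(bounds) + 1 and the outermost gets priority 1.
    """
    cx, cy = (width - 1) // 2, (height - 1) // 2  # node nearest the diagonal intersection
    priorities: Dict[Coord, int] = {}
    for x in range(width):
        for y in range(height):
            d = max(abs(x - cx), abs(y - cy))  # Chebyshev distance gives square rings (assumed)
            area = next((i for i, b in enumerate(bounds) if d <= b), len(bounds))
            priorities[(x, y)] = len(bounds) + 1 - area
    return priorities


prio = region_priorities(5, 5, bounds=(0, 1))
```

On a 5 x 5 mesh with limits (0, 1), this yields three areas: the single center node (priority 3), the ring of 8 nodes around it (priority 2), and the 16 outer nodes (priority 1). The priorities can then be mapped to virtual-channel counts through a preset table.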

For example, as shown in Fig. 2b, each square represents a node, and the squares filled with diagonal lines represent nodes at the network center position. The two-dimensional grid is divided into three areas: the first area is the closed area enclosed by solid line 201, the second area is the closed area between solid line 201 and dotted line 202, and the third area is the closed area between dotted line 202 and dotted line 203.

It can be understood that, to minimize the number of transmission steps, nodes preferentially route data through relatively central areas when selecting intermediate nodes on a transmission path. As a result, transmission lines in different areas are congested to different degrees. As shown in Fig. 2c, all 4 lines use the node represented by the diagonally hatched square as an intermediate node, so all data transmitted on the 4 lines must be forwarded through that node. For example, the first area is the most congested area, the second area is a more congested area, and the third area is an uncongested area.

In the prior art, when a node on a transmission path is congested, data must wait or detour; both reduce transmission efficiency, and detouring also increases the number of transmission steps.

By setting different numbers of virtual channels in different areas, the transmission steps can be reduced, and the transmission efficiency can be improved.

Optionally, determining the priority of each node according to its position on the network on chip includes: dividing the network on chip into at least two areas; dividing each area into at least two area units from the area center position outward to the area periphery; and, within each area, determining the priority of the nodes in each area unit according to the distance between that unit and the area center position, where nodes in the area unit at the area center position have the highest priority and nodes in the outermost area unit of the area have the lowest priority.

Because the network on chip contains a large number of nodes, it may be divided into areas in advance. The division method can be chosen as needed: for example, the network may be divided into areas containing the same or a similar number of nodes, or according to other rules; the embodiments of the present invention do not specifically limit this.

Each area is further divided into at least two area units. The area center position may refer to the geometric center of the area. Illustratively, when the topology of the network on chip is a two-dimensional grid, the area center position is the position closest to the intersection of the area's diagonals. The area is then divided according to the distance between each node's position and the area center position; for example, nodes whose distances fall within the same set range are placed in the same area unit.
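The two-level division described above (areas first, then area units binned by distance to the area center) might be sketched as follows. The sketch assumes the areas are equal rectangular tiles, that the mesh dimensions are multiples of the tile size, and that distance is the Chebyshev distance to the intersection of each tile's diagonals; `unit_limits` is a hypothetical list of distance bounds.

```python
from collections import Counter


def area_units(width, height, area_w, area_h, unit_limits):
    """Assign every node (x, y) an (area, unit) pair; unit 0 is the area's central unit."""
    if width % area_w or height % area_h:
        raise ValueError("this sketch assumes the mesh is tiled exactly by the areas")
    result = {}
    for x in range(width):
        for y in range(height):
            ax, ay = x // area_w, y // area_h              # which pre-divided area
            cx = ax * area_w + (area_w - 1) / 2            # intersection of the area's diagonals
            cy = ay * area_h + (area_h - 1) / 2
            dist = max(abs(x - cx), abs(y - cy))           # distance to the area center
            unit = next((i for i, lim in enumerate(unit_limits) if dist <= lim),
                        len(unit_limits))                  # same distance range -> same unit
            result[(x, y)] = ((ax, ay), unit)
    return result


if __name__ == "__main__":
    units = area_units(8, 8, 4, 4, unit_limits=(0.5,))
    print(Counter(unit for _, unit in units.values()))
```

With an 8×8 mesh split into four 4×4 areas, as in the four regions of FIG. 2d, each area gets a central unit of 4 nodes (unit 0, highest priority) and an outer unit of 12 nodes.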

Nodes in the same area unit are configured with the same priority, and the priority of the nodes in each area unit is determined by the positional relationship between that area unit and the area center position. The rule is that priority is inversely related to distance: the farther an area unit is from the area center position, the lower the priority of its nodes; the closer it is, the higher their priority.

Illustratively, as shown in FIG. 2d, each square represents a node. The closed regions formed by the 4 broken lines 202 are the pre-divided areas. Within each area, the first area unit is the closed region enclosed by solid line 201, and the second area unit is the closed region between solid line 201 and broken line 202. The first area unit is the most congested zone, and the second area unit is a more congested zone.

The network on chip is divided into areas in advance, and each area is further divided into multiple area units according to node positions, with a different number of virtual channels configured for each area unit. This reduces the number of transmission steps, improves transmission efficiency, and provides more ways of partitioning virtual channel counts to suit different application scenarios.

S230, determining the number of virtual channels matched with each node according to the priority of each node and the preconfigured correspondence between node priority and the number of virtual channels.

The correspondence between priority and the number of virtual channels may be preconfigured according to actual conditions; the embodiments of the present invention do not specifically limit it. The number of virtual channels is generally set in proportion to the number of transmission directions between nodes; illustratively, a node has 4 transmission directions to other nodes. The number of virtual channels of a high-priority node > that of a medium-priority node > that of a low-priority node. For example, high-priority nodes may correspond to 4 virtual channels, medium-priority nodes to 2 virtual channels, and low-priority nodes to 1 virtual channel.
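Step S230 amounts to a table lookup. A minimal Python sketch, assuming three priority levels and the example correspondence 4/2/1 given above (the dictionary and function names are hypothetical), also checks the ordering constraint stated in the text.

```python
# Hypothetical preconfigured correspondence, using the example values from the text.
VC_COUNT_BY_PRIORITY = {"high": 4, "medium": 2, "low": 1}
PRIORITY_ORDER = ("high", "medium", "low")   # from highest to lowest priority


def is_monotonic(table):
    """True if every higher priority maps to strictly more virtual channels."""
    counts = [table[p] for p in PRIORITY_ORDER]
    return all(hi > lo for hi, lo in zip(counts, counts[1:]))


def assign_vc_counts(priority_of_node, table=VC_COUNT_BY_PRIORITY):
    """Step S230 (sketch): look up each node's virtual channel count from its priority."""
    if not is_monotonic(table):
        raise ValueError("higher-priority nodes should get more virtual channels")
    return {node: table[prio] for node, prio in priority_of_node.items()}


if __name__ == "__main__":
    print(assign_vc_counts({"n1": "high", "n2": "medium", "n6": "low"}))
```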

S240, configuring a virtual channel for each node according to the number of the virtual channels of each node, and instructing each node to transmit data by adopting the configured virtual channels.

FIG. 2e shows the structure of node n1 in the prior art. The queues receive data requests from other nodes n or cores c. The status register controls switching on and off; specifically, it selects which queue (b1, b2, etc.) transmits data, that is, it gates which virtual channel is used. The arbitration module arbitrates among the received data and determines the order in which data is transmitted. The switching module controls data exchange; for example, a data packet may be sent to another node n or core c through the switching module.
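The node structure of FIG. 2e can be modelled behaviourally in a few lines of Python. This is a toy, not the hardware design: the class name, the FIFO depth, the "first non-full queue" placement policy and the round-robin arbitration policy are all assumptions made for illustration.

```python
from collections import deque


class RouterNode:
    """Toy model of a node with per-VC queues, a status register, an arbiter and a switch."""

    def __init__(self, name, num_vcs, depth=2):
        self.name = name
        self.depth = depth                                 # hypothetical FIFO depth
        self.queues = [deque() for _ in range(num_vcs)]    # one FIFO per virtual channel (b1, b2, ...)
        self.status = [True] * num_vcs                     # status register: gate bit per virtual channel
        self._next = 0                                     # round-robin pointer for the arbiter

    def receive(self, packet):
        """Accept a request from another node or core into the first non-full queue."""
        for vc, q in enumerate(self.queues):
            if len(q) < self.depth:
                q.append(packet)
                return vc
        return None                                        # every queue full: the sender must wait

    def arbitrate(self):
        """Arbitration module: pick the next gated-on, non-empty queue in round-robin order."""
        n = len(self.queues)
        for i in range(n):
            vc = (self._next + i) % n
            if self.status[vc] and self.queues[vc]:
                self._next = (vc + 1) % n
                return vc
        return None

    def switch(self, target):
        """Switching module: forward the arbitration winner to target if target has room."""
        vc = self.arbitrate()
        if vc is None:
            return False
        if target.receive(self.queues[vc][0]) is None:
            return False                                   # packet stays queued; link unused this round
        self.queues[vc].popleft()
        return True
```

Clearing a bit in `status` closes that virtual channel: its queue is skipped by the arbiter even if it holds data.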

For example, FIG. 2f shows an application scenario of data transmission between node n1 and node n2. If data at node n1 is congested, queues b1 and b2 in node n1 are full even though the physical channel between node n1 and node n2 is idle. Node n2 cannot send data into b1 or b2 of node n1, and node n1 cannot send its congested data to node n2, so no data can be transmitted between node n1 and node n2.

For example, suppose node n1 is a node in the first area, so it has the highest priority and the strongest tendency to congest, while nodes n2, n3, n4 and n5 are nodes in the second area, so they have medium priority and a lower tendency to congest. The structure of node n1 is then as shown in FIG. 2g: node n1 has 4 virtual channels, and nodes n2 to n5 each have 2 virtual channels. This prevents data from having to detour around nodes n1 to n5 when congestion occurs, so the links between nodes are used more fully, and both the number of transmission steps and routing congestion are reduced. At the same time, it avoids adding the same number of virtual-channel queues to every node, so the chip area occupied by queue resources in uncongested nodes is reduced while transmission efficiency improves.
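The area argument can be illustrated with simple arithmetic. The sketch below counts FIFO queues (one per virtual channel) in a hypothetical 6×6 mesh, comparing a uniform allocation of 4 virtual channels per node with the example 4/2/1 allocation by concentric area. Queue count is only a proxy for chip area and power, and the patent itself reports no figures.

```python
def queue_budget(width, height, vcs_per_ring):
    """Total FIFO queues in a mesh when ring r (Chebyshev distance from the center)
    gets vcs_per_ring[r] virtual channels; rings past the list reuse the last entry."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    total = 0
    for x in range(width):
        for y in range(height):
            ring = int(max(abs(x - cx), abs(y - cy)))
            total += vcs_per_ring[min(ring, len(vcs_per_ring) - 1)]
    return total


if __name__ == "__main__":
    uniform = queue_budget(6, 6, [4])         # every node gets 4 virtual channels
    graded = queue_budget(6, 6, [4, 2, 1])    # 4 / 2 / 1 from the center outwards
    print(uniform, graded, f"{1 - graded / uniform:.0%} fewer queues")
```

Here the graded allocation needs 4·4 + 12·2 + 20·1 = 60 queues instead of 36·4 = 144, while the central nodes keep the full 4 virtual channels.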

For example, FIG. 2h shows an application scenario of data transmission between node n1 and node n2, in which node n1 has additional queues. If data at node n1 is congested and queues b1 and b2 in node n1 are full while the physical channel between node n1 and node n2 is idle, node n2 cannot send data into b1 or b2 of node n1. However, queues b3 and b4 in node n1 are not congested, so node n2 can send data into b3 and b4 of node n1, and the link between node n1 and node n2 is used more fully. Therefore, when some queues in node n1 are congested, data can still be transmitted into the uncongested queues of node n1, which reduces detours and transmission steps and improves transmission efficiency. Because queues are added only where needed rather than to every node, the chip area and power consumption required for extra queues are also reduced.
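The difference between FIG. 2f and FIG. 2h reduces to whether node n1 still has a non-full queue. A toy Python sketch, assuming a FIFO depth of 4 and that n1 forwards nothing during the interval (both assumptions, as is the function name), shows the effect of the two extra queues.

```python
def deliver(packets, occupancy, depth):
    """Node n2 pushes `packets` into node n1's queues; n1 drains nothing (it is congested).

    occupancy[i] is the current fill level of queue b(i+1) in n1. Returns how many
    packets got in and the final occupancy.
    """
    occupancy = list(occupancy)
    delivered = 0
    for _ in range(packets):
        free = [vc for vc, used in enumerate(occupancy) if used < depth]
        if not free:
            break                     # every queue in n1 is full: n2 must wait or detour
        occupancy[free[0]] += 1
        delivered += 1
    return delivered, occupancy


if __name__ == "__main__":
    DEPTH = 4                                  # hypothetical FIFO depth
    print(deliver(6, [4, 4], DEPTH))           # FIG. 2f: b1 and b2 full
    print(deliver(6, [4, 4, 0, 0], DEPTH))     # FIG. 2h: b3 and b4 still free
```

With two full queues nothing gets through, whereas with b3 and b4 available all six packets are accepted, filling b3 and half of b4.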

In this embodiment of the invention, the priority of each node is determined according to the node's position on the network on chip, and the number of virtual channels matched with the node is determined according to the correspondence between priority and the number of virtual channels, so each node is allocated its matched number of virtual channels. Because a node's position reflects how congested it tends to be, the number of virtual channels can be allocated accurately according to each node's data congestion. This improves the flexibility of virtual channel allocation, configures the number of virtual channels reasonably, improves the adaptability of the network on chip, adds service modes to the network on chip, and improves the utilization of the physical channels of each node in the network on chip.

EXAMPLE III

Fig. 3 is a schematic diagram of a many-core network-on-chip data transmission apparatus according to the third embodiment of the present invention. The third embodiment is an apparatus for implementing the many-core network-on-chip data transmission method provided by the foregoing embodiments of the present invention; it may be implemented in software and/or hardware and may generally be integrated into a computer device or the like. The apparatus includes:

a load data obtaining module 310, configured to obtain load data of each node in the network on chip;

a virtual channel number determining module 320, configured to determine, according to the load data of each node, a number of virtual channels matched with each node;

a data transmission module 330, configured to configure a virtual channel for each node according to the number of virtual channels of each node, and instruct each node to transmit data using the configured virtual channel.
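Read as software, the three modules form a simple pipeline. The following Python sketch drives it directly from a measured load figure rather than from node position; the load metric, the threshold table and all function names are hypothetical, since this section does not fix the form of the load data.

```python
from collections import deque


def obtain_load_data(nodes, measure):
    """Module 310 (sketch): one load figure per node, e.g. packets forwarded per cycle."""
    return {node: measure(node) for node in nodes}


def determine_vc_numbers(load, thresholds, default=1):
    """Module 320 (sketch): the highest (min_load, vc_count) threshold a node reaches sets its count."""
    ordered = sorted(thresholds, reverse=True)          # check the largest min_load first
    return {node: next((vcs for min_load, vcs in ordered if value >= min_load), default)
            for node, value in load.items()}


def configure_nodes(vc_numbers):
    """Module 330 (sketch): give each node one empty FIFO queue per virtual channel."""
    return {node: [deque() for _ in range(count)] for node, count in vc_numbers.items()}


if __name__ == "__main__":
    measured = {"n1": 96, "n2": 86, "n6": 16}          # hypothetical load figures
    load = obtain_load_data(measured, measured.get)
    vcs = determine_vc_numbers(load, thresholds=[(90, 4), (40, 2)])
    print(vcs, {n: len(q) for n, q in configure_nodes(vcs).items()})
```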

Further, the virtual channel number determining module 320 includes a node priority determining unit, configured to determine the priority of each node according to its location on the network on chip, where the load data is related to the location of the node on the network on chip; the module 320 then determines the number of virtual channels matched with each node according to the priority of each node and the preconfigured correspondence between node priority and the number of virtual channels.

Further, the node priority determining unit includes a location-center area dividing subunit, configured to divide the network on chip into at least two areas from the network center position to the network periphery, and to determine the priority of the nodes in each area according to the distance between each area and the network center position, where nodes in the area at the network center position have the highest priority and nodes in the outermost area of the network have the lowest priority.

Further, the node priority determining unit includes a location-center area unit dividing subunit, configured to: divide the network on chip into at least two areas; divide each area into at least two area units from the area center position to the area periphery; and, within each area, determine the priority of the nodes in each area unit according to the distance between that area unit and the area center position, where nodes in the area unit at the area center position have the highest priority and nodes in the outermost area unit have the lowest priority.

Further, the topology structure of the network on chip is a two-dimensional grid.

Further, the data transmission module 330 includes a time division multiplexing transmission unit, configured to instruct each node to transmit data over the configured virtual channels in a time division multiplexing manner.

Further, the number of virtual channels is equal to the number of first-in first-out (FIFO) queues corresponding to the virtual channels.

The above apparatus can execute the many-core network-on-chip data transmission method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method.

EXAMPLE four

Fig. 1b is a schematic structural diagram of a network on chip according to a fourth embodiment of the present invention. Fig. 2e is a schematic structural diagram of a node in an on-chip network according to a fourth embodiment of the present invention.

The network on chip comprises a plurality of nodes, each configured with virtual channels for transmitting data. The number of virtual channels of each node is determined according to the load data of that node, and at least two nodes have different numbers of virtual channels.

In practice, the number of virtual channels of each node in the chip's network on chip is configured before the chip is put into use. Once the chip is put into use, typically by being installed in an electronic device, the number of virtual channels of each node is fixed, and each node transmits data through its preconfigured virtual channels.

Optionally, each node is configured with a buffer, and one virtual channel corresponds to one buffer.

The buffers correspond one-to-one to the queues: each buffer stores the data packets held by one queue. Since at least two nodes have different numbers of virtual channels and the number of buffers in a node equals its number of virtual channels, at least two nodes also have different numbers of buffers. One virtual channel corresponds to one buffer in the current node and one buffer in the other node. That is, a virtual channel may be established between a first queue of a first node and a second queue of a second node, and when the virtual channel is switched on, the data packets in the first queue are sent to the second queue for buffering.

Because each buffer corresponds to a queue and stores that queue's data packets, packets awaiting transmission can be held and then sent in order according to priority. This enables time-division multiplexing of the virtual channels, improving physical channel utilization and transmission efficiency.
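Time-division multiplexing of several virtual channels over one physical channel can be sketched as a per-cycle round-robin grant. This Python sketch assumes one packet per cycle, a fixed receiver buffer depth, and receiver buffers that are not drained during the run (all hypothetical). Virtual channel i links sender queue i to receiver buffer i, matching the one-buffer-per-virtual-channel arrangement above, and a channel whose downstream buffer is full is skipped so that the others can still use the link.

```python
from collections import deque


def tdm_link(sender_queues, receiver_buffers, depth, cycles):
    """Share one physical channel among virtual channels, one packet per cycle.

    Returns the per-cycle schedule: the granted VC index, or None for an idle cycle.
    """
    n = len(sender_queues)
    pointer, schedule = 0, []
    for _ in range(cycles):
        granted = None
        for i in range(n):                              # round-robin search from the pointer
            vc = (pointer + i) % n
            if sender_queues[vc] and len(receiver_buffers[vc]) < depth:
                granted = vc
                break
        if granted is None:
            schedule.append(None)                       # nothing eligible: link idles this cycle
            continue
        receiver_buffers[granted].append(sender_queues[granted].popleft())
        pointer = (granted + 1) % n                     # next cycle starts after the winner
        schedule.append(granted)
    return schedule


if __name__ == "__main__":
    sender = [deque("aa"), deque("b"), deque("cc")]
    receiver = [deque() for _ in range(3)]
    print(tdm_link(sender, receiver, depth=2, cycles=5))
```

With a receiver depth of 2 the link interleaves the three channels as 0, 1, 2, 0, 2; with a depth of 1 it idles after the first three cycles, because every downstream buffer is full.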

Optionally, the network on chip includes at least two regions, divided from the network center position to the network periphery. The nodes in each region are configured with a priority that is determined by the distance between the node's region and the network center position and that is used to determine the number of virtual channels matched with the node. Nodes in the region at the network center position have the highest priority, and nodes in the region at the network periphery have the lowest priority.

Optionally, the network on chip includes at least two regions, and each region includes at least two region units divided from the region center position to the region periphery. The nodes in each region unit are configured with a priority that is determined by the distance between the node's region unit and the region center position and that is used to determine the number of virtual channels matched with the node. Nodes in the region unit at the region center position have the highest priority, and nodes in the region unit at the region periphery have the lowest priority.

In this way, heavily loaded nodes in the network on chip have more virtual channels and lightly loaded nodes have fewer. The number of virtual channels of each node can be configured specifically for its load, so transmission resources are allocated reasonably and waste of transmission resources is reduced.

In this embodiment of the invention, the network on chip is configured with a plurality of nodes, each configured with virtual channels for transmitting data, and at least two nodes in the network on chip have different numbers of virtual channels. Each node can therefore be configured with a number of virtual channels suited to its load data, which improves physical channel utilization, reduces wasted transmission resources, allocates transmission resources reasonably, improves communication efficiency, and reduces both chip area and data transmission power consumption.

EXAMPLE five

Fig. 4 is a schematic structural diagram of a computer device according to a fifth embodiment of the present invention. As shown in fig. 4, the computer apparatus includes a processor 41, a memory 42, an input device 43, and an output device 44; the number of processors 41 in the computer device may be one or more, and one processor 41 is taken as an example in fig. 4; the processor 41, the memory 42, the input device 43 and the output device 44 in the computer apparatus may be connected by a bus or other means, and the connection by the bus is exemplified in fig. 4.

The memory 42, as a computer-readable storage medium, may be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the many-core network-on-chip data transmission method in the embodiments of the present invention (for example, the load data obtaining module, the virtual channel number determining module and the data transmission module in the many-core network-on-chip data transmission apparatus). By running the software programs, instructions and modules stored in the memory 42, the processor 41 executes the various functional applications and data processing of the computer device, that is, implements the many-core network-on-chip data transmission method described above.

The memory 42 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 42 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 42 may further include memory located remotely from processor 41, which may be connected to a computer device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 43 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function controls of the computer apparatus. The output device 44 may include an output device such as a display screen.

EXAMPLE six

A sixth embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the many-core network-on-chip data transmission method provided by any embodiment of the present invention.

That is, when executed by the processor, the program implements: acquiring load data of each node in the network on chip; determining the number of virtual channels matched with each node according to the load data of each node; and configuring virtual channels for each node according to its number of virtual channels, and instructing each node to transmit data using the configured virtual channels.

Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A many-core network-on-chip data transmission method is characterized by comprising the following steps:

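acquiring load data of each node in the network on chip;

determining the number of virtual channels matched with each node according to the load data of each node;

and configuring virtual channels for each node according to the number of virtual channels of each node, and instructing each node to transmit data using the configured virtual channels.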

2. The method of claim 1, wherein the load data relates to a location of the node on a network on chip;

the determining the number of the virtual channels matched with each node according to the load data of each node includes:

determining the priority of each node according to the position of each node on the network on chip;

and determining the number of the virtual channels matched with each node according to the priority of each node and the corresponding relation between the priority of the preconfigured node and the number of the virtual channels.

3. The method of claim 2, wherein determining the priority of each of the nodes based on its location on the network-on-chip comprises:

in the network on chip, the network is divided into at least two areas from a network center position to a network periphery;

and determining the priority of the nodes in each area according to the distance between each area and the network center position, wherein the priority of the node in the area positioned at the network center position is the highest, and the priority of the node in the area positioned at the outermost periphery of the network is the lowest.

4. The method of claim 2, wherein determining the priority of each of the nodes based on its location on the network-on-chip comprises:

dividing the network on chip into at least two regions;

in each area, dividing the area into at least two area units from the central position of the area to the periphery of the area;

in each area, determining the priority of the nodes in each area unit according to the distance between each area unit and the area center position, wherein the priority of the node in the area unit positioned at the area center position is the highest, and the priority of the node in the area unit positioned at the outermost periphery of the area is the lowest.

5. The method according to claim 1, characterized in that the topology of the network on chip is a two-dimensional mesh.

6. The method of claim 1, wherein instructing each of the nodes to transmit data using the configured virtual channel comprises:

and instructing each node to adopt the configured virtual channel to transmit data in a time division multiplexing mode.

7. The method of claim 1, wherein the number of virtual channels is equal to the number of FIFO queues corresponding to the virtual channels.

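8. A many-core network-on-chip data transmission apparatus, comprising:

a load data obtaining module, configured to obtain load data of each node in the network on chip;

a virtual channel number determining module, configured to determine, according to the load data of each node, the number of virtual channels matched with each node;

and a data transmission module, configured to configure virtual channels for each node according to the number of virtual channels of each node, and to instruct each node to transmit data using the configured virtual channels.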

9. A network on chip, comprising a plurality of nodes, each of said nodes configured with a virtual channel for transmitting data; the number of the virtual channels of the nodes is determined according to the load data of the nodes, and the number of the virtual channels of at least two nodes is different.

10. The network on chip of claim 9, wherein each of the nodes is configured with a buffer, and wherein one virtual channel corresponds to one buffer.

11. The network on chip of claim 9, wherein the network on chip comprises at least two areas, the areas are divided from a network center position to a network periphery in the network on chip, nodes in each area are configured with priorities, the priorities of the nodes are determined according to a distance between the area where the nodes are located and the network center position, the priorities of the nodes are used for determining the number of virtual channels matched with the nodes, the priority of the node in the area located at the network center position is the highest, and the priority of the node in the area located at the network periphery is the lowest.

12. The network on chip of claim 9, wherein the network on chip comprises at least two zones, the zones comprise at least two zone units, the zone units are divided from a zone center position to a zone periphery in the zones, nodes in each zone unit are configured with priorities, the priorities of the nodes are determined according to distances between the zone units where the nodes are located and the zone center position, the priorities of the nodes are used for determining the number of virtual channels matched by the nodes, the priority of the node in the zone unit located at the zone center position is the highest, and the priority of the node in the zone unit located at the zone periphery is the lowest.

13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the many-core network-on-chip data transmission method as claimed in any one of claims 1-7.

14. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the many-core network-on-chip data transmission method according to any one of claims 1 to 7.