CN114844757B - Network-on-chip design method for distributed parallel operation algorithm - Google Patents

Publication number
CN114844757B
Authority
CN
China
Prior art keywords
network
node
multicast
data
unicast
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210174904.0A
Other languages
Chinese (zh)
Other versions
CN114844757A (en)
Inventor
黄乐天
邓子阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangtze River Delta Research Institute of UESTC Huzhou
Original Assignee
Yangtze River Delta Research Institute of UESTC Huzhou
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangtze River Delta Research Institute of UESTC Huzhou
Priority to CN202210174904.0A
Publication of CN114844757A
Priority to US18/068,710 (US20230269200A1)
Application granted
Publication of CN114844757B
Active legal status
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00: Packet switching elements
    • H04L49/10: Packet switching elements characterised by the switching fabric construction
    • H04L49/109: Integrated on microchip, e.g. switch-on-chip
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04: Network management architectures or arrangements
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00: Packet switching elements
    • H04L49/20: Support for services
    • H04L49/201: Multicast operation; Broadcast operation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00: Packet switching elements
    • H04L49/30: Peripheral units, e.g. input or output ports
    • H04L49/3063: Pipelined operation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/104: Peer-to-peer [P2P] networks
    • H04L67/1074: Peer-to-peer [P2P] networks for supporting data block transmission mechanisms
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00: Data switching networks
    • H04L12/54: Store-and-forward switching systems
    • H04L12/56: Packet switching systems
    • H04L12/5601: Transfer mode dependent, e.g. ATM
    • H04L2012/5638: Services, e.g. multimedia, GOS, QOS
    • H04L2012/564: Connection-oriented
    • H04L2012/5641: Unicast/point-to-point
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00: Data switching networks
    • H04L12/54: Store-and-forward switching systems
    • H04L12/56: Packet switching systems
    • H04L12/5601: Transfer mode dependent, e.g. ATM
    • H04L2012/5638: Services, e.g. multimedia, GOS, QOS
    • H04L2012/564: Connection-oriented
    • H04L2012/5642: Multicast/broadcast/point-multipoint, e.g. VOD
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Multi Processors (AREA)

Abstract

The application relates to the technical field of computer algorithms, and in particular to a network-on-chip design method for distributed parallel computing algorithms. According to the distributed parallel computing algorithm to be run, the network-on-chip is divided into two layers: a unicast network and a multicast network. The unicast network implements point-to-point transmission between nodes and delivers the independent operation data that each operation node requires to that node in unicast mode. The multicast network is a customized multicast network oriented to the distributed parallel computing algorithm and is used to transmit common operation data to all operation nodes. Combining the unicast network and the multicast network achieves efficient transmission of data packets in the network. A multicast tree transmission architecture oriented to the distributed parallel computing algorithm is designed in which each operation node is provided with only a two-way replication node or a receiving node. Unlike a conventional multicast network-on-chip, in which every node has a multicast sending and receiving module, this minimizes the use of on-chip resources.

Description

Network-on-chip design method for distributed parallel operation algorithm
Technical Field
The application relates to the technical field of computer algorithms, in particular to a network-on-chip design method for a distributed parallel operation algorithm.
Background
A distributed parallel operation can be defined as an algorithm in which the same operation steps are applied to different data, with no data dependence among those data, so that the operations can be executed in parallel. Typical distributed parallel operations include distance calculations between two coordinate vectors, various matrix multiplications, and the convolution operations in deep learning algorithms.
Distributed parallel operations are computation-intensive and decentralized, and the operations on different data are independent of one another. Because they involve a very large number of operations, their actual efficiency on current general-purpose processors (CPUs) and general-purpose graphics processing units (GPGPUs) is very low. This patent therefore designs a network-on-chip architecture for such operations and accelerates them with customized hardware.
The most common way to design a hardware accelerator for distributed parallel operations is to use multiple operation units, each responsible for part of the operation; all units run in parallel and their results are then combined into the final result. The biggest problem with this approach arises when the results are gathered and written to the storage unit: because there are many operation units, the decoding and selection combinational logic for the storage unit's control signals becomes excessive, and timing suffers. This limits the maximum clock frequency and degrades overall performance.
To address the excessive combinational logic delay when many operation units work in parallel, industry usually interconnects the operation units with a network-on-chip rather than a bus or a switch matrix. Compared with a bus, an on-chip many-core system with a networked communication structure supports concurrent data transmission, has a topology that is easier to scale, and offers greater communication bandwidth. A networked communication structure also provides abundant redundant resources, which gives more options in reliability design. The network-on-chip is widely studied and used as a representative networked communication structure. Fig. 1 shows the 2D-Mesh structure common in networks-on-chip, which mainly consists of routers, links, and network interfaces; the attached processing units may include a memory interface, a general-purpose processor, a hardware acceleration unit, an I/O port, and so on.
Transmission in a network-on-chip mainly takes the form of sending and receiving packets. The router is the main component of the network-on-chip and is chiefly responsible for buffering and steering data packets; it can be understood as a transfer station for data in the network. Links connect the components into a connected network and carry packets from the output register stage of the upstream router to the input buffer of the downstream router. The network interface packs the data of the processing unit into packets and sends them, and unpacks the packets delivered by the router and passes the data to the processing unit.
A network-on-chip data packet is sent by a source node to one or more destination nodes. Transmission to a single destination node is called unicast; transmission to several destination nodes is called multicast. Because a multicast packet must carry the positions of several destination nodes, its format is more complex than that of a unicast packet. One common multicast strategy performs multicast in unicast mode, that is, it sends unicast packets to the destination nodes one after another. This scheme is simple to implement, but it greatly increases network traffic. Another approach, Virtual Circuit Tree Multicasting (VCTM), adds a routing table to each router. Before a multicast starts, configuration packets for that multicast are sent in unicast mode to the routing tables of the nodes involved; when the multicast packet is sent, each router looks up the entry with the matching index ID in its routing table to decide whether to branch and in which directions. The problem with such general-purpose multicast networks is that they increase the packet load in the network and greatly increase the wiring resources the network-on-chip consumes.
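The traffic cost of emulating multicast with repeated unicasts can be illustrated with a small count of link traversals. The sketch below is not from the patent: it assumes a 4×4 mesh, a source in one corner, dimension-order (X then Y) routing for the unicasts, and an idealized replication tree that crosses each link of a spanning tree exactly once. The function names are hypothetical.

```python
def xy_hops(src, dst):
    """Links crossed by one packet on a 2D mesh (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def serial_unicast_link_traversals(src, dests):
    """Total link traversals when a multicast is emulated by one unicast per destination."""
    return sum(xy_hops(src, d) for d in dests)


def spanning_tree_link_traversals(num_nodes):
    """Idealized tree multicast: a spanning tree over num_nodes nodes has num_nodes - 1 links."""
    return num_nodes - 1


n = 4                      # assumed mesh size, for illustration only
source = (0, 0)            # assumed position of the sending node
destinations = [(x, y) for x in range(n) for y in range(n) if (x, y) != source]

unicast_cost = serial_unicast_link_traversals(source, destinations)
tree_cost = spanning_tree_link_traversals(n * n)
print(unicast_cost, tree_cost)  # prints "48 15"
```

Under these assumptions the repeated-unicast scheme crosses more than three times as many links as a single replicated packet, which is the traffic increase the paragraph above describes.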
Neither current general-purpose processors (CPUs) nor general-purpose graphics processing units (GPGPUs) can easily meet the real-time requirements of distributed parallel operation algorithms, so custom hardware needs to be designed around the characteristics of these algorithms.
By designing a customized network-on-chip oriented to these algorithms, the application solves the low clock frequency caused by excessive combinational logic delay in the bus interconnect of traditional hardware accelerators with many operation units. It also solves the low communication efficiency and high hardware resource consumption that arise when unicast and multicast share a single network in a general-purpose network-on-chip.
Because the network-on-chip targets distributed parallel operation algorithms, which share a similar operation structure, the operation can be split into multiple groups. Typical examples are: calculating all distances between the coordinates of two coordinate vectors, where the calculation between one coordinate M and the different coordinates N is carried out in turn; multiplying two matrices, where a row P is multiplied by the different columns Q; convolving the same convolution kernel with different matrices in a convolution operation; and so on. Algorithms of this type reuse the same data across calculations, a characteristic that corresponds to the multicast scenario of a network-on-chip: the same operation data only needs to be sent from the data input node to every operation node. In traditional multicast methods, by contrast, every node can send multicast packets, which occupies a large amount of on-chip resources and leaves hardware resources redundant.
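The split into shared and per-node data can be sketched for the matrix example, where one row P is multiplied by different columns Q. This is only an illustration under assumed names: `split_row_times_columns` and `operation_node` are hypothetical helpers, and the patent does not prescribe this partitioning code.

```python
def split_row_times_columns(row, matrix):
    """Split row x matrix into shared data (the row) and per-node data (one column each)."""
    columns = [list(col) for col in zip(*matrix)]
    return list(row), columns


def operation_node(shared_row, own_column):
    """Work done by one operation node: a single dot product."""
    return sum(a * b for a, b in zip(shared_row, own_column))


row_p = [1, 2, 3]
matrix_q = [[1, 0],
            [0, 1],
            [2, 2]]

multicast_payload, unicast_payloads = split_row_times_columns(row_p, matrix_q)
results = [operation_node(multicast_payload, column) for column in unicast_payloads]
print(results)  # prints "[7, 8]"
```

Every operation node needs the same row, so it is a natural multicast payload, while each column goes to exactly one node and suits unicast.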
To save on-chip resources while keeping the multicast efficiency of a distributed parallel operation algorithm on the network-on-chip as high as possible, the application proposes a new network structure that combines a unicast network with a directional multicast network, and designs a multicast network oriented to distributed parallel operation algorithms on top of an ordinary mesh network. The multicast network is directional: multicast data is sent to each operation node with the data input node as the source. By designing a tree-shaped replication circuit unit for this multicast scenario, the application achieves fast transmission of multicast data without consuming many on-chip resources, which effectively improves the overall communication efficiency of the network.
Disclosure of Invention
(I) Technical problem to be solved
The application addresses two problems: the low clock frequency of traditional hardware accelerators with many operation units, caused by excessive combinational logic delay in the bus interconnect; and the low network communication efficiency and high hardware resource consumption of the network. To this end, it provides a network-on-chip design method for distributed parallel operation algorithms.
(II) Technical solution
The network-on-chip is divided into two layers according to the distributed parallel operation algorithm it runs: a unicast network and a multicast network. The unicast network implements point-to-point transmission between nodes and delivers the independent operation data that each operation node requires to that node in unicast mode. The multicast network is a customized multicast network oriented to the distributed parallel operation algorithm and is used to transmit common operation data to all operation nodes. Combining the unicast network and the multicast network achieves efficient transmission of data packets in the network.
As a preferred technical solution, the multicast network comprises two kinds of nodes: two-way replication nodes and receiving nodes. The next level of each two-way replication node is connected to two two-way replication nodes or receiving nodes, so that all nodes in the multicast network together form a tree. Each multicast operation is transmitted from the topmost node of the tree to all of its lowest-level nodes. With the two-way replication nodes and receiving nodes designed appropriately, good performance can be maintained with low resource usage.
As a preferred technical solution, a two-way replication node decodes and stores the data in the multicast packet sent by the previous level while copying the packet and sending it to the two nodes at the next level. The nodes at the last level are receiving nodes, which receive and decode the multicast packet and store its data.
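The tree described above can be sketched in a few lines. The heap-style numbering (node i feeds nodes 2i+1 and 2i+2) and the function names are assumptions made for illustration; the patent does not specify how nodes are numbered or placed.

```python
def build_multicast_tree(num_nodes):
    """Map each node to its next-level nodes, using heap-style numbering (an assumption)."""
    return {i: [c for c in (2 * i + 1, 2 * i + 2) if c < num_nodes]
            for i in range(num_nodes)}


def node_kind(children, node):
    """Nodes that feed a next level replicate; last-level nodes only receive."""
    return "two-way replication" if children[node] else "receiving"


def multicast(children, payload, root=0):
    """Deliver payload from the top node downward; return {node: (stored payload, level)}."""
    delivered = {}
    frontier = [(root, 0)]
    while frontier:
        node, level = frontier.pop()
        delivered[node] = (payload, level)  # decode and store locally
        # replicate the packet to the next level (at most two nodes)
        frontier.extend((child, level + 1) for child in children[node])
    return delivered


tree = build_multicast_tree(7)
kinds = [node_kind(tree, node) for node in range(7)]
delivered = multicast(tree, "shared operand")
deepest_level = max(level for _, level in delivered.values())
print(kinds.count("two-way replication"), kinds.count("receiving"), deepest_level)
```

With seven operation nodes, three act as two-way replication nodes and four as receiving nodes, and every node holds the payload after two replication steps.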
As a preferred technical solution, the overall operation flow of the network-on-chip is as follows:
S1. When an algorithm operation starts, the data input node receives the multicast data and unicast data sent by a sensor. The node packs the multicast data and performs a multicast operation over the multicast network, so that the multicast data is sent to every operation node; it packs the unicast data in sequence and sends each packet to the corresponding operation node by a unicast operation over the unicast network.
S2. Each operation node starts computing once it has received its multicast data and unicast data, and during computation it continuously packs the results and sends them to the storage node until all distributed parallel operations are complete. The RISC-V processor node accesses the stored data over the unicast network.
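Steps S1 and S2 can be mimicked in software to make the data flow concrete. The following is a behavioral sketch only: the `StorageNode` class, the `run_operation` function, and the squared-distance workload are illustrative assumptions, and packets, routers, and timing are not modeled.

```python
class StorageNode:
    """Keeps the results sent by operation nodes and answers read requests."""

    def __init__(self):
        self.results = {}

    def store(self, node_id, value):
        self.results[node_id] = value

    def read(self, node_id):
        return self.results[node_id]


def run_operation(multicast_data, unicast_data, compute):
    """Behavioral model of S1 (distribute data) and S2 (compute and store)."""
    storage = StorageNode()
    # S1: every operation node gets the shared data plus its own unicast data
    inboxes = {node_id: (multicast_data, own) for node_id, own in unicast_data.items()}
    # S2: each node computes and sends its result to the storage node
    for node_id, (shared, own) in inboxes.items():
        storage.store(node_id, compute(shared, own))
    return storage


def squared_distance(m, n):
    """Example workload: squared distance between coordinate M and one coordinate N."""
    return sum((a - b) ** 2 for a, b in zip(m, n))


coordinate_m = (0, 0)                       # shared by all nodes (multicast)
coordinates_n = {0: (3, 4), 1: (1, 1)}      # one per operation node (unicast)
storage = run_operation(coordinate_m, coordinates_n, squared_distance)
print(storage.read(0), storage.read(1))     # prints "25 2"
```

The final `read` calls stand in for the RISC-V processor node fetching stored results.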
(III) Beneficial effects
The beneficial effects of the application are:
1. The network-on-chip is oriented to distributed parallel operation algorithms and provides a network-on-chip hardware acceleration scheme for this type of algorithm.
2. By designing an independent multicast network, the network-on-chip separates multicast traffic from unicast traffic, which solves the heavy traffic and frequent congestion of a single shared network.
3. A multicast tree transmission architecture oriented to distributed parallel operation algorithms is designed in which only a two-way replication node or a receiving node is placed at each operation node. Unlike a traditional multicast network-on-chip, in which every node has a multicast sending and receiving module, this minimizes the use of on-chip resources. Because the number of nodes attached at each level of the tree grows exponentially, the total delay for a multicast packet to travel from the top level to the bottom level is also effectively reduced, which improves the real-time performance of algorithms running on the network-on-chip.
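The effect of exponential growth on delay can be made explicit with a short estimate. It rests on assumptions that are not stated in the patent: the tree is a complete binary tree with N nodes, and each level adds the same delay T before the packet reaches the next level.

```latex
% Assumptions (not from the patent): complete binary tree, constant per-level delay T
\begin{align*}
n_k &= 2^{k} && \text{nodes at level } k,\quad k = 0, 1, \dots, d \\
N &= \sum_{k=0}^{d} 2^{k} = 2^{d+1} - 1 && \text{nodes reached down to level } d \\
d &= \log_2(N + 1) - 1 && \text{levels below the top node} \\
L_{\text{tree}} &\approx d\,T = \bigl(\log_2(N + 1) - 1\bigr)\,T && \text{top-to-bottom delay in the tree} \\
L_{\text{chain}} &\approx (N - 1)\,T && \text{if the same } N \text{ nodes were fed one after another}
\end{align*}
```

For example, with N = 63 nodes the tree has d = 5 levels below the top node, so under these assumptions the last level is reached after about 5T rather than 62T.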
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a typical structural diagram of a Mesh network-on-chip;
FIG. 2 is a diagram of the two-layer network-on-chip architecture;
FIG. 3 shows the micro-architecture of the two-way replication node.
Detailed Description
The network-on-chip design method for a distributed parallel operation algorithm is further described below with reference to the accompanying drawings and an embodiment:
The network-on-chip is divided into two layers according to the distributed parallel operation algorithm it runs: a unicast network and a multicast network. The unicast network implements point-to-point transmission between nodes and delivers the independent operation data that each operation node requires to that node in unicast mode. The multicast network is a customized multicast network oriented to the distributed parallel operation algorithm and is used to transmit common operation data to all operation nodes. Combining the unicast network and the multicast network achieves efficient transmission of data packets in the network.
Further, the multicast network comprises two kinds of nodes: two-way replication nodes and receiving nodes. The next level of each two-way replication node is connected to two two-way replication nodes or receiving nodes, so that all nodes in the multicast network together form a tree. Each multicast operation is transmitted from the topmost node of the tree to all of its lowest-level nodes. With the two-way replication nodes and receiving nodes designed appropriately, good performance can be maintained with low resource usage.
Further, a two-way replication node decodes and stores the data in the multicast packet sent by the previous level while copying the packet and sending it to the two nodes at the next level. The nodes at the last level are receiving nodes, which receive and decode the multicast packet and store its data.
Further, the overall operation flow of the network-on-chip is as follows:
S1. When an algorithm operation starts, the data input node receives the multicast data and unicast data sent by a sensor. The node packs the multicast data and performs a multicast operation over the multicast network, so that the multicast data is sent to every operation node; it packs the unicast data in sequence and sends each packet to the corresponding operation node by a unicast operation over the unicast network.
S2. Each operation node starts computing once it has received its multicast data and unicast data, and during computation it continuously packs the results and sends them to the storage node until all distributed parallel operations are complete. The RISC-V processor node accesses the stored data over the unicast network.
Working principle: as shown in Fig. 2, the unicast network uses an n×n Mesh topology. The unicast network contains the following kinds of nodes:
1. Data input node. It receives newly detected data from a sensor or from an upstream stage of the network, packs the data into unicast packets and multicast packets as appropriate, and sends them to the corresponding operation nodes over the unicast network and the multicast network respectively.
2. Operation node, which contains an operation unit. After receiving the unicast and multicast packets addressed to it, the node unpacks them and stores the data; the operation unit then fetches the data from the multicast packet and the unicast packet, performs the operation, packs the result, and sends it to the corresponding storage unit.
3. Forwarding-only node. It only receives and forwards packets, passing them along the X direction or the Y direction of the unicast network according to the destination node, and contains no unpacking or data storage unit.
4. Storage node, which contains a storage unit. It stores all valid results and accepts requests from other nodes; on receiving a request, it returns a packet containing the requested data to the requesting node.
5. RISC-V processor node, on which a RISC-V processor is mounted. The processor runs the parts of the algorithm that fall outside the work of the network-on-chip operation units. For example, after the network-on-chip has performed the convolution operations of a deep learning algorithm, the RISC-V processor can fetch data from the storage node to carry out subsequent operations such as pooling and fully connected layers.
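Forwarding along the X or Y direction according to the destination can be illustrated with dimension-order routing. Note the assumption: the text only says packets are forwarded in the X or Y direction, so routing X first and then Y is a common convention chosen here for the sketch, not something the patent states. The function name `xy_route` is hypothetical.

```python
def xy_route(src, dst):
    """Routers visited from src to dst on a 2D mesh, moving along X first, then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:              # forward along the X direction
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:              # then forward along the Y direction
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


path = xy_route((0, 0), (2, 1))
print(path)  # prints "[(0, 0), (1, 0), (2, 0), (2, 1)]"
```

Each intermediate router only compares its own coordinates with the destination, which is consistent with forwarding-only nodes that need no unpacking or storage unit.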
The multicast network consists of two-way replication nodes and receiving nodes. The next level of each two-way replication node is connected to two two-way replication nodes or receiving nodes, and all nodes in the multicast network together form a tree. Each multicast operation is transmitted from the topmost node of the tree to all of its lowest-level nodes. The micro-architecture of the two-way replication node is shown in Fig. 3. It consists of two parts: control logic and a dual-port buffer. When the control logic receives the start_in signal, which indicates that port B of the previous level's dual-port buffer has started sending data, the control logic of the current level sends write addresses and write-enable signals to port A of its dual-port buffer and stores the data arriving from the previous level until the previous level asserts finish_in, at which point all data has been stored. The control logic of the current level then asserts start_out and begins sending read addresses and read-enable signals to port B of the dual-port buffer; once all the data received from the previous level has been sent on, it asserts finish_out. After the current level has completed the multicast operation, the control logic issues read operations on port A again, reads out the valid data of the multicast packet, and invokes the operation unit to complete the operation together with the data from the unicast packet.
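The handshake sequence of a two-way replication node can be traced with a small behavioral model. This is a software sketch under assumptions, not the hardware design: the class name is hypothetical, signal names are written in lowercase, a Python list stands in for the dual-port buffer, and cycle-level timing and addressing are not modeled.

```python
class TwoWayReplicationNode:
    """Behavioral model: store incoming words, then replay them to the next level."""

    def __init__(self, name):
        self.name = name
        self.buffer = []      # stands in for the dual-port buffer
        self.children = []    # next-level nodes; empty for a receiving node
        self.log = []         # order in which handshake signals are seen or raised

    def receive(self, words):
        self.log.append("start_in")        # previous level starts sending
        self.buffer = list(words)          # writes through port A
        self.log.append("finish_in")       # all data stored
        if self.children:
            self.log.append("start_out")   # reads through port B begin
            for child in self.children:
                child.receive(self.buffer)
            self.log.append("finish_out")  # everything forwarded

    def read_payload(self):
        """After the multicast, the data is read back for the operation unit."""
        return list(self.buffer)


root = TwoWayReplicationNode("root")
left = TwoWayReplicationNode("left")
right = TwoWayReplicationNode("right")
root.children = [left, right]

root.receive([10, 20, 30])
print(root.log)
print(left.log, left.read_payload())
```

A node without next-level nodes behaves as a receiving node in this model: it sees only start_in and finish_in and then exposes the stored data.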
The above examples are merely illustrative of the preferred embodiments of the present application and are not intended to limit the spirit and scope of the present application, and various modifications and improvements made by those skilled in the art to the technical solutions of the present application should fall within the scope of protection of the present application, and the technical content claimed by the present application is fully described in the claims.

Claims (2)

1. A network-on-chip design method for a distributed parallel operation algorithm, characterized in that: the network-on-chip is divided into two layers according to the distributed parallel operation algorithm of the network-on-chip, comprising a unicast network and a multicast network; the unicast network implements point-to-point transmission between nodes, and the independent operation data required by each operation node is transmitted to that operation node in unicast mode; the multicast network is a customized multicast network oriented to the distributed parallel operation algorithm and is used for transmitting common operation data to all operation nodes; the multicast network comprises two kinds of nodes, namely two-way replication nodes and receiving nodes; the next level of each two-way replication node is connected to two two-way replication nodes or receiving nodes, and all nodes in the multicast network together form a tree; each multicast operation is transmitted from the topmost node of the tree to all lowest-level nodes of the tree; a two-way replication node decodes and stores the data in the multicast packet sent by the previous level while copying the packet and sending it to the two nodes of the next level; and the nodes of the last level are receiving nodes, which receive and decode the multicast packet and store the data.
2. The network-on-chip design method for a distributed parallel operation algorithm as claimed in claim 1, wherein the overall operation flow of the network-on-chip is as follows:
S1. when an algorithm operation starts, the data input node receives the multicast data and unicast data sent by a sensor; the node packs the multicast data and performs a multicast operation over the multicast network, so that the multicast data is sent to every operation node; and the unicast data is packed in sequence and sent to the corresponding operation node by a unicast operation over the unicast network;
S2. each operation node starts computing after receiving its multicast data and unicast data, and during computation continuously packs the results and sends them to the storage node until all distributed parallel operations are complete; and the RISC-V processor node accesses the stored data over the unicast network.
CN202210174904.0A 2022-02-24 2022-02-24 Network-on-chip design method for distributed parallel operation algorithm Active CN114844757B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210174904.0A CN114844757B (en) 2022-02-24 2022-02-24 Network-on-chip design method for distributed parallel operation algorithm
US18/068,710 US20230269200A1 (en) 2022-02-24 2022-12-20 On-chip network design method for distributed parallel operation algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210174904.0A CN114844757B (en) 2022-02-24 2022-02-24 Network-on-chip design method for distributed parallel operation algorithm

Publications (2)

Publication Number Publication Date
CN114844757A (en) 2022-08-02
CN114844757B (en) 2023-11-24

Family

ID=82561436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210174904.0A Active CN114844757B (en) 2022-02-24 2022-02-24 Network-on-chip design method for distributed parallel operation algorithm

Country Status (2)

Country Link
US (1) US20230269200A1 (en)
CN (1) CN114844757B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102883277A (en) * 2012-10-25 2013-01-16 赵久旸 Cooperative communication method based on reliable multicast MAC (Media Access Control) layer protocol
CN103124420A (en) * 2013-01-21 2013-05-29 电子科技大学 Wireless on-chip network structuring method
CN107046500A (en) * 2017-05-19 2017-08-15 合肥工业大学 A kind of two-stage applied to stratification network-on-chip splits router and its routing algorithm
CN108256628A (en) * 2018-01-15 2018-07-06 合肥工业大学 Convolutional neural networks hardware accelerator and its working method based on multicast network-on-chip
CN108924055A (en) * 2018-08-23 2018-11-30 北京理工大学 A kind of name data network multi-broadcast routing method based on steiner tree
CN111786911A (en) * 2020-05-26 2020-10-16 重庆邮电大学 Hybrid wireless optical network-on-chip architecture and multicast routing algorithm thereof
CN112468401A (en) * 2020-11-26 2021-03-09 中国人民解放军国防科技大学 Network-on-chip routing communication method for brain-like processor and network-on-chip
CN112729395A (en) * 2020-12-23 2021-04-30 电子科技大学 On-chip sensor reading system for complex SoC reliability monitoring

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IL150281A0 (en) * 2002-06-18 2002-12-01 Teracross Ltd Method and system for multicast and unicast scheduling
US9813327B2 (en) * 2014-09-23 2017-11-07 Cavium, Inc. Hierarchical hardware linked list approach for multicast replication engine in a network ASIC

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
mTREE: A Customized Multicast-Enabled Tree-Based Network on Chip for AI Chips; Yong Zheng, Haigang Yang, Yi Shu, Yiping Jia, Zhihong Huang; IEEE Embedded Systems Letters; Vol. 14, No. 3; full text *
Evolution of homogeneous and heterogeneous on-chip multi-core systems (同构与异构片上多核系统的演进过程); 黄乐天, 别丽华; 《电子技术应用》; Vol. 43, No. 3; full text *

Also Published As

Publication number Publication date
CN114844757A (en) 2022-08-02
US20230269200A1 (en) 2023-08-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant