WO2021147721A1 - Network-on-chip interconnection structure of many-core system, data transmission method, board card, electronic device, and computer-readable storage medium - Google Patents


Info

Publication number
WO2021147721A1
Authority
WO
WIPO (PCT)
Prior art keywords
chip
data
block
inter
core
Application number
PCT/CN2021/071449
Other languages
French (fr)
Chinese (zh)
Inventor
陈贺 (Chen He)
王封 (Wang Feng)
Original Assignee
北京灵汐科技有限公司 (Beijing Lynxi Technology Co., Ltd.)
Application filed by 北京灵汐科技有限公司 (Beijing Lynxi Technology Co., Ltd.)
Publication of WO2021147721A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306 Intercommunication techniques
    • G06F15/17312 Routing techniques specific to parallel machines, e.g. wormhole, store and forward, shortest path problem congestion
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7825 Globally asynchronous, locally synchronous, e.g. network on chip
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The embodiments of the present disclosure relate to the field of artificial intelligence technology, and in particular to an on-chip network interconnection structure of a many-core system, a data transmission method, a board card, an electronic device, and a computer-readable storage medium.
  • A many-core system includes one or more chips (processors). A chip usually integrates multiple complete cores (computing engines), and the cores within one chip or across multiple chips can work together.
  • The interaction between chips at the board level (on one board), as well as the signals and data exchanged between the cores inside a chip, is very important to a many-core system, so the structure that realizes this interaction plays a vital role in the performance of the entire many-core system.
  • The cores in a many-core system generally transmit data over a fixed data route: a core receives and processes the data, and a transceiver module then transmits it.
  • This transmission mode is fixed, and only a single data route is available for communication between the cores.
  • As a result, the data path may become congested at a certain node, so that data has to wait and cannot be received and sent in time.
  • This approach cannot make full use of the cores' computing power, and data transmission takes a long time, which reduces data throughput and degrades performance.
  • The purpose of the embodiments of the present disclosure is to provide an on-chip network interconnection structure of a many-core system, a data transmission method, a board card, an electronic device, and a computer-readable storage medium.
  • The embodiments of the present disclosure provide an on-chip network interconnection structure of a many-core system.
  • The many-core system includes at least one chip, and each chip integrates multiple cores.
  • The on-chip network interconnection structure includes:
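  • at least two blocks located on the chip, each block including at least one core; an inter-chip routing module corresponding to each block, each inter-chip routing module being configured to interact with at least one other inter-chip routing module; and an on-chip network configured to interact with each inter-chip routing module and to exchange data between the cores.
  • At least some of the inter-chip routing modules corresponding to adjacent blocks are configured to interact with each other.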
  • At least some of the blocks are connected to a data interface for transferring external data.
  • The on-chip network interconnection structure is configured to receive and process external data and to transmit the processed data between the cores of a single chip.
  • In one implementation, the inter-chip routing module corresponding to the block connected to the data interface is configured to receive and process the external data transmitted by the data interface, and to transmit the processed data to the on-chip network node in the block where the target core is located;
  • the on-chip network node in the block where the target core is located is configured to receive the processed data and transfer it to the target core;
  • the block connected to the data interface is the block where the target core is located.
  • In another implementation, the inter-chip routing module corresponding to the block connected to the data interface is configured to receive and process the external data transmitted by the data interface, and to pass the processed data along other inter-chip routing modules until it reaches the inter-chip routing module corresponding to a block adjacent to the block where the target core is located;
  • the inter-chip routing module corresponding to the adjacent block is configured to receive the processed data and transfer it to the on-chip network node in the adjacent block;
  • the on-chip network node in the adjacent block is configured to receive the processed data and transfer it to the on-chip network node in the block where the target core is located;
  • the on-chip network node in the block where the target core is located is configured to receive the processed data and transfer it to the target core;
  • the block connected to the data interface is the block where the target core is located.
  • In another implementation, the inter-chip routing module corresponding to the block connected to the data interface is configured to receive and process the external data transmitted by the data interface, and to transmit the processed data to the on-chip network node in the block connected to the data interface;
  • the on-chip network node in the block connected to the data interface is configured to receive the processed data and transfer it to the on-chip network node in the block where the target core is located;
  • the on-chip network node in the block where the target core is located is configured to receive the processed data and transfer it to the target core;
  • the block connected to the data interface is not the block where the target core is located.
  • In another implementation, the inter-chip routing module corresponding to the block connected to the data interface is configured to receive and process the external data transmitted by the data interface, and to pass the processed data along other inter-chip routing modules until it reaches the inter-chip routing module corresponding to the block where the target core is located;
  • the inter-chip routing module corresponding to the block where the target core is located is configured to receive the processed data and transfer it to the on-chip network node in that block;
  • the on-chip network node in the block where the target core is located is configured to receive the processed data and transfer it to the target core;
  • the block connected to the data interface is not the block where the target core is located.
  • The target core includes:
  • a routing receiving module configured to receive data; and
  • a calculation module configured to perform calculations based on the received data.
  • The on-chip network interconnection structure is configured to implement data transmission between the cores of multiple chips.
  • In one implementation, the inter-chip routing module corresponding to a block of the first chip is connected to the inter-chip routing module corresponding to a block of the second chip, so that the block of the first chip is connected to the block of the second chip;
  • the source core of the first chip is configured to transfer data to the on-chip network node in the block where the source core of the first chip is located;
  • the on-chip network node in the block where the source core of the first chip is located is configured to receive the data and transfer it to the inter-chip routing module corresponding to that block;
  • the inter-chip routing module corresponding to the block where the source core of the first chip is located is configured to receive the data and pass it along other inter-chip routing modules of the first chip until it reaches the inter-chip routing module corresponding to the block of the first chip that is connected to the second chip;
  • the inter-chip routing module corresponding to the block of the first chip that is connected to the second chip is configured to receive the data and transfer it to the inter-chip routing module corresponding to the block where the target core of the second chip is located;
  • the inter-chip routing module corresponding to the block where the target core of the second chip is located is configured to receive the data and transfer it to the on-chip network node in that block;
  • the on-chip network node in the block where the target core of the second chip is located is configured to receive the data and transfer it to the target core of the second chip.
  • In another implementation, the inter-chip routing module corresponding to a block of the first chip is connected to the inter-chip routing module corresponding to a block of the second chip, so that the block of the first chip is connected to the block of the second chip;
  • the source core of the first chip is configured to transfer data to the on-chip network node in the block where the source core of the first chip is located;
  • the on-chip network node in the block where the source core of the first chip is located is configured to receive the data and transfer it to the on-chip network nodes in other blocks of the first chip;
  • the on-chip network nodes in the other blocks of the first chip are configured to receive the data and transmit it to the inter-chip routing module corresponding to the block of the first chip that is connected to the second chip;
  • the inter-chip routing module corresponding to the block of the first chip that is connected to the second chip is configured to receive the data and transfer it to the inter-chip routing module corresponding to the block where the target core of the second chip is located;
  • the inter-chip routing module corresponding to the block where the target core of the second chip is located is configured to receive the data and transfer it to the on-chip network node in that block;
  • the on-chip network node in the block where the target core of the second chip is located is configured to receive the data and transfer it to the target core of the second chip.
  • The embodiments of the present disclosure further provide a data transmission method applied to a many-core system, wherein the many-core system includes at least one chip, each chip integrates multiple cores, and each chip is provided with at least two blocks; each block includes at least one core, and each block corresponds to an inter-chip routing module.
  • The data transmission method includes:
  • realizing data transmission between blocks through the inter-chip routing modules, and realizing data transmission with the inter-chip routing modules and data transmission between the cores through the on-chip network.
  • Realizing data transmission between blocks through the inter-chip routing modules includes:
  • realizing data transmission between adjacent blocks through the inter-chip routing modules.
  • At least some of the blocks are connected to the data interface used to transfer external data; realizing data transmission between blocks through the inter-chip routing modules, and realizing data transmission with the inter-chip routing modules and between the cores through the on-chip network, includes:
  • receiving and processing the external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the target core through the on-chip network and/or other inter-chip routing modules, so as to receive and process external data and transmit the data between the cores of a single chip.
  • Receiving and processing the external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the target core through the on-chip network and/or other inter-chip routing modules, includes:
  • the block connected to the data interface is the block where the target core is located.
  • Receiving and processing the external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the target core through the on-chip network and/or other inter-chip routing modules, includes:
  • the block connected to the data interface is the block where the target core is located.
  • Receiving and processing the external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the target core through the on-chip network and/or other inter-chip routing modules, includes:
  • the block connected to the data interface is not the block where the target core is located.
  • Receiving and processing the external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the target core through the on-chip network and/or other inter-chip routing modules, includes:
  • the block connected to the data interface is not the block where the target core is located.
  • The data transmission method further includes: receiving data by a routing receiving module in the target core, and performing calculations based on the received data by a calculation module in the target core.
  • Realizing data transmission between blocks through the inter-chip routing modules, and realizing data transmission with the inter-chip routing modules and between the cores through the on-chip network, includes:
  • transmitting data from the source core of one chip to the target core of another chip through the inter-chip routing modules and the on-chip network, so as to transmit data among the cores of multiple chips.
  • In one implementation, the inter-chip routing module corresponding to a block of the first chip is connected to the inter-chip routing module corresponding to a block of the second chip, so that the block of the first chip is connected to the block of the second chip; transmitting the data from the source core of one chip to the target core of another chip through the inter-chip routing modules and the on-chip network includes:
  • transferring the data to the target core of the second chip through the on-chip network node in the block where the target core of the second chip is located.
  • In another implementation, the inter-chip routing module corresponding to a block of the first chip is connected to the inter-chip routing module corresponding to a block of the second chip, so that the block of the first chip is connected to the block of the second chip; transmitting the data from the source core of one chip to the target core of another chip through the inter-chip routing modules and the on-chip network includes:
  • transferring the data to the target core of the second chip through the on-chip network node in the block where the target core of the second chip is located.
  • Each chip includes a plurality of inter-chip routing modules arranged along its periphery; realizing data transmission between blocks through the inter-chip routing modules includes:
  • An embodiment of the present disclosure provides a board card on which any of the on-chip network interconnection structures of the embodiments of the present disclosure is integrated.
  • An embodiment of the present disclosure provides an electronic device including a memory and a processor, wherein the memory is configured to store one or more computer instructions, and the one or more computer instructions are executable by the processor to implement any data transmission method of the embodiments of the present disclosure.
  • An embodiment of the present disclosure provides a computer-readable storage medium storing computer program instructions which, when executed by a processor, implement any data transmission method of the embodiments of the present disclosure.
  • Different cores of the many-core system can exchange data either "directly" through the on-chip network, or "indirectly" through the inter-chip routing modules and the on-chip network; that is, at least two different types of data route exist between them.
  • Thus, a data route can be selected as needed during data transmission. For example, when the data path of one route is congested at a certain node, another route can be used so that the data does not have to wait. This ensures that data is received and sent in time, reduces the time consumed by data transmission, makes full use of the cores' computing power, increases data throughput and processing speed, and improves the performance of the many-core system.
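A minimal sketch of the route-selection idea, assuming every candidate route is known in advance as an ordered list of hops and that congestion is reported per link. The hop labels (`core_a`, `noc_bank0`, `cr0`, etc.) and the `select_route` helper are hypothetical; the disclosure does not specify how congestion is detected or how the alternative route is chosen.

```python
from typing import Optional, Sequence, Set, Tuple

Route = Sequence[str]   # ordered hops, e.g. core -> NoC node -> ... -> core (hypothetical labels)
Link = Tuple[str, str]  # one directed hop between two nodes


def select_route(candidates: Sequence[Route], congested: Set[Link]) -> Optional[Route]:
    """Return the first candidate route that uses no congested link.

    Candidates are assumed to be ordered by preference; None means every
    known route is blocked, so the data would have to wait.
    """
    for route in candidates:
        # zip(route, route[1:]) enumerates the consecutive (from, to) links of the route.
        if not any(link in congested for link in zip(route, route[1:])):
            return route
    return None


# "Direct" route: on-chip network only.
direct = ["core_a", "noc_bank0", "noc_bank1", "core_b"]
# "Indirect" route: on-chip network plus inter-chip routing modules.
indirect = ["core_a", "noc_bank0", "cr0", "cr1", "noc_bank1", "core_b"]

print(select_route([direct, indirect], congested=set()))
print(select_route([direct, indirect], congested={("noc_bank0", "noc_bank1")}))
```

With no congestion the direct route is returned; once the NoC-to-NoC link is marked congested, the indirect route through the CRs is returned instead.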
  • FIG. 1 is a schematic diagram of the data routes for data transmission between the cores of a single chip according to an embodiment of the present disclosure, where the block connected to the PCIe interface is the block where the target core is located;
  • FIG. 2 is a schematic diagram of the data routes for data transmission between the cores of a single chip according to an embodiment of the present disclosure, where the block connected to the PCIe interface is not the block where the target core is located;
  • FIG. 3 is a schematic diagram of the data routes for data transmission between the cores of multiple chips according to an embodiment of the present disclosure.
  • Directional indications are only used to explain the relative positional relationships, movements, etc. of the components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indications change accordingly.
  • The embodiments of the present disclosure provide an on-chip network interconnection structure of a many-core system.
  • The many-core system includes at least one chip (processor), and each chip integrates multiple cores (computing engines).
  • The on-chip network interconnection structure includes: at least two blocks located on the chip, each block including at least one core; an inter-chip routing module corresponding to each block, each inter-chip routing module being configured to interact with at least one other inter-chip routing module (by one-way or two-way data interaction); and an on-chip network configured to interact with each inter-chip routing module and to exchange data between the cores.
  • The on-chip network interconnection structure of the embodiments of the present disclosure is used in a many-core system, or is a part of the many-core system. That is, the many-core system includes multiple chips, each chip includes multiple cores, and the on-chip network interconnection structure enables data interaction between different cores (both different cores in one chip and cores in different chips), so that the cores and the on-chip network interconnection structure together constitute a many-core system whose cores can work together.
  • The embodiments of the present disclosure therefore effectively provide a many-core system that includes the above on-chip network interconnection structure.
  • Each chip is divided into multiple blocks by area; each block has at least one core and corresponds to one inter-chip routing module. The on-chip network interconnection structure also includes an on-chip network, which enables data interaction between different cores (both between cores in one block and between cores in different blocks) as well as data interaction with the inter-chip routing modules.
  • The on-chip network therefore connects each core in a block with the inter-chip routing module of that block. At the same time, at least some of the inter-chip routing modules can exchange data with each other, which enables data exchange between the corresponding blocks (that is, between the cores in those blocks).
  • The on-chip network may include multiple on-chip network nodes (NoC, Network on Chip).
  • The on-chip network nodes are distributed in the blocks of each chip and connected to one another to form a network topology (which may also be divided into different levels, for example three levels). At least some of the on-chip network nodes are also connected to cores and/or inter-chip routing modules, so that data interaction between different cores, and between cores and inter-chip routing modules, can be realized through this topology.
  • The on-chip network nodes can be designed using related existing techniques, which are not described in detail here.
  • At least some of the inter-chip routing modules corresponding to adjacent blocks are configured to interact with each other.
  • For example, block bank0 is adjacent to blocks bank1 and bank3; data can be exchanged between the inter-chip routing module CR0 corresponding to bank0 and the inter-chip routing module CR1 corresponding to bank1, whereas CR0 and the inter-chip routing module CR3 corresponding to bank3 may not be able to exchange data.
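The bank0 example can be modelled as a small graph of inter-chip routing modules. The sketch below assumes a 2 x 2 bank layout in which only CR0-CR1, CR1-CR2 and CR2-CR3 are wired (CR0-CR3 is left unconnected, matching the example); the link table and the `cr_path` helper are hypothetical. A breadth-first search then finds the shortest chain of CRs between two blocks.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical wiring for one chip: bank0 is adjacent to bank1 and bank3,
# but only CR0-CR1 is assumed wired at bank0 (CR0-CR3 cannot exchange data).
CR_LINKS: Dict[str, Set[str]] = {
    "CR0": {"CR1"},
    "CR1": {"CR0", "CR2"},
    "CR2": {"CR1", "CR3"},
    "CR3": {"CR2"},
}


def cr_path(src: str, dst: str, links: Dict[str, Set[str]] = CR_LINKS) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of CRs from src to dst."""
    prev: Dict[str, Optional[str]] = {src: None}  # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the predecessor links back to src, then reverse.
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in sorted(links.get(node, ())):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # dst is unreachable through CR links alone


print(cr_path("CR0", "CR3"))
```

Because CR0 and CR3 are not linked in this assumed wiring, data from bank0 to bank3 has to travel CR0, CR1, CR2, CR3 even though the two banks are physically adjacent.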
  • At least some of the blocks are connected to a data interface for transferring external data; the on-chip network interconnection structure is configured to receive and process external data and to transmit the processed data between the cores of a single chip.
  • That is, the many-core system may also include a data interface connected to a block (for example, to the inter-chip routing module corresponding to the block).
  • The data interface receives data from the outside (external data), so that the on-chip network interconnection structure can also receive external data, process it, and send the processed data to the corresponding core (the target core).
  • The target core is determined by the specific external data; that is, the target core may differ between data transmission processes.
  • The optimal data route in the many-core system can be obtained through AI model training, so as to achieve optimal data transmission and minimize the time consumed by data transmission; different AI models may yield different optimal data routes.
  • A Chip can be divided into four banks (blocks), each bank corresponding to a CR (inter-chip routing module). Some banks are connected to a PCIe interface (data interface), the PCIe interface is connected to a server (Server), and the on-chip network is implemented by NoCs (on-chip network nodes).
  • The data routes used when transmitting data in the above chip may include:
  • First data route:
  • the CR sends the data to the NoC in the bank, and the data is passed through different levels of NoC to the target core for calculation.
  • In one case, the inter-chip routing module corresponding to the block connected to the data interface receives and processes the external data transmitted by the data interface, and delivers the processed data to the on-chip network node in the block where the target core is located; that on-chip network node receives the processed data and transfers it to the target core.
  • In this case, the data route is as follows: the data first passes through the inter-chip routing module corresponding to the block where the target core is located, is then transferred to the on-chip network node in that block, and is finally transferred to the target core for calculation.
  • In the other case, the inter-chip routing module corresponding to the block connected to the data interface receives and processes the external data transmitted by the data interface, and transfers the processed data to the on-chip network node in the block connected to the data interface; that on-chip network node receives the processed data and passes it to the on-chip network node in the block where the target core is located, which in turn receives the processed data and transfers it to the target core.
  • In this case, the data route is as follows: the data first passes through the inter-chip routing module corresponding to the block connected to the data interface, then through the on-chip network node in that block to the on-chip network node in the block where the target core is located, and finally to the target core for calculation.
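The two cases of the first data route can be written out as hop sequences. This is a sketch under assumed naming: banks are numbered, `CR<n>` is the CR of bank n, `NoC(bank<n>)` stands for the on-chip network node(s) of bank n (several NoC levels collapsed into one hop), and `first_route` is a hypothetical helper.

```python
from typing import List


def first_route(interface_bank: int, target_bank: int, target_core: str) -> List[str]:
    """Hops taken by external data on the first data route (CR -> NoC -> target core).

    Several NoC levels inside one bank are collapsed into a single "NoC(bankN)" hop.
    """
    # The CR of the bank wired to the data interface receives and processes the
    # external data, then hands it to the NoC node of its own bank.
    hops = [f"CR{interface_bank}", f"NoC(bank{interface_bank})"]
    if target_bank != interface_bank:
        # Target core is in another bank: NoC node forwards to that bank's NoC node.
        hops.append(f"NoC(bank{target_bank})")
    hops.append(target_core)
    return hops


print(first_route(0, 0, "core5"))  # target core is in the interface bank
print(first_route(0, 2, "core9"))  # target core is in another bank
```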
  • Second data route:
  • CR->CRx->NoC in the bank
  • The data is transmitted around the loop of CRs (that is, it passes through the inter-chip routing modules in sequence) to the bank where the target core is located, and is then forwarded by the NoC in that bank.
  • CRx represents one or more inter-chip routing modules.
  • In one case, the inter-chip routing module corresponding to the block connected to the data interface receives and processes the external data transmitted by the data interface, and passes the processed data along other inter-chip routing modules until it reaches the inter-chip routing module corresponding to a block adjacent to the block where the target core is located;
  • the inter-chip routing module corresponding to the adjacent block receives the processed data and transfers it to the on-chip network node in the adjacent block;
  • the on-chip network node in the adjacent block receives the processed data and transfers it to the on-chip network node in the block where the target core is located;
  • the on-chip network node in the block where the target core is located receives the processed data and transfers it to the target core.
  • In this case, the data route is as follows: the data first passes through the inter-chip routing module corresponding to the block where the target core is located, then through other inter-chip routing modules in turn until it reaches the inter-chip routing module corresponding to the adjacent block; it is then passed through the on-chip network node in the adjacent block to the on-chip network node in the block where the target core is located, and finally to the target core for calculation.
  • In the other case, the inter-chip routing module corresponding to the block connected to the data interface receives and processes the external data transmitted by the data interface, and passes the processed data along other inter-chip routing modules until it reaches the inter-chip routing module corresponding to the block where the target core is located; that inter-chip routing module receives the processed data and transfers it to the on-chip network node in its block, which receives the processed data and transfers it to the target core.
  • In this case, the data route is as follows: the data first passes through the inter-chip routing module corresponding to the block connected to the data interface, then through other inter-chip routing modules in turn until it reaches the inter-chip routing module corresponding to the block where the target core is located, and finally through the on-chip network node in that block to the target core for calculation.
  • Either of the two data routing schemes above can deliver data to the target core, which helps avoid the performance degradation caused by data congestion and reduces data transmission time.
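The two single-chip schemes can be sketched as hop lists. This is a minimal Python illustration under stated assumptions, not the disclosed implementation: it assumes four banks whose CRs form the one-way ring CR0 -> CR1 -> CR2 -> CR3 -> CR0, uses made-up node labels such as `"NoC(bank2)"`, and takes the "adjacent block" in the loop variant to be the bank immediately before the target bank on that ring. The function names `route_via_noc` and `route_via_cr_loop` are hypothetical.

```python
from typing import List

NUM_BANKS = 4  # assumption: four banks per chip, as in the example


def next_cr(bank: int) -> int:
    """Next CR on the assumed one-way (clockwise) ring."""
    return (bank + 1) % NUM_BANKS


def route_via_noc(if_bank: int, dst_bank: int) -> List[str]:
    """CR of the interface bank hands the data straight to the NoC."""
    hops = [f"CR{if_bank}", f"NoC(bank{if_bank})"]
    if dst_bank != if_bank:
        # NoC node of the interface bank forwards to the target bank's NoC node.
        hops.append(f"NoC(bank{dst_bank})")
    hops.append("Core(dst)")
    return hops


def route_via_cr_loop(if_bank: int, dst_bank: int,
                      stop_at_neighbor: bool = False) -> List[str]:
    """Data walks the CR loop, then the NoC finishes the delivery."""
    # Leave the ring at the target bank's CR, or (variant) at the CR of the
    # bank just before the target bank on the ring.
    exit_bank = (dst_bank - 1) % NUM_BANKS if stop_at_neighbor else dst_bank
    hops = [f"CR{if_bank}"]
    bank = if_bank
    while bank != exit_bank:
        bank = next_cr(bank)
        hops.append(f"CR{bank}")
    hops.append(f"NoC(bank{exit_bank})")
    if exit_bank != dst_bank:
        # Adjacent bank's NoC node passes the data to the target bank's NoC node.
        hops.append(f"NoC(bank{dst_bank})")
    hops.append("Core(dst)")
    return hops
```

With the interface and the target core both in bank0, `route_via_noc(0, 0)` gives the CR0 -> NoC(bank0) -> target core path; `route_via_cr_loop(0, 3)` walks CR0 through CR3 before handing the data to the NoC of bank3.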
  • the target core includes: a routing receiving module, which is used to receive data; and a calculation module, which is used to perform calculations based on the received data.
  • the routing receiving module in the target core receives the data, and the calculation module in the target core performs calculations.
  • the on-chip network interconnection structure is configured to implement data transmission between cores of multiple chips.
  • communication between cores in different chips can also be realized, such as sending data from the source core of one chip to the destination core of another chip.
  • the source core and the target core are the start point and end point of a particular piece of data; that is, in different data transmissions, the source core, the target core, and the chips on which they are located may differ.
  • a many-core system can include multiple chips, and each chip can be divided into four banks (blocks), each of which corresponds to a CR (inter-chip routing module). The NoC topology completes the data routing within each bank, and the CRs complete the data routing between banks.
  • CR: inter-chip routing module
  • the inter-chip routing module corresponding to a block of the first chip is connected to the inter-chip routing module corresponding to a block of the second chip, so that the block of the first chip is connected to the block of the second chip.
  • a block of the first chip and a block of the second chip are connected through respective inter-chip routing modules.
  • for example, CR2 of Chip0 is connected to CR3 of Chip1, so that bank2 of Chip0 is connected to bank3 of Chip1, enabling cross-chip data transmission.
  • the data routing during data transmission in the above many-core system may include:
  • the first data routing:
  • CRx represents one or more inter-chip routing modules
  • Chip0 is the first chip
  • Chip1 is the second chip
  • Core (src) indicates the source core
  • Core (dst) indicates the destination core.
  • the source core of the first chip is used to transfer data to the on-chip network node in the block where that source core is located; that on-chip network node is used to receive the data and transfer it to the inter-chip routing module corresponding to the block where the source core of the first chip is located; that inter-chip routing module is used to receive the data and transfer it, through the other inter-chip routing modules of the first chip, to the inter-chip routing module corresponding to the block of the first chip connected to the second chip; the inter-chip routing module corresponding to the block of the first chip connected to the second chip is used to receive the data and transfer it to the inter-chip routing module corresponding to the block where the target core of the second chip is located; the inter-chip routing module corresponding to the block where the target core of the second chip is located is used to receive the data and transfer it to the on-chip network node in the block where the target core of the second chip is located, which transfers the data to the target core.
  • the specific data routing can be as follows: the data is first transmitted by the source core of the first chip (Chip0) to the on-chip network node in the block where the source core is located, then passes through the inter-chip routing module corresponding to that block and, in turn, through the other inter-chip routing modules of the first chip (Chip0) until it reaches the inter-chip routing module corresponding to the block of the first chip (Chip0) that is connected to the block where the target core of the second chip (Chip1) is located. It then passes through the inter-chip routing module corresponding to the block of the second chip (Chip1) where the target core is located, and is finally delivered to the target core through the on-chip network node in that block for calculation.
  • the second data routing:
  • CRx represents one or more inter-chip routing modules
  • Chip0 is the first chip
  • Chip1 is the second chip
  • Core (src) is the source core
  • Core (dst) is the destination core.
  • the source core of the first chip is used to transfer data to the on-chip network node in the block where that source core is located; that on-chip network node is used to receive the data and transfer it to the on-chip network nodes in other blocks of the first chip; the on-chip network nodes in the other blocks of the first chip are used to receive the data and transfer it to the inter-chip routing module corresponding to the block of the first chip connected to the second chip; that inter-chip routing module is used to transfer the data to the inter-chip routing module corresponding to the block where the target core of the second chip is located, which transfers it to the on-chip network node in that block;
  • the on-chip network node in the block where the target core of the second chip is located is used to receive data and transfer the data to the target core of the second chip.
  • the specific data routing can be as follows: the data is first transferred by the source core of the first chip (Chip0) to the on-chip network node in the block where the source core is located, then to the on-chip network nodes in the other blocks of the first chip (Chip0), then through the inter-chip routing module corresponding to the block of the first chip connected to the second chip and the inter-chip routing module corresponding to the block of the second chip (Chip1) where the target core is located, and is finally transferred to the target core through the on-chip network node in that block for calculation.
  • for example, the data can first be transferred to the inter-chip routing module of the block of the first chip connected to the second chip (CR2, corresponding to bank2 of Chip0), then to the inter-chip routing module of the block of the second chip connected to the first chip (CR3, corresponding to bank3 of Chip1), and finally to the target core of the second chip.
  • Either of the two data routing schemes above can send data from the source core to the target core, which helps avoid the performance degradation caused by data congestion and reduces data transmission time.
  • the above takes as an example the case where the block of the second chip in which the target core is located is the block of the second chip connected to the first chip (bank3 of Chip1); in that case, the inter-chip routing module corresponding to the target core's block and the inter-chip routing module corresponding to the block connected to the first chip are the same module (CR3 of Chip1).
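For the Chip0 bank2 to Chip1 bank3 example, both cross-chip routings can be sketched as hop lists. The Python below is illustrative only: the link table `LINKS`, the node labels, and the function names are hypothetical, the on-chip CR ring is assumed to run clockwise as in the single-chip case, and, like the example in the text, it only handles a target core located in the linked bank of the second chip.

```python
from typing import List, Tuple

NUM_BANKS = 4
# Assumed link table from the example: CR2 of Chip0 is wired to CR3 of Chip1.
# Keys and values are (chip, bank); the link is treated as usable both ways.
LINKS = {(0, 2): (1, 3)}


def gateway(src_chip: int, dst_chip: int) -> Tuple[int, int]:
    """Return (bank on src_chip, bank on dst_chip) joined by a direct link."""
    for (a_chip, a_bank), (b_chip, b_bank) in LINKS.items():
        if (a_chip, b_chip) == (src_chip, dst_chip):
            return a_bank, b_bank
        if (b_chip, a_chip) == (src_chip, dst_chip):
            return b_bank, a_bank
    raise ValueError("chips are not directly connected")


def cross_chip_route(src_chip: int, src_bank: int, dst_chip: int,
                     dst_bank: int, via_noc: bool) -> List[str]:
    out_bank, in_bank = gateway(src_chip, dst_chip)
    if in_bank != dst_bank:
        # Sketch covers only the case in the text: target core in the linked bank.
        raise NotImplementedError("target core must be in the linked bank")
    hops = ["Core(src)", f"NoC(Chip{src_chip}.bank{src_bank})"]
    if via_noc:
        # Second routing: the NoC carries data to the gateway bank, then its CR.
        if out_bank != src_bank:
            hops.append(f"NoC(Chip{src_chip}.bank{out_bank})")
        hops.append(f"CR{out_bank}(Chip{src_chip})")
    else:
        # First routing: enter the CR loop at the source bank, walk it clockwise.
        bank = src_bank
        hops.append(f"CR{bank}(Chip{src_chip})")
        while bank != out_bank:
            bank = (bank + 1) % NUM_BANKS
            hops.append(f"CR{bank}(Chip{src_chip})")
    # Cross the inter-chip link, then finish on the destination chip's NoC.
    hops += [f"CR{in_bank}(Chip{dst_chip})",
             f"NoC(Chip{dst_chip}.bank{dst_bank})", "Core(dst)"]
    return hops
```

For a source core in bank0 of Chip0, the first routing visits CR0, CR1, and CR2 of Chip0 before crossing to CR3 of Chip1, while the second routing reaches CR2 of Chip0 through the NoC instead.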
  • if the target core is in another block of the second chip, then after the data is transferred to the inter-chip routing module corresponding to the block of the second chip connected to the first chip, it can be further transmitted to the target core according to the single-chip transmission method described above (either indirectly through the inter-chip routing modules and the on-chip network, or directly through the on-chip network), which is not described in detail here.
  • it is also feasible for the first chip where the source core is located and the second chip where the target core is located to be connected not directly but indirectly through other chips; in that case, the data passes through those other chips during transmission.
  • in this case, data can be transmitted through the inter-chip routing modules or through the on-chip network.
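When the two chips are only indirectly connected, some rule must choose the intermediate chips, and the disclosure does not say which. As one possible choice (an assumption, not the patent's method), a breadth-first search over the chip-level link graph finds a path with the fewest inter-chip hops. The function name `chip_path` and the set-of-pairs link representation are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


def chip_path(links: Set[Tuple[int, int]], src: int, dst: int) -> Optional[List[int]]:
    """Shortest chip-to-chip path (fewest links) by breadth-first search."""
    graph: Dict[int, Set[int]] = {}
    for a, b in links:  # inter-chip links are treated as undirected
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    prev: Dict[int, Optional[int]] = {src: None}
    queue = deque([src])
    while queue:
        chip = queue.popleft()
        if chip == dst:
            # Walk the predecessor links back to src, then reverse.
            path: List[int] = []
            node: Optional[int] = chip
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(graph.get(chip, set())):
            if nxt not in prev:
                prev[nxt] = chip
                queue.append(nxt)
    return None  # dst is unreachable
```

With links Chip0-Chip1 and Chip1-Chip2, `chip_path({(0, 1), (1, 2)}, 0, 2)` returns `[0, 1, 2]`, so data from Chip0 would cross Chip1 on its way to Chip2.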
  • the embodiments of the present disclosure provide a data transmission method, which is applied to a many-core system.
  • the many-core system includes at least one chip; each chip integrates multiple cores and is provided with at least two blocks, each block includes at least one core, and each block corresponds to an inter-chip routing module; the data transmission method includes:
  • data transmission between blocks is implemented through the inter-chip routing modules, and data transmission to and from the inter-chip routing modules, as well as data transmission between the cores, is implemented through the on-chip network.
  • when data transmission is performed, the inter-chip routing modules can be used to realize data transmission between blocks (or between cores of different blocks), while the on-chip network realizes data transmission to and from the inter-chip routing modules and between cores, so as to avoid the performance degradation caused by data congestion and reduce data transmission time.
  • realizing data transmission between blocks through the inter-chip routing module includes: realizing data transmission between adjacent blocks through the inter-chip routing module.
  • every two adjacent blocks necessarily correspond to a "pair" (two) of inter-chip routing modules, and among these pairs, at least some have their two inter-chip routing modules able to exchange data with each other.
  • at least some of the blocks are connected to a data interface used to transfer external data; implementing data transmission between blocks through the inter-chip routing modules, and data transmission to and from the inter-chip routing modules and between the cores through the on-chip network, includes:
  • the external data is received and processed through the inter-chip routing module corresponding to the block connected to the data interface, and the processed data is transmitted to the target core through the on-chip network and/or other inter-chip routing modules, so as to realize the reception and processing of external data and the transmission of the processed data between the cores of a single chip.
  • a default data route (such as the optimal data route obtained through AI model training) can be set in advance; when the data path of that route is congested, the system switches to another data route as a backup.
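The default-plus-backup policy can be sketched as an ordered fallback. In this hypothetical Python sketch, `routes[0]` stands for the preset default (for example, the trained optimal route) and later entries are backups; how congestion is actually detected is not specified in the text, so it is passed in as a predicate `is_congested`. The hop labels are made up for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

Route = Tuple[str, ...]  # a route as an ordered tuple of hop labels


def select_route(routes: Sequence[Route],
                 is_congested: Callable[[Route], bool]) -> Optional[Route]:
    """Return the first uncongested route; routes[0] is the preset default."""
    for route in routes:
        if not is_congested(route):
            return route
    return None  # every candidate is congested


# Example with hypothetical hop labels: the default route uses the NoC,
# the backup goes around the CR loop.
default = ("CR0", "NoC(bank0)", "NoC(bank3)", "Core(dst)")
backup = ("CR0", "CR1", "CR2", "CR3", "NoC(bank3)", "Core(dst)")
busy = {"NoC(bank0)"}  # pretend this NoC node is congested
chosen = select_route([default, backup], lambda r: any(h in busy for h in r))
```

Here `chosen` is the backup route because the default passes through the congested node. Returning `None` when every candidate is congested leaves the decision (wait, retry, or drop) to the caller, which the disclosure does not specify.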
  • the data routing when transmitting data between the cores of the above single chip may include:
  • the first data routing:
  • in this data routing, the CR sends the data to the NoC in the bank, and the data is then passed through the different levels of the NoC to the target core for calculation.
  • when the block connected to the data interface is the block where the target core is located, receiving and processing the external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the target core through the on-chip network and/or other inter-chip routing modules, includes:
  • receiving and processing the external data through the inter-chip routing module corresponding to the block connected to the data interface; transmitting, by that inter-chip routing module, the processed data to the on-chip network node in the block where the target core is located; and delivering, by the on-chip network node in the block where the target core is located, the processed data to the target core.
  • the specific data routing can be as follows: the data first passes through the inter-chip routing module corresponding to the block where the target core is located, is then transferred to the on-chip network node in that block, and is finally transferred to the target core for calculation.
  • when the block connected to the data interface is not the block where the target core is located, receiving and processing the external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the target core through the on-chip network and/or other inter-chip routing modules, includes: receiving and processing the external data through the inter-chip routing module corresponding to the block connected to the data interface;
  • transferring, by that inter-chip routing module, the processed data to the on-chip network node in the block connected to the data interface; passing, by the on-chip network node in the block connected to the data interface, the processed data to the on-chip network node in the block where the target core is located; and delivering, by the on-chip network node in the block where the target core is located, the processed data to the target core.
  • the specific data routing can be as follows: the data first passes through the inter-chip routing module corresponding to the block connected to the data interface, is then passed by the on-chip network node in that block to the on-chip network node in the block where the target core is located, and is finally passed to the target core for calculation.
  • the second data routing:
  • CR->CRx->NoC in the bank
  • in this data routing, the data is transmitted around a loop of CRs (that is, it passes through the inter-chip routing modules in sequence) until it reaches the bank where the destination Core is located, and is then delivered by the NoC in that bank.
  • CRx represents one or more inter-chip routing modules.
  • when the block connected to the data interface is not the block where the target core is located, receiving and processing the external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the target core through the on-chip network and/or other inter-chip routing modules, so as to realize the reception and processing of external data and the transmission of the processed data between the cores of a single chip, includes:
  • receiving and processing the external data through the inter-chip routing module corresponding to the block connected to the data interface, and passing the processed data through the other inter-chip routing modules in turn to the inter-chip routing module corresponding to the block adjacent to the block where the target core is located; transmitting, by that inter-chip routing module, the processed data to the on-chip network node in the adjacent block; transferring, by the on-chip network node in the adjacent block, the processed data to the on-chip network node in the block where the target core is located; and delivering, by the on-chip network node in the block where the target core is located, the processed data to the target core.
  • the specific data routing can be as follows: the data first passes through the inter-chip routing module corresponding to the block connected to the data interface, then passes through the other inter-chip routing modules in turn until it reaches the inter-chip routing module corresponding to the block adjacent to the target core's block. It is then passed through the on-chip network node in that adjacent block to the on-chip network node in the block where the target core is located, and finally transferred to the target core for calculation.
  • alternatively, receiving and processing the external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the target core through the on-chip network and/or other inter-chip routing modules, includes: receiving and processing the external data through the inter-chip routing module corresponding to the block connected to the data interface, and passing the processed data through the other inter-chip routing modules in turn to the inter-chip routing module corresponding to the block where the target core is located; transmitting, by that inter-chip routing module, the processed data to the on-chip network node in the block where the target core is located;
  • and delivering, by that on-chip network node, the processed data to the target core.
  • the specific data routing can be as follows: the data first passes through the inter-chip routing module corresponding to the block connected to the data interface, then passes through the other inter-chip routing modules in turn until it reaches the inter-chip routing module corresponding to the block where the target core is located, and is finally passed through the on-chip network node in that block to the target core for calculation.
  • the data transmission method further includes: a routing receiving module in the target core receives data, and a calculation module in the target core performs calculations based on the received data.
  • the data transmission between blocks is realized through the inter-chip routing module, and the data transmission with the inter-chip routing module and the data transmission between the cores are realized through the on-chip network, including:
  • the data is transmitted from the source core of one chip to the destination core of another chip through the inter-chip routing modules and the on-chip network, so as to realize the transmission of data among the cores of multiple chips.
  • because the data transmission method of the embodiments of the present disclosure is aimed at a many-core system, it can also transmit data between cores of different chips; that is, data is transmitted from the source core of one chip to the target core of another chip through the inter-chip routing modules and the on-chip network.
  • a default data route (such as the optimal data route obtained through AI model training) can be set in advance; when the data path of that route is congested, the system switches to another data route as a backup.
  • the inter-chip routing module corresponding to a block of the first chip is connected to the inter-chip routing module corresponding to a block of the second chip, so that the block of the first chip is connected to the block of the second chip.
  • the data routing during data transmission in the above many-core system may include:
  • the first data routing:
  • CRx represents one or more inter-chip routing modules
  • Chip0 is the first chip
  • Chip1 is the second chip
  • Core (src) indicates the source core
  • Core (dst) indicates the destination core.
  • transmitting data from the source core of one chip to the destination core of another chip through the inter-chip routing modules and the on-chip network includes:
  • transferring, by the source core of the first chip, the data to the on-chip network node in the block where that source core is located; transferring, by that on-chip network node, the data to the inter-chip routing module corresponding to the block where the source core of the first chip is located; transmitting, by that inter-chip routing module, the data to the other inter-chip routing modules of the first chip, which pass it on to the inter-chip routing module corresponding to the block of the first chip connected to the second chip; transmitting, by that inter-chip routing module, the data to the inter-chip routing module corresponding to the block where the target core of the second chip is located; transferring, by the latter, the data to the on-chip network node in the block where the target core of the second chip is located; and transferring, by that on-chip network node, the data to the target core of the second chip.
  • the specific data routing can be as follows: the data is first transmitted by the source core of the first chip (Chip0) to the on-chip network node in the block where the source core is located, then passes through the inter-chip routing module corresponding to that block and, in turn, through the other inter-chip routing modules of the first chip (Chip0) until it reaches the inter-chip routing module corresponding to the block of the first chip (Chip0) that is connected to the block where the target core of the second chip (Chip1) is located. It then passes through the inter-chip routing module corresponding to the block of the second chip (Chip1) where the target core is located, and is finally delivered to the target core through the on-chip network node in that block.
  • the second data routing:
  • CRx represents one or more inter-chip routing modules
  • Chip0 is the first chip
  • Chip1 is the second chip
  • Core (src) is the source core
  • Core (dst) is the destination core.
  • transmitting data from the source core of one chip to the destination core of another chip through the inter-chip routing modules and the on-chip network includes:
  • transferring, by the source core of the first chip, the data to the on-chip network node in the block where that source core is located; transferring, by that on-chip network node, the data to the on-chip network nodes in other blocks of the first chip; transmitting, by those on-chip network nodes, the data to the inter-chip routing module corresponding to the block of the first chip connected to the second chip; transmitting, by that inter-chip routing module, the data to the inter-chip routing module corresponding to the block where the target core of the second chip is located; transferring, by the latter, the data to the on-chip network node in the block where the target core of the second chip is located; and transferring, by that on-chip network node, the data to the target core of the second chip.
  • the specific data routing can be as follows: the data is first transferred by the source core of the first chip (Chip0) to the on-chip network node in the block where the source core is located, then to the on-chip network nodes in the other blocks of the first chip (Chip0), then through the inter-chip routing module corresponding to the block of the first chip connected to the second chip and the inter-chip routing module corresponding to the block of the second chip (Chip1) where the target core is located, and is finally transferred to the target core through the on-chip network node in that block for calculation.
  • the above takes as an example the case where the block of the second chip in which the target core is located is the block of the second chip connected to the first chip (bank3 of Chip1); in that case, the inter-chip routing module corresponding to the target core's block and the inter-chip routing module corresponding to the block connected to the first chip are the same module (CR3 of Chip1).
  • if the target core is in another block of the second chip, then after the data is transferred to the inter-chip routing module corresponding to the block of the second chip connected to the first chip, it can be further transmitted to the target core according to the single-chip transmission method described above (either indirectly through the inter-chip routing modules and the on-chip network, or directly through the on-chip network), which is not described in detail here.
  • it is also feasible for the first chip where the source core is located and the second chip where the target core is located to be connected not directly but indirectly through other chips; in that case, the data also passes through those other chips during transmission.
  • in this case, data can be transmitted through the inter-chip routing modules or through the on-chip network.
  • each chip includes a plurality of inter-chip routing modules arranged along its circumference; implementing data transmission between blocks through the inter-chip routing modules includes:
  • transmitting the data among the inter-chip routing modules in a predetermined rotational direction.
  • each chip includes a plurality of (for example, 3 or more) inter-chip routing modules arranged in "a circle" along the circumference of the chip, with a one-directional data connection between every two inter-chip routing modules that are adjacent in the circumferential direction. Data therefore travels among these inter-chip routing modules in a predetermined rotational direction (clockwise or counterclockwise); in other words, they form a "ring (loop)".
  • whether the direction is called clockwise or counterclockwise depends on the side from which the chip is viewed, but the actual direction of data transmission between the inter-chip routing modules does not change with the viewpoint.
  • for example, each chip includes 4 inter-chip routing modules, and in these two figures, data passes through the inter-chip routing modules in the clockwise direction, that is, from CR0 to CR1, from CR1 to CR2, from CR2 to CR3, and from CR3 back to CR0 (the last transfer is not drawn with an arrow in the figures). Data cannot travel in the reverse direction; for example, it cannot be transferred directly from CR0 to CR3.
  • the above restriction applies only to the transmission direction among the inter-chip routing modules within one chip. Referring to Figure 3, an inter-chip routing module on one chip can also be connected to inter-chip routing modules of other chips to transmit data to those chips, and data transmission between inter-chip routing modules of different chips does not necessarily follow a specific rotational direction.
  • data transmission between inter-chip routing modules usually has high bandwidth and low latency, so restricting it to the predetermined rotational direction does not significantly increase transmission time.
  • it is also feasible not to restrict the transmission direction (for example, adjacent inter-chip routing modules may exchange data in both directions), or to connect the inter-chip routing modules in a mesh-like network, which can further improve the data transmission speed and the performance of the many-core system.
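The cost of the one-way restriction can be checked with modular arithmetic. For n CRs on a ring, a one-way transfer from CR_src to CR_dst takes (dst - src) mod n hops, whereas a two-way ring needs only the shorter of the two directions. A minimal Python sketch (function names are hypothetical):

```python
from typing import List


def hops_one_way(src: int, dst: int, n: int = 4) -> int:
    """Hops from CR<src> to CR<dst> when data only moves CRi -> CR((i+1) mod n)."""
    return (dst - src) % n


def hops_two_way(src: int, dst: int, n: int = 4) -> int:
    """Hops if adjacent CRs could exchange data in both directions."""
    d = (dst - src) % n
    return min(d, n - d)


def path_one_way(src: int, dst: int, n: int = 4) -> List[str]:
    """CRs visited, in order, on the one-way ring."""
    return [f"CR{(src + k) % n}" for k in range(hops_one_way(src, dst, n) + 1)]
```

With four CRs, CR0 to CR3 takes three hops one-way (CR0, CR1, CR2, CR3) but one hop if the link could be used in reverse; this is the overhead that the high-bandwidth, low-latency CR links are argued to absorb.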
  • the entire data transmission process can be divided into two parts: the AI model part and the board (many-core system) data processing part.
  • AI model part: the AI model runs on the server side (such as a server or a computer) and is trained according to certain rules to obtain data routing information (that is, the optimal data routing).
  • the server assembles the data to be transmitted to the board into the form "data part (128 bits) + data header (128 bits)" according to the compilation rules of the tool chain, and sends it to the board via the PCIE interface.
  • the "data part" contains the actual data to be transmitted, and the "data header" contains the data routing information obtained through training, so that the many-core system can select the default data route based on that information.
  • the PCIE interface of the board receives the data (external data) sent by the server and converts it into data that can be received and processed by the inter-chip routing module (CR).
  • the inter-chip routing module receives the data, then unpacks and repacks it.
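The 128-bit + 128-bit framing can be illustrated with a packing and unpacking pair. The field order (data part first, then header) follows the text; everything inside the header, namely a one-byte route count followed by one-byte route IDs with the optimal route first, is a hypothetical layout chosen for this sketch, since the tool chain's actual compilation rules are not given. `pack_frame` stands for the server side and `unpack_frame` for the CR side; both names are illustrative.

```python
from typing import List, Tuple

DATA_BYTES = 16    # 128-bit data part
HEADER_BYTES = 16  # 128-bit data header


def pack_frame(payload: bytes, route_ids: List[int]) -> bytes:
    """Server side: data part followed by data header (hypothetical layout)."""
    if len(payload) > DATA_BYTES:
        raise ValueError("payload exceeds the 128-bit data part")
    if len(route_ids) > HEADER_BYTES - 1 or not all(0 <= r <= 255 for r in route_ids):
        raise ValueError("route list does not fit the assumed header layout")
    data_part = payload.ljust(DATA_BYTES, b"\x00")
    # Assumed header: byte 0 = route count, then one byte per route id,
    # best (default) route first; remaining bytes are zero padding.
    header = bytes([len(route_ids), *route_ids]).ljust(HEADER_BYTES, b"\x00")
    return data_part + header


def unpack_frame(frame: bytes) -> Tuple[bytes, List[int]]:
    """CR side: split a 256-bit frame back into data part and route ids."""
    if len(frame) != DATA_BYTES + HEADER_BYTES:
        raise ValueError("frame must be exactly 256 bits")
    data_part, header = frame[:DATA_BYTES], frame[DATA_BYTES:]
    count = header[0]
    return data_part, list(header[1:1 + count])
```

The data part is zero-padded to 128 bits and returned padded; a real format would need a length field or fixed-size payloads to recover the exact payload.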
  • it then carries out the next step of data transmission, such as sending the data to an on-chip network (NoC) node or another inter-chip routing module, until the data is finally delivered to the target core for processing.
  • the data route actually used follows the schemes described above; for example, the optimal data route in the data routing information of the "data header" is used by default, and another data route is selected when that route is congested.
  • the embodiments of the present disclosure provide a board (such as a printed circuit board) on which any one of the on-chip network interconnection structures of the embodiments of the present disclosure is integrated.
  • an embodiment of the present disclosure provides an electronic device including a memory and a processor, where the memory is used to store one or more computer instructions, and one or more computer instructions can be executed by the processor to implement the embodiments of the present disclosure Any of the data transmission methods.
  • the electronic device may be a server, a terminal, or the like.
  • the electronic device includes: at least one processor; a memory communicatively connected with the at least one processor; and a communication component communicatively connected with other external storage media, the communication component receiving and sending data under the control of the processor
  • the memory stores instructions that can be executed by at least one processor, and the instructions are executed by at least one processor to implement the data transmission method of the many-core system in the foregoing embodiment.
  • the memory, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules.
  • the processor executes various functional applications and data processing of the electronic device by running the non-volatile software programs, non-volatile computer-executable programs, and modules stored in the memory, that is, it implements the data transmission method of the above-mentioned many-core system.
  • the memory may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store a list of options and the like.
  • the memory may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
  • the memory may optionally include a memory remotely arranged with respect to the processor, and these remote memories may be connected to an external device through a network.
  • networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • one or more modules are stored in the memory, and when the one or more modules are executed by the processor, the data transmission method of the many-core system in any of the foregoing method embodiments is executed.
  • the above-mentioned electronic device can execute the data transmission method of the many-core system provided in the embodiment of the present disclosure, and has the corresponding functional modules and beneficial effects of the execution method.
  • for technical details not described in detail in this embodiment, refer to the data transmission method of the many-core system provided in the embodiments of the present disclosure.
  • embodiments of the present disclosure provide a computer-readable storage medium on which computer program instructions are stored, and the computer program instructions implement any data transmission method in the embodiments of the present disclosure when executed by a processor.
  • the computer-readable storage medium is used to store a computer-readable program.
  • the computer-readable program enables a computer to execute some or all of the embodiments of the data transmission method of the many-core system described above.
  • the computer-readable program to be executed can be written in any combination of one or more programming languages.
  • the programming languages include object-oriented programming languages such as C++, and conventional procedural programming languages such as the "C" programming language or similar assembly languages.
  • the program is stored in a storage medium and includes several instructions that enable a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the data transmission method of the embodiments of the present disclosure.
  • the aforementioned storage media include on-chip memory, flash memory, and other media that can store program code.
  • Embodiment 1: the on-chip network interconnection structure of the embodiments of the present disclosure is used to implement communication between the cores of a single chip. As shown in FIG. 1, data can be sent to the target core over either of two data routes.
  • the first route: PCIE -> CR0 -> NoC (bank0) -> target Core.
  • the data packet (a PCIE protocol data packet) from the Server passes through the inter-chip routing module CR0 corresponding to block bank0, and then through the on-chip network node in block bank0 to the target Core for calculation.
  • the second route: the data packet (a PCIE protocol data packet) from the Server first passes through the inter-chip routing module CR0 corresponding to block bank0, then sequentially through the inter-chip routing modules CR1, CR2 and CR3 corresponding to blocks bank1, bank2 and bank3, then from the on-chip network node in block bank3 to the on-chip network node in block bank0, and finally to the target Core for calculation.
  • the server selects one of the two data routes according to the results of AI model training and sends the data to the board card through the PCIE interface.
  • by default, the AI model training results select the first data route.
  • for example, if data would have to wait at a certain on-chip network node before it can be forwarded over the first route, the data is instead transmitted to the target core over the second data route.
  • Embodiment 2: the on-chip network interconnection structure of the embodiments of the present disclosure is used to implement communication between the cores of a single chip.
  • data can be sent to the target core over either of two data routes.
  • the first route: the data packet (a PCIE protocol data packet) from the Server passes through the inter-chip routing module CR0 corresponding to block bank0, then from the on-chip network node in block bank0 to the on-chip network node in block bank3, and finally to the target Core for calculation.
  • the second route: the data packet (a PCIE protocol data packet) from the Server first passes through the inter-chip routing module CR0 corresponding to block bank0, then sequentially through the inter-chip routing modules CR1, CR2 and CR3 corresponding to blocks bank1, bank2 and bank3, and finally through the on-chip network node in block bank3 to the target Core for calculation.
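The route choice described for Embodiments 1 and 2 can be illustrated with a short Python sketch using the two Embodiment 2 routes. It is not the disclosed implementation: the hop labels (for example `NoC(bank0)` for the on-chip network node of block bank0), the function `select_route`, and the policy of leaving the default first route only when the second route avoids every congested hop are assumptions made for illustration.

```python
# Hop lists for Embodiment 2 (target core in block bank3). "NoC(bankN)" is a
# hypothetical label for the on-chip network node in block bankN.
ROUTE_1 = ["PCIE", "CR0", "NoC(bank0)", "NoC(bank3)", "target Core"]
ROUTE_2 = ["PCIE", "CR0", "CR1", "CR2", "CR3", "NoC(bank3)", "target Core"]


def select_route(congested: set[str]) -> list[str]:
    """Return route 1 by default; switch to route 2 only when route 1 passes a
    congested hop and route 2 passes none (an assumed policy)."""
    route_1_blocked = bool(congested & set(ROUTE_1))
    route_2_clear = not (congested & set(ROUTE_2))
    if route_1_blocked and route_2_clear:
        return ROUTE_2
    return ROUTE_1


print(select_route(set()))           # no congestion: route 1
print(select_route({"NoC(bank0)"}))  # bank0 node busy: route 2 bypasses it
print(select_route({"NoC(bank3)"}))  # both routes use it: stay on route 1
```

A node shared by both routes (here `NoC(bank3)`) cannot be bypassed this way, which is why the sketch keeps the default route in that case.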
  • Embodiment 3: the on-chip network interconnection structure of the embodiments of the present disclosure is used to implement communication between cores of different chips. As shown in FIG. 3, data can be sent to the target core over either of two data routes.
  • the first route: data sent by the source core Core (Chip0 src) of the first chip Chip0 is first transmitted through the on-chip network node in block bank0 of Chip0 to the inter-chip routing module CR0 corresponding to block bank0, then through the inter-chip routing modules CR1 and CR2 corresponding to bank1 and bank2 of Chip0, then to the inter-chip routing module CR3 corresponding to bank3 of the second chip Chip1, and finally through the on-chip network node in block bank3 of Chip1 to the target core Core (Chip1 dst) for calculation.
  • the second route: data sent by the source core Core (Chip0 src) of Chip0 is first transmitted from the on-chip network node in block bank0 of Chip0 to the on-chip network node in block bank3 of Chip0, then to the on-chip network node in bank2 of Chip0, then to the inter-chip routing module CR2 corresponding to bank2 of Chip0, then to the inter-chip routing module CR3 corresponding to bank3 of the second chip Chip1, and finally through the on-chip network node in block bank3 of Chip1 to the target core Core (Chip1 dst) for calculation.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Multi Processors (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A network-on-chip interconnection structure of a many-core system. The applicable many-core system comprises at least one chip, and multiple cores are integrated in each chip. The network-on-chip interconnection structure comprises: at least two blocks located on a chip, wherein each block comprises at least one core; an inter-chip routing module provided for each block, wherein each inter-chip routing module is configured to interact with at least one other inter-chip routing module; and a network-on-chip configured to interact with the inter-chip routing modules and to exchange data between the cores.

Description

Network-on-chip interconnection structure of many-core system, data transmission method, board card, electronic device, and computer-readable storage medium
Technical field
The embodiments of the present disclosure relate to the field of artificial intelligence technology, and in particular to an on-chip network interconnection structure of a many-core system, a data transmission method, a board card, an electronic device, and a computer-readable storage medium.
Background
A many-core system includes one or more chips (processors), and a chip usually integrates multiple complete cores (computing engines); the cores within one chip or across multiple chips can work together.
Therefore, the exchange of signals and data between chips at the board level (on one board card) and between the cores inside a chip is very important to a many-core system, and the structure that implements this exchange plays a vital role in the performance of the whole many-core system.
In the related art, the cores of a many-core system basically transmit data over fixed data routes: a core receives and processes the data, and a transceiver module then sends the data out. Because this transmission mode is fixed and offers only a single data route for communication between cores, the data path sometimes becomes congested at a certain node, and data is left waiting instead of being received and sent in time. For a many-core system, this approach cannot make full use of the cores' computing power and spends a long time on data transmission, which reduces data throughput and degrades performance.
Summary of the invention
To solve the above problems, the embodiments of the present disclosure aim to provide an on-chip network interconnection structure of a many-core system, a data transmission method, a board card, an electronic device, and a computer-readable storage medium.
In a first aspect, the embodiments of the present disclosure provide an on-chip network interconnection structure of a many-core system. The many-core system includes at least one chip, each chip integrates multiple cores, and the on-chip network interconnection structure includes:
at least two blocks located on a chip, each block including at least one core;
an inter-chip routing module provided for each block, each inter-chip routing module being configured to interact with at least one other inter-chip routing module; and
an on-chip network configured to interact with the inter-chip routing modules and to exchange data between the cores.
In some embodiments, at least some of the inter-chip routing modules corresponding to adjacent blocks are configured to interact with each other.
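A minimal Python data model of this structure may help make the terms concrete. Beyond what is stated above, it assumes that the blocks are numbered bank0, bank1, ... around the chip so that each block neighbors the next one and the last neighbors the first (the hop bank3 -> bank0 in the embodiments suggests this layout), and that every pair of adjacent blocks has interacting inter-chip routing modules; the class `Chip` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    """Hypothetical model: blocks bank0..bank{n-1} placed in a ring on one chip,
    each with its own inter-chip routing module CR<i>."""
    cores_per_block: list[int]

    def __post_init__(self) -> None:
        # The structure requires at least two blocks, each holding at least one core.
        if len(self.cores_per_block) < 2 or min(self.cores_per_block) < 1:
            raise ValueError("need >= 2 blocks with >= 1 core each")

    def blocks(self) -> dict[str, list[str]]:
        """Map each block name to the names of the cores it contains."""
        return {f"bank{b}": [f"bank{b}/core{c}" for c in range(k)]
                for b, k in enumerate(self.cores_per_block)}

    def routing_module(self, block: str) -> str:
        """Name of the inter-chip routing module provided for a block."""
        return "CR" + block.removeprefix("bank")

    def cr_links(self) -> set[frozenset[str]]:
        """Undirected CR-to-CR links between adjacent blocks (assumed ring layout)."""
        n = len(self.cores_per_block)
        return {frozenset({f"CR{i}", f"CR{(i + 1) % n}"}) for i in range(n)}


chip = Chip([2, 2, 2, 2])
print(chip.routing_module("bank3"))  # CR3
print(sorted(sorted(link) for link in chip.cr_links()))
```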
In some embodiments, at least some of the blocks are connected to a data interface for transferring external data;
the on-chip network interconnection structure is configured to receive and process external data and to transmit the processed data between the cores of a single chip.
In some embodiments, the inter-chip routing module corresponding to the block connected to the data interface is configured to receive and process the external data transferred by the data interface, and to transfer the processed data to the on-chip network node in the block where the target core is located;
the on-chip network node in the block where the target core is located is configured to receive the processed data and transfer it to the target core;
wherein the block connected to the data interface is the block where the target core is located.
In some embodiments, the inter-chip routing module corresponding to the block connected to the data interface is configured to receive and process the external data transferred by the data interface, and to transfer the processed data to other inter-chip routing modules until it reaches the inter-chip routing module corresponding to a block adjacent to the block where the target core is located;
the inter-chip routing module corresponding to the adjacent block is configured to receive the processed data and transfer it to the on-chip network node in the adjacent block;
the on-chip network node in the adjacent block is configured to receive the processed data and transfer it to the on-chip network node in the block where the target core is located;
the on-chip network node in the block where the target core is located is configured to receive the processed data and transfer it to the target core;
wherein the block connected to the data interface is the block where the target core is located.
In some embodiments, the inter-chip routing module corresponding to the block connected to the data interface is configured to receive and process the external data transferred by the data interface, and to transfer the processed data to the on-chip network node in the block connected to the data interface;
the on-chip network node in the block connected to the data interface is configured to receive the processed data and transfer it to the on-chip network node in the block where the target core is located;
the on-chip network node in the block where the target core is located is configured to receive the processed data and transfer it to the target core;
wherein the block connected to the data interface is not the block where the target core is located.
In some embodiments, the inter-chip routing module corresponding to the block connected to the data interface is configured to receive and process the external data transferred by the data interface, and to transfer the processed data to other inter-chip routing modules until it reaches the inter-chip routing module corresponding to the block where the target core is located;
the inter-chip routing module corresponding to the block where the target core is located is configured to receive the processed data and transfer it to the on-chip network node in that block;
the on-chip network node in the block where the target core is located is configured to receive the processed data and transfer it to the target core;
wherein the block connected to the data interface is not the block where the target core is located.
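The four embodiments above (interface block equal to or different from the target block, reached either directly over the on-chip network or through other inter-chip routing modules) can be summarized as hop lists. In this hypothetical Python sketch the routing modules forward clockwise CR0 -> CR1 -> CR2 -> CR3, the order used in Embodiments 1 and 2; the cases are numbered in the order the embodiments appear above, and the helper names and `NoC(bankN)` labels are assumptions.

```python
def cr_ring(src: int, dst: int, n: int) -> list[str]:
    """Inter-chip routing modules visited going clockwise from CR<src> to CR<dst>."""
    hops, i = [f"CR{src}"], src
    while i != dst:
        i = (i + 1) % n
        hops.append(f"CR{i}")
    return hops


def noc(block: int) -> str:
    """Hypothetical label for the on-chip network node in block bank<block>."""
    return f"NoC(bank{block})"


def single_chip_route(iface: int, target: int, via: str, n: int = 4) -> list[str]:
    """Hop list from the CR of the interface block to the target core.

    via="noc":  leave the CRs at once and use on-chip network nodes (cases 1 and 3).
    via="ring": forward through other CRs first, clockwise (cases 2 and 4).
    """
    if via == "noc":
        nodes = [noc(iface)] if iface == target else [noc(iface), noc(target)]
        return [f"CR{iface}"] + nodes + ["target Core"]
    if via == "ring":
        if iface == target:
            # Case 2: go round to the block just before the target, then one NoC hop.
            neighbor = (target - 1) % n
            return cr_ring(iface, neighbor, n) + [noc(neighbor), noc(target), "target Core"]
        # Case 4: stop at the target block's own CR.
        return cr_ring(iface, target, n) + [noc(target), "target Core"]
    raise ValueError(f"unknown route type: {via!r}")


print(single_chip_route(0, 0, "ring"))  # Embodiment 1, second route
print(single_chip_route(0, 3, "noc"))   # Embodiment 2, first route
```

With the interface at bank0, the two printed lists reproduce the second route of Embodiment 1 and the first route of Embodiment 2.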
In some embodiments, the target core includes:
a routing receiving module configured to receive data; and
a calculation module configured to perform calculations based on the received data.
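A target core with a routing receiving module and a calculation module could be modeled as below. This is an illustrative Python sketch only: buffering packets in a list and applying one caller-supplied function per packet are assumptions, since the disclosure does not specify how the two modules interact.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TargetCore:
    """Hypothetical target core with the two parts named above."""
    compute: Callable[[list[float]], float]                 # calculation module's operation
    inbox: list[list[float]] = field(default_factory=list)  # routing receiving module's buffer

    def receive(self, packet: list[float]) -> None:
        """Routing receiving module: accept a packet from the on-chip network node."""
        self.inbox.append(packet)

    def run(self) -> list[float]:
        """Calculation module: compute on each buffered packet in arrival order."""
        results = [self.compute(packet) for packet in self.inbox]
        self.inbox.clear()
        return results


core = TargetCore(compute=sum)
core.receive([1.0, 2.0])
core.receive([3.0])
print(core.run())  # [3.0, 3.0]
```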
In some embodiments, the on-chip network interconnection structure is configured to transmit data between cores of multiple chips.
In some embodiments, the inter-chip routing module corresponding to a block of a first chip is connected to the inter-chip routing module corresponding to a block of a second chip, so that the block of the first chip is connected to the block of the second chip;
the source core of the first chip is configured to transfer data to the on-chip network node in the block where the source core of the first chip is located;
the on-chip network node in the block where the source core of the first chip is located is configured to receive the data and transfer it to the inter-chip routing module corresponding to that block;
the inter-chip routing module corresponding to the block where the source core of the first chip is located is configured to receive the data and transfer it to other inter-chip routing modules of the first chip until it reaches the inter-chip routing module corresponding to the block of the first chip connected to the second chip;
the inter-chip routing module corresponding to the block of the first chip connected to the second chip is configured to receive the data and transfer it to the inter-chip routing module corresponding to the block where the target core of the second chip is located;
the inter-chip routing module corresponding to the block where the target core of the second chip is located is configured to receive the data and transfer it to the on-chip network node in that block;
the on-chip network node in the block where the target core of the second chip is located is configured to receive the data and transfer it to the target core of the second chip.
In some embodiments, the inter-chip routing module corresponding to a block of a first chip is connected to the inter-chip routing module corresponding to a block of a second chip, so that the block of the first chip is connected to the block of the second chip;
the source core of the first chip is configured to transfer data to the on-chip network node in the block where the source core of the first chip is located;
the on-chip network node in the block where the source core of the first chip is located is configured to receive the data and transfer it to on-chip network nodes in other blocks of the first chip;
the on-chip network nodes in the other blocks of the first chip are configured to receive the data and transfer it to the inter-chip routing module corresponding to the block of the first chip connected to the second chip;
the inter-chip routing module corresponding to the block of the first chip connected to the second chip is configured to receive the data and transfer it to the inter-chip routing module corresponding to the block where the target core of the second chip is located;
the inter-chip routing module corresponding to the block where the target core of the second chip is located is configured to receive the data and transfer it to the on-chip network node in that block;
the on-chip network node in the block where the target core of the second chip is located is configured to receive the data and transfer it to the target core of the second chip.
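The two inter-chip embodiments above can likewise be written as hop lists. This hypothetical Python sketch assumes four blocks per chip, that the routing modules forward clockwise, and that the on-chip network path between blocks runs counter-clockwise; the last assumption is chosen only so that the output matches the bank0 -> bank3 -> bank2 path of Embodiment 3, and the function names are illustrative.

```python
def ring(src: int, dst: int, n: int, step: int) -> list[int]:
    """Block indices visited walking around the ring from src to dst (step=+1 or -1)."""
    hops, i = [src], src
    while i != dst:
        i = (i + step) % n  # Python's % keeps the index in 0..n-1 for step=-1 too
        hops.append(i)
    return hops


def inter_chip_route(src: int, link: int, dst: int, via: str, n: int = 4) -> list[str]:
    """Hops from a source core in bank<src> of Chip0 to a target core in bank<dst> of Chip1.

    Assumes CR<link> of Chip0 is wired to CR<dst> of Chip1 (the target block's CR).
    """
    if via == "ring":
        # First embodiment: source NoC node -> own CR -> other CRs (clockwise, assumed).
        chip0 = [f"Chip0 NoC(bank{src})"] + [f"Chip0 CR{i}" for i in ring(src, link, n, +1)]
    elif via == "noc":
        # Second embodiment: NoC nodes of other blocks (counter-clockwise, assumed) -> linked CR.
        chip0 = [f"Chip0 NoC(bank{i})" for i in ring(src, link, n, -1)] + [f"Chip0 CR{link}"]
    else:
        raise ValueError(f"unknown route type: {via!r}")
    chip1 = [f"Chip1 CR{dst}", f"Chip1 NoC(bank{dst})"]
    return ["Chip0 src Core"] + chip0 + chip1 + ["Chip1 dst Core"]


for via in ("ring", "noc"):
    print(via, inter_chip_route(src=0, link=2, dst=3, via=via))  # the two routes of Embodiment 3
```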
第二方面,本公开实施例提供一种数据传输方法,应用于众核系统,其中,所述众核系统包括至少一个芯片,每个芯片集成多核,每个芯片设置至少两个区块,每个区块包括至少一个核,每个区块对应设置片间路由模块,所述数据传输方法包括:In a second aspect, the embodiments of the present disclosure provide a data transmission method applied to a many-core system, wherein the many-core system includes at least one chip, each chip integrates multiple cores, and each chip is provided with at least two blocks. Each block includes at least one core, and each block corresponds to an inter-chip routing module. The data transmission method includes:
通过所述片间路由模块实现区块间的数据传输,通过片上网络实现与片间路由模块的数据传输及数据在各核间的传输。Data transmission between blocks is realized through the inter-chip routing module, data transmission with the inter-chip routing module and data transmission between the cores are realized through the on-chip network.
在一些实施例中,所述通过所述片间路由模块实现区块间的数据传输包括:In some embodiments, the realization of data transmission between blocks through the inter-chip routing module includes:
所述通过所述片间路由模块实现相邻区块间的数据传输。The data transmission between adjacent blocks is realized by the inter-chip routing module.
在一些实施例中,至少部分区块连接用于传递外部数据的数据接口;通过所述片间路由模块实现区块间的数据传输,通过片上网络实现与片间路由模块的数据传输及数据在各核间的传输,包括:In some embodiments, at least part of the blocks are connected to the data interface used to transfer external data; the inter-chip routing module is used to realize data transmission between blocks, and the data transmission and data transmission with the inter-chip routing module is realized through the on-chip network. Transmission between cores, including:
通过与数据接口相连区块对应设置的片间路由模块接收外部数据并处理,通过片上网络和/或其他片间路由模块将处理后的数据传输至目的核,实现外部数据的接收处理及处理后的数据在单个芯片各核间的传输。The external data is received and processed through the inter-chip routing module corresponding to the block connected to the data interface, and the processed data is transmitted to the target core through the on-chip network and/or other inter-chip routing modules to realize the reception and processing of external data The data is transmitted between the cores of a single chip.
在一些实施例中,通过与数据接口相连区块对应设置的片间路由模块接收外部数据并处理,通过片上网络和/或其他片间路由模块将处 理后的数据传输至目的核,包括:In some embodiments, the external data is received and processed through the inter-chip routing module corresponding to the block connected to the data interface, and the processed data is transmitted to the target core through the on-chip network and/or other inter-chip routing modules, including:
通过与数据接口相连区块对应设置的片间路由模块接收所述外部数据并处理;Receiving and processing the external data through an inter-chip routing module corresponding to the block connected to the data interface;
通过所述与数据接口相连区块对应设置的片间路由模块将处理后的数据传递至目的核所在区块内的片上网络节点;Transfer the processed data to the on-chip network node in the block where the target core is located through the inter-chip routing module corresponding to the block connected to the data interface;
通过所述目的核所在区块内的片上网络节点将所述处理后的数据传递至目的核;Transfer the processed data to the target core through the on-chip network node in the block where the target core is located;
其中,所述与数据接口相连区块为目的核所在区块。Wherein, the block connected with the data interface is the block where the target core is located.
在一些实施例中,通过与数据接口相连区块对应设置的片间路由模块接收外部数据并处理,通过片上网络和/或其他片间路由模块将处理后的数据传输至目的核,包括:In some embodiments, the external data is received and processed through the inter-chip routing module corresponding to the block connected to the data interface, and the processed data is transmitted to the target core through the on-chip network and/or other inter-chip routing modules, including:
通过与数据接口相连区块对应设置的片间路由模块接收所述外部数据并处理;Receiving and processing the external data through an inter-chip routing module corresponding to the block connected to the data interface;
通过所述与数据接口相连区块对应设置的片间路由模块将处理后的数据传递至其他片间路由模块,直至目的核所在区块相邻区块对应设置的片间路由模块;Transfer the processed data to other inter-chip routing modules through the inter-chip routing module corresponding to the block connected to the data interface until the inter-chip routing module corresponding to the adjacent block of the block where the target core is located;
通过所述目的核所在区块相邻区块对应设置的片间路由模块将所述处理后的数据传递至目的核所在区块相邻区块内的片上网络节点;Transmitting the processed data to the on-chip network node in the adjacent block of the target core through the inter-chip routing module corresponding to the adjacent block of the target core;
通过所述目的核所在区块相邻区块内的片上网络节点将所述处理后的数据传递至目的核所在区块内的片上网络节点;Transferring the processed data to the on-chip network node in the block where the target core is located through the on-chip network node in the adjacent block of the block where the target core is located;
通过所述目的核所在区块内的片上网络节点将所述处理后的数据传递至目的核;Transfer the processed data to the target core through the on-chip network node in the block where the target core is located;
其中,所述与数据接口相连区块为目的核所在区块。Wherein, the block connected with the data interface is the block where the target core is located.
在一些实施例中,通过与数据接口相连区块对应设置的片间路由模块接收外部数据并处理,通过片上网络和/或其他片间路由模块将处理后的数据传输至目的核,包括:In some embodiments, the external data is received and processed through the inter-chip routing module corresponding to the block connected to the data interface, and the processed data is transmitted to the target core through the on-chip network and/or other inter-chip routing modules, including:
通过与数据接口相连区块对应设置的片间路由模块接收所述外部数据并处理;Receiving and processing the external data through an inter-chip routing module corresponding to the block connected to the data interface;
通过所述与数据接口相连区块对应设置的片间路由模块将处理后的数据传递至与数据接口相连区块内的片上网络节点;Transfer the processed data to the on-chip network node in the block connected to the data interface through the inter-chip routing module corresponding to the block connected to the data interface;
通过所述与数据接口相连区块内的片上网络节点将所述处理后的数据传递至目的核所在区块内的片上网络节点;Transferring the processed data to the on-chip network node in the block where the target core is located through the on-chip network node in the block connected to the data interface;
通过所述目的核所在区块内的片上网络节点将所述处理后的数据传递至目的核;Transfer the processed data to the target core through the on-chip network node in the block where the target core is located;
其中,所述与数据接口相连区块不是目的核所在区块。Wherein, the block connected with the data interface is not the block where the target core is located.
在一些实施例中,通过与数据接口相连区块对应设置的片间路由模块接收外部数据并处理,通过片上网络和/或其他片间路由模块将处理后的数据传输至目的核,包括:In some embodiments, the external data is received and processed through the inter-chip routing module corresponding to the block connected to the data interface, and the processed data is transmitted to the target core through the on-chip network and/or other inter-chip routing modules, including:
通过与数据接口相连区块对应设置的片间路由模块接收所述外部数据并处理;Receiving and processing the external data through an inter-chip routing module corresponding to the block connected to the data interface;
通过所述与数据接口相连区块对应设置的片间路由模块将处理后的数据传递至其他片间路由模块,直至目的核所在区块对应设置的片间路由模块;Transfer the processed data to other inter-chip routing modules through the inter-chip routing module corresponding to the block connected to the data interface until the inter-chip routing module corresponding to the block where the target core is located;
通过所述目的核所在区块对应设置的片间路由模块将所述处理后的数据传递至目的核所在区块内的片上网络节点;Transmitting the processed data to the on-chip network node in the block where the target core is located through the inter-chip routing module corresponding to the block where the target core is located;
通过所述目的核所在区块内的片上网络节点将所述处理后的数据传递至目的核;Transfer the processed data to the target core through the on-chip network node in the block where the target core is located;
其中,所述与数据接口相连区块不是目的核所在区块。Wherein, the block connected with the data interface is not the block where the target core is located.
在一些实施例中,所述数据传输方法还包括:目的核内的路由接收模块接收数据,目的核内的计算模块根据接收到的数据进行计算。In some embodiments, the data transmission method further includes: a routing receiving module in the destination core receives data, and a calculation module in the destination core performs calculations based on the received data.
在一些实施例中,通过所述片间路由模块实现区块间的数据传输,通过片上网络实现与片间路由模块的数据传输及数据在各核间的传输,包括:In some embodiments, data transmission between blocks is realized through the inter-chip routing module, data transmission with the inter-chip routing module and data transmission between cores are realized through the on-chip network, including:
通过各片间路由模块和片上网络将数据从一芯片的源核传输至另一芯片的目的核,实现数据在多个芯片各核间的传输。The data is transmitted from the source core of one chip to the destination core of another chip through the inter-chip routing modules and the on-chip network, so as to realize the transmission of data among the cores of multiple chips.
在一些实施例中,第一芯片的一个区块对应设置的片间路由模块 与第二芯片的一个区块对应设置的片间路由模块相连,以使第一芯片的该区块与第二芯片的该区块连接;通过各片间路由模块和片上网络将数据从一芯片的源核传输至另一芯片的目的核,包括:In some embodiments, the inter-chip routing module corresponding to a block of the first chip is connected to the inter-chip routing module corresponding to a block of the second chip, so that the block of the first chip is connected to the second chip. The block is connected; the data is transmitted from the source core of one chip to the destination core of another chip through the inter-chip routing modules and the on-chip network, including:
通过第一芯片的源核将数据传递至所述第一芯片的源核所在区块内的片上网络节点;Transmitting data to the on-chip network node in the block where the source core of the first chip is located through the source core of the first chip;
通过所述第一芯片的源核所在区块内的片上网络节点将所述数据传递至所述第一芯片的源核所在区块对应设置的片间路由模块;Transmitting the data to an inter-chip routing module corresponding to the block where the source core of the first chip is located through an on-chip network node in the block where the source core of the first chip is located;
通过所述第一芯片的源核所在区块对应设置的片间路由模块将所述数据传递至第一芯片的其他片间路由模块;Transferring the data to other inter-chip routing modules of the first chip through an inter-chip routing module corresponding to the block where the source core of the first chip is located;
通过所述第一芯片的其他片间路由模块将所述数据传递至与第二芯片连接的第一芯片的区块对应设置的片间路由模块;Transmitting the data to the inter-chip routing module corresponding to the block of the first chip connected to the second chip through other inter-chip routing modules of the first chip;
通过所述与第二芯片连接的第一芯片的区块对应设置的片间路由模块将所述数据传递至第二芯片的目的核所在区块对应设置的片间路由模块;Transmitting the data to the inter-chip routing module corresponding to the block where the target core of the second chip is located through the inter-chip routing module provided corresponding to the block of the first chip connected to the second chip;
通过所述第二芯片的目的核所在区块对应设置的片间路由模块将所述数据传递至第二芯片的目的核所在区块内的片上网络节点;Transmitting the data to an on-chip network node in the block where the target core of the second chip is located through an inter-chip routing module corresponding to the block where the target core of the second chip is located;
通过所述第二芯片的目的核所在区块内的片上网络节点将所述数据传递至第二芯片的目的核。The data is transferred to the target core of the second chip through an on-chip network node in the block where the target core of the second chip is located.
在一些实施例中,第一芯片的一个区块对应设置的片间路由模块与第二芯片的一个区块对应设置的片间路由模块相连,以使第一芯片的该区块与第二芯片的该区块连接;通过各片间路由模块和片上网络将数据从一芯片的源核传输至另一芯片的目的核,包括:In some embodiments, the inter-chip routing module corresponding to a block of the first chip is connected to the inter-chip routing module corresponding to a block of the second chip, so that the block of the first chip is connected to the second chip. The block is connected; the data is transmitted from the source core of one chip to the destination core of another chip through the inter-chip routing modules and the on-chip network, including:
通过第一芯片的源核将数据传递至所述第一芯片的源核所在区块内的片上网络节点;Transmitting data to the on-chip network node in the block where the source core of the first chip is located through the source core of the first chip;
通过所述第一芯片的源核所在区块内的片上网络节点将所述数据传递至第一芯片的其他区块内的片上网络节点;Transmitting the data to network nodes on a chip in other blocks of the first chip through a network node on a chip in the block where the source core of the first chip is located;
通过所述第一芯片的其他区块内的片上网络节点将所述数据传递至与第二芯片连接的第一芯片的区块对应设置的片间路由模块;Transmitting the data to the inter-chip routing module corresponding to the block of the first chip connected to the second chip through on-chip network nodes in other blocks of the first chip;
通过所述与第二芯片连接的第一芯片的区块对应设置的片间路由模块将所述数据传递至第二芯片的目的核所在区块对应设置的片间路由模块;Transmitting the data to the inter-chip routing module corresponding to the block where the target core of the second chip is located through the inter-chip routing module provided corresponding to the block of the first chip connected to the second chip;
通过所述第二芯片的目的核所在区块对应设置的片间路由模块将所述数据传递至第二芯片的目的核所在区块内的片上网络节点;Transmitting the data to an on-chip network node in the block where the target core of the second chip is located through an inter-chip routing module corresponding to the block where the target core of the second chip is located;
通过所述第二芯片的目的核所在区块内的片上网络节点将所述数据传递至第二芯片的目的核。The data is transferred to the target core of the second chip through an on-chip network node in the block where the target core of the second chip is located.
In some embodiments, each chip includes a plurality of inter-chip routing modules arranged along its periphery, and transmitting data between blocks through the inter-chip routing modules includes:
during transmission, data passes through the inter-chip routing modules of the same chip in a predetermined circular direction (for example, clockwise).
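Here is a minimal Python sketch of the fixed-direction rule. It assumes, hypothetically, that the n inter-chip routing modules of a chip are numbered 0 to n-1 in clockwise order around its edge. Data always moves to the next module in that order until it reaches the target module.

```python
from __future__ import annotations


def clockwise_hops(src: int, dst: int, n: int) -> list[int]:
    """CR indices visited from src to dst when CRs are traversed in one direction only.

    Assumes the n CRs are numbered 0..n-1 in clockwise order around the chip edge.
    """
    if not (0 <= src < n and 0 <= dst < n):
        raise ValueError("CR index out of range")
    hops = [src]
    cur = src
    while cur != dst:
        cur = (cur + 1) % n  # always step to the clockwise neighbour
        hops.append(cur)
    return hops
```

With four CRs, `clockwise_hops(3, 1, 4)` gives `[3, 0, 1]`. Data never takes the shorter counterclockwise step from CR3 back to CR2.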
In a third aspect, an embodiment of the present disclosure provides a board card on which any of the network-on-chip interconnection structures of the embodiments of the present disclosure is integrated.
In a fourth aspect, an embodiment of the present disclosure provides an electronic device including a memory and a processor, wherein the memory stores one or more computer instructions that, when executed by the processor, implement any of the data transmission methods of the embodiments of the present disclosure.
In a fifth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer program instructions that, when executed by a processor, implement any of the data transmission methods of the embodiments of the present disclosure.
In the embodiments of the present disclosure, different cores of the many-core system can exchange data "directly", through the network on chip alone, or "indirectly", through the inter-chip routing modules and the network on chip; that is, at least two different data routes exist. A route can therefore be chosen as needed when data is transmitted. For example, when the data path of one route is congested at some node, another route can be used instead. This keeps data from waiting, ensures that it is received and sent in time, reduces the time spent on data transmission, makes full use of the cores' computing power, increases data throughput and processing speed, and improves the performance of the many-core system.
Description of the Drawings
To describe the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present disclosure; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of data routing for data transmission between the cores of a single chip according to an embodiment of the present disclosure, in which the block connected to the PCIE interface is the block where the destination core is located;
FIG. 2 is a schematic diagram of data routing for data transmission between the cores of a single chip according to an embodiment of the present disclosure, in which the block connected to the PCIE interface is not the block where the destination core is located;
FIG. 3 is a schematic diagram of data routing for data transmission between the cores of multiple chips according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present disclosure.
It should be noted that any directional indications in the embodiments of the present disclosure (such as up, down, left, right, front, back, ...) are used only to explain the relative positions, motion, and so on of components in a particular posture (as shown in the drawings); if that posture changes, the directional indications change accordingly.
In addition, the terms used in the description of the embodiments of the present disclosure are for illustration only and are not intended to limit the scope of the present disclosure. The terms "include" and/or "comprise" specify the presence of the stated elements, steps, operations, and/or components but do not exclude the presence or addition of one or more other elements, steps, operations, and/or components. The terms "first", "second", and so on may be used to describe various elements; they do not indicate order and do not limit those elements, and serve only to distinguish one element from another. Unless otherwise specified, "a plurality of" means two or more. These and/or other aspects will become apparent, and the description of the embodiments will be easier for a person of ordinary skill in the art to understand, in conjunction with the following drawings. The drawings depict the embodiments of the present disclosure for illustration only. Those skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods shown may be adopted without departing from the principles of the present disclosure.
In a first aspect, an embodiment of the present disclosure provides a network-on-chip interconnection structure for a many-core system. The many-core system includes at least one chip (processor), and each chip integrates multiple cores (computing engines). The interconnection structure includes: at least two blocks located on the chip, each block including at least one core; an inter-chip routing module provided for each block, each inter-chip routing module being configured to interact with at least one other inter-chip routing module (one-way or two-way data interaction); and a network on chip configured to interact with each inter-chip routing module and to exchange data between the cores.
The network-on-chip interconnection structure of the embodiments is used in a many-core system; in other words, it is part of the many-core system. The many-core system includes multiple chips, each including multiple cores. The interconnection structure enables data exchange between different cores, both cores within one chip and cores on different chips, so that the cores and the interconnection structure together form a many-core system that works cooperatively.
Therefore, the embodiments of the present disclosure in effect also provide a many-core system that includes the above network-on-chip interconnection structure.
Each chip is divided by area into multiple blocks; each block contains at least one core and corresponds to one inter-chip routing module. The interconnection structure further includes a network on chip. The network on chip supports data exchange between different cores (between cores in the same block and between cores in different blocks) as well as data exchange with the inter-chip routing modules, so it connects each core in a block with that block's inter-chip routing module. In addition, at least some of the inter-chip routing modules can exchange data with each other, which enables data exchange between their corresponding blocks, that is, between the cores in those blocks.
Specifically, the network on chip may include multiple network-on-chip nodes (NoC, Network on Chip). The nodes are distributed among the blocks of each chip and interconnected to form a network topology, which may be organized into levels (for example, three levels). At least some of the nodes are also connected to cores and/or inter-chip routing modules, so the node topology supports data exchange between different cores and between cores and inter-chip routing modules. The network-on-chip nodes can be designed using existing related techniques, which are not described in detail here.
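The structure of the first aspect can be summarized as a small data model. This Python sketch is hypothetical, and several parts are assumptions chosen for illustration:

- the class and field names;
- the naming scheme CR<bank id>;
- modelling CR-to-CR interaction as an undirected set of linked bank pairs.

The NoC levels are not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Bank:
    """One block (bank) of a chip; it holds at least one core."""
    bank_id: int
    cores: list[int]  # ids of the cores placed in this bank


@dataclass
class Chip:
    """A chip divided into banks, with one inter-chip routing module (CR) per bank."""
    chip_id: int
    banks: list[Bank]
    # Hypothetical link table: each pair holds two bank ids whose CRs can exchange data.
    cr_links: set[frozenset[int]] = field(default_factory=set)

    def cr_of(self, bank_id: int) -> str:
        # Each bank maps to exactly one CR, named here CR<bank id>.
        return f"CR{bank_id}"

    def bank_of_core(self, core: int) -> int:
        # Find which bank (and therefore which CR and NoC node) serves a core.
        for bank in self.banks:
            if core in bank.cores:
                return bank.bank_id
        raise KeyError(f"core {core} is not on chip {self.chip_id}")

    def validate(self) -> None:
        # Constraints from the first aspect: at least two banks, at least one core
        # per bank, and every CR able to interact with at least one other CR.
        if len(self.banks) < 2:
            raise ValueError("a chip needs at least two banks")
        if not all(bank.cores for bank in self.banks):
            raise ValueError("every bank needs at least one core")
        linked = {b for pair in self.cr_links for b in pair}
        if linked != {bank.bank_id for bank in self.banks}:
            raise ValueError("every CR must be linked to another CR on this chip")
```

For example, take a four-bank chip with cores 0 to 15 (four per bank) and the links {0, 1}, {1, 2}, {2, 3}. It passes `validate()`, and `bank_of_core(9)` returns 2.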
In the embodiments of the present disclosure, different cores of the many-core system can exchange data "directly", through the network on chip alone, or "indirectly", through the inter-chip routing modules and the network on chip; that is, at least two different data routes exist. A route can therefore be chosen as needed when data is transmitted. For example, when the data path of one route is congested at some node, another route can be used instead. This keeps data from waiting, ensures that it is received and sent in time, reduces the time spent on data transmission, makes full use of the cores' computing power, increases data throughput and processing speed, and improves the performance of the many-core system.
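As a hypothetical illustration of choosing between the direct and indirect routes, this Python sketch picks the first candidate route that avoids every congested node. If every route is congested, it falls back to the route with the fewest congested nodes. The route names, node labels and congestion set are assumptions. The disclosure does not specify a selection algorithm, and it notes elsewhere that optimal routes may be obtained by training an AI model.

```python
from __future__ import annotations


def pick_route(routes: dict[str, list[str]], busy: set[str]) -> str:
    """Choose a route by name, avoiding congested nodes where possible.

    routes: candidate routes in order of preference, each a list of node labels.
    busy:   labels of nodes currently reported as congested (hypothetical input).
    """
    if not routes:
        raise ValueError("no candidate routes")
    for name, path in routes.items():
        if busy.isdisjoint(path):  # no congested node on this route
            return name
    # Every route touches congestion: take the one with the fewest congested nodes.
    return min(routes, key=lambda name: len(busy.intersection(routes[name])))
```

Suppose the candidates are `{"direct": ["NoC(bank0)", "NoC(bank1)", "Core(bank1)"], "indirect": ["CR0", "CR1", "NoC(bank1)", "Core(bank1)"]}` and `NoC(bank0)` is congested. The sketch then returns `"indirect"`.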
In some embodiments, the inter-chip routing modules of at least some pairs of adjacent blocks are configured to interact with each other.
Referring to FIGS. 1 to 3, because the blocks are obtained by dividing the area of a chip, some blocks are adjacent in position, and each pair of adjacent blocks corresponds to a pair of inter-chip routing modules. Among these pairs, at least some can exchange data with each other.
For example, referring to FIGS. 1 and 2, blocks bank1 and bank3 are both adjacent to block bank0. The inter-chip routing module CR0 of bank0 can exchange data with the inter-chip routing module CR1 of bank1, whereas CR0 may be unable to exchange data with the inter-chip routing module CR3 of bank3.
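When two adjacent CRs are not linked, data can still reach the other block through a chain of linked CRs. This Python sketch runs a breadth-first search over a hypothetical link table. In that table CR0-CR1 are linked and CR0-CR3 are not, as in the example above; the CR1-CR2 and CR2-CR3 links are assumptions.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical CR link table for one four-bank chip: CR0-CR1 linked, CR0-CR3 not.
LINKS: Dict[int, List[int]] = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}


def cr_path(src: int, dst: int, links: Dict[int, List[int]] = LINKS) -> Optional[List[int]]:
    """Shortest chain of CRs from src to dst (breadth-first search), or None."""
    prev: Dict[int, Optional[int]] = {src: None}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            # Walk the predecessor links back to src, then reverse.
            path: List[int] = []
            node: Optional[int] = cur
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in links.get(cur, []):
            if nxt not in prev:  # first visit lies on a shortest chain
                prev[nxt] = cur
                queue.append(nxt)
    return None
```

In this table `cr_path(0, 3)` returns `[0, 1, 2, 3]`, so data from bank0 to bank3 needs three CR-to-CR hops. The network on chip remains available as the alternative route.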
In some embodiments, at least some blocks are connected to a data interface for transferring external data, and the network-on-chip interconnection structure is configured to receive and process external data and to transmit the processed data between the cores of a single chip.
The many-core system may further include a data interface connected to a block (for example, to the inter-chip routing module of that block). The data interface receives data from outside (external data), so the interconnection structure can also receive and process external data and send the processed data to the corresponding core (the destination core).
It should be understood that the destination core corresponds to particular external data; that is, the destination core may differ from one data transmission to another.
In specific situations, the optimal data route in the many-core system can be obtained by training an AI model, so as to achieve data transmission with optimal performance and minimize transmission time; different AI models may yield different optimal routes.
For example, referring to FIGS. 1 and 2, a chip can be divided into four banks (blocks), each bank corresponding to one CR (inter-chip routing module). Some banks are connected to a PCIE interface (data interface), which is connected to a server, and the network on chip is implemented by NoC (network-on-chip nodes).
By way of example, the data routes for transmitting data within the above chip may include:
First data route:
CR->NoC (within the bank)->Core. In this route, the CR sends the data to the NoC within a bank, and the data passes through the NoC levels to the destination core for computation.
In some embodiments, referring to FIG. 1, when the block connected to the data interface is the block where the destination core is located, the inter-chip routing module of that block receives and processes the external data transferred by the data interface and passes the processed data to the network-on-chip node in the destination core's block. That node receives the processed data and passes it to the destination core.
The specific route may be: the data first passes through the inter-chip routing module of the destination core's block, then to the network-on-chip node in that block, and finally to the destination core for computation.
In some embodiments, referring to FIG. 2, when the block connected to the data interface is not the block where the destination core is located, the inter-chip routing module of the block connected to the data interface receives and processes the external data transferred by the data interface and passes the processed data to the network-on-chip node in that block. That node passes the processed data to the network-on-chip node in the destination core's block, which in turn passes it to the destination core.
The specific route may be: the data first passes through the inter-chip routing module of the block connected to the data interface, then through the network-on-chip node in that block to the network-on-chip node in the destination core's block, and finally to the destination core for computation.
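Both cases of the first route can be written as one hop-list builder. In this Python sketch the function name and label format are hypothetical. It assumes the CR has already processed the external data, and it collapses the NoC transfer between banks into a single hop.

```python
from __future__ import annotations


def route_cr_noc(if_bank: int, dst_bank: int) -> list[str]:
    """First single-chip route: CR of the interface bank -> NoC -> destination core.

    if_bank:  bank whose CR is attached to the data interface (e.g. PCIE).
    dst_bank: bank that contains the destination core.
    """
    # The CR processes the external data and hands it to its own bank's NoC node.
    hops = [f"CR{if_bank}", f"NoC(bank{if_bank})"]
    if dst_bank != if_bank:
        # FIG. 2 case: the interface bank's NoC node forwards to the destination bank's node.
        hops.append(f"NoC(bank{dst_bank})")
    hops.append(f"Core(bank{dst_bank})")
    return hops
```

Suppose the interface is on a hypothetical bank0. `route_cr_noc(0, 0)` gives the FIG. 1 case (CR0, NoC(bank0), Core(bank0)). `route_cr_noc(0, 2)` adds the extra NoC hop of the FIG. 2 case.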
Second data route:
CR->CRx->NoC (within the bank)->Core. This route uses a loop of CRs (loop CR): the data passes through the inter-chip routing modules in sequence until it reaches the bank where the destination core is located, and the NoC within that bank then carries it onward. Here, CRx denotes one or more inter-chip routing modules.
In some embodiments, referring to FIG. 1, when the block connected to the data interface is the block where the destination core is located, the inter-chip routing module of that block receives and processes the external data transferred by the data interface. It then passes the processed data through the other inter-chip routing modules until it reaches the inter-chip routing module of a block adjacent to the destination core's block. That module passes the processed data to the network-on-chip node in the adjacent block, which passes it to the network-on-chip node in the destination core's block, which finally passes it to the destination core.
The specific route may be: the data first passes through the inter-chip routing module of the destination core's block, then through the other inter-chip routing modules in turn up to the inter-chip routing module of a block adjacent to the destination core's block, then through the network-on-chip node in that adjacent block to the network-on-chip node in the destination core's block, and finally to the destination core for computation.
In some embodiments, referring to FIG. 2, when the block connected to the data interface is not the block where the destination core is located, the inter-chip routing module of the block connected to the data interface receives and processes the external data transferred by the data interface. It then passes the processed data through the other inter-chip routing modules until it reaches the inter-chip routing module of the destination core's block. That module passes the processed data to the network-on-chip node in the destination core's block, which passes it to the destination core.
The specific route may be: the data first passes through the inter-chip routing module of the block connected to the data interface, then through the other inter-chip routing modules in turn up to the inter-chip routing module of the destination core's block, and finally through the network-on-chip node in that block to the destination core for computation.
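Here is a sketch of the second (loop CR) route. It rests on two assumptions that the text does not fix:

- the CRs of a chip form a one-way ring 0 -> 1 -> ... -> n-1 -> 0;
- in the FIG. 1 case, the "adjacent block" is the ring predecessor of the destination bank.

All names are hypothetical.

```python
from __future__ import annotations


def route_loop_cr(if_bank: int, dst_bank: int, n_banks: int = 4) -> list[str]:
    """Second single-chip route (loop CR), assuming the CRs form a one-way ring."""
    if not (0 <= if_bank < n_banks and 0 <= dst_bank < n_banks):
        raise ValueError("bank index out of range")
    if if_bank == dst_bank:
        # FIG. 1 case: loop to the CR of a bank adjacent to the destination
        # (assumed here to be the ring predecessor), then enter via the NoC.
        exit_bank = (dst_bank - 1) % n_banks
    else:
        exit_bank = dst_bank  # FIG. 2 case: loop until the destination bank's own CR
    hops = [f"CR{if_bank}"]
    cur = if_bank
    while cur != exit_bank:
        cur = (cur + 1) % n_banks  # next CR along the assumed ring
        hops.append(f"CR{cur}")
    hops.append(f"NoC(bank{exit_bank})")
    if exit_bank != dst_bank:
        hops.append(f"NoC(bank{dst_bank})")  # adjacent bank's NoC node hands over
    hops.append(f"Core(bank{dst_bank})")
    return hops
```

Under these assumptions, `route_loop_cr(0, 0)` runs CR0 -> CR1 -> CR2 -> CR3 -> NoC(bank3) -> NoC(bank0) -> Core(bank0). bank3 is one of the blocks adjacent to bank0 in FIGS. 1 and 2.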
Both of the above routes can deliver data to the destination core, which mitigates the performance degradation caused by data congestion and reduces data transmission time.
In some embodiments, the destination core includes a routing receiving module configured to receive data and a computing module configured to perform computation on the received data.
Thus, after the data reaches the destination core, the routing receiving module in the destination core receives it and the computing module in the destination core performs the computation.
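The split between a routing receiving module and a computing module can be pictured as a buffer plus a compute function. This Python sketch is hypothetical. The class name, the list-based buffer and the arrival-order processing are assumptions.

```python
from typing import Any, Callable, List


class DestinationCore:
    """Hypothetical destination core: a routing receiving module plus a computing module."""

    def __init__(self, compute: Callable[[Any], Any]) -> None:
        self._inbox: List[Any] = []  # buffer filled by the routing receiving module
        self._compute = compute      # stand-in for the computing module

    def receive(self, data: Any) -> None:
        # Routing receiving module: accept data delivered by the local NoC node.
        self._inbox.append(data)

    def run(self) -> List[Any]:
        # Computing module: process buffered items in arrival order, then clear the buffer.
        results = [self._compute(item) for item in self._inbox]
        self._inbox.clear()
        return results
```

With `core = DestinationCore(sum)`, receiving `[1, 2, 3]` and then `[4]` makes `core.run()` return `[6, 4]`.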
In some embodiments, the network-on-chip interconnection structure is configured to transmit data between the cores of multiple chips.
A many-core system can also support communication between cores on different chips, for example sending data from a source core on one chip to a destination core on another chip.
It should be understood that the source core and the destination core correspond to the starting point and end point of particular data; that is, the source core, the destination core, and the chips on which they reside may differ from one data transmission to another.
For example, referring to FIG. 3, a many-core system may include multiple chips. Each chip can be divided into four banks (blocks), each bank corresponding to one CR (inter-chip routing module). Data routing within a bank is handled by the NoC topology, and data routing between banks is handled by the CRs.
In some embodiments, the inter-chip routing module of one block of the first chip is connected to the inter-chip routing module of one block of the second chip, so that the two blocks are connected.
That is, a block of the first chip and a block of the second chip are connected through their respective inter-chip routing modules. For example, referring to FIG. 3, CR2 of Chip0 is connected to CR3 of Chip1, so that bank2 of Chip0 is connected to bank3 of Chip1, enabling cross-chip data transmission.
By way of example, the data routes for transmitting data in the above many-core system may include:
First data route:
Core(Chip0 src)->NoC(within Chip0 bank)->CRx(Chip0)->…CRx(Chip0)->CRx(Chip1)->NoC(within Chip1 bank)->Core(Chip1 dst). Here, CRx denotes one or more inter-chip routing modules, Chip0 is the first chip, Chip1 is the second chip, Core(src) denotes the source core, and Core(dst) denotes the destination core.
In some embodiments, the components along this route are configured as follows:
- the source core of the first chip transfers the data to the network-on-chip node in its block;
- that node receives the data and transfers it to the inter-chip routing module of the source core's block;
- that inter-chip routing module receives the data and transfers it through the other inter-chip routing modules of the first chip until it reaches the inter-chip routing module of the block of the first chip that is connected to the second chip;
- that module receives the data and transfers it to the inter-chip routing module of the block of the second chip where the destination core is located;
- that module receives the data and transfers it to the network-on-chip node in the destination core's block;
- that node receives the data and transfers it to the destination core of the second chip.
The specific route may be:
- the source core of the first chip (Chip0) passes the data to the network-on-chip node in its block;
- the data passes through the inter-chip routing module of that block;
- the data passes in turn through the other inter-chip routing modules of Chip0, up to the inter-chip routing module of the Chip0 block that is connected to the block of the second chip (Chip1) where the destination core is located;
- the data passes through the inter-chip routing module of the destination core's block on Chip1;
- finally, the network-on-chip node in that block delivers the data to the destination core for computation.
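The first cross-chip route can be sketched the same way, under two hypothetical assumptions:

- the CRs of Chip0 form a one-way ring;
- the CR of `exit_bank` on Chip0 is wired directly to the CR of the destination bank on Chip1, as CR2 of Chip0 is wired to CR3 of Chip1 in FIG. 3.

```python
from __future__ import annotations


def cross_chip_route_cr(src_bank: int, exit_bank: int, dst_bank: int,
                        n_banks: int = 4) -> list[str]:
    """Hops for the first cross-chip route: CR ring on Chip0, then a CR-to-CR hop to Chip1."""
    if not (0 <= src_bank < n_banks and 0 <= exit_bank < n_banks):
        raise ValueError("bank index out of range")
    hops = [
        f"Core(Chip0 bank{src_bank})",  # source core hands data to its NoC node
        f"NoC(Chip0 bank{src_bank})",   # NoC node hands data to its bank's CR
        f"CR{src_bank}(Chip0)",
    ]
    cur = src_bank
    while cur != exit_bank:
        cur = (cur + 1) % n_banks       # assumed one-way ring order on Chip0
        hops.append(f"CR{cur}(Chip0)")
    hops += [
        f"CR{dst_bank}(Chip1)",         # cross-chip hop over the wired CR pair
        f"NoC(Chip1 bank{dst_bank})",
        f"Core(Chip1 bank{dst_bank})",
    ]
    return hops
```

Consider a hypothetical source core in bank0 and the FIG. 3 wiring. `cross_chip_route_cr(0, 2, 3)` passes CR0, CR1 and CR2 on Chip0 before crossing to CR3 on Chip1.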
Second data route:
Core(Chip0 src)->NoC(within Chip0 bank)->…->NoC(within Chip0 bank)->CRx(Chip0)->CRx(Chip1)->NoC(within Chip1 bank)->Core(Chip1 dst). Here, CRx denotes one or more inter-chip routing modules, Chip0 is the first chip, Chip1 is the second chip, Core(src) denotes the source core, and Core(dst) denotes the destination core.
In some embodiments, the components along this route are configured as follows:
- the source core of the first chip transfers the data to the network-on-chip node in its block;
- that node receives the data and transfers it to network-on-chip nodes in other blocks of the first chip;
- those nodes receive the data and transfer it to the inter-chip routing module of the block of the first chip that is connected to the second chip;
- that module receives the data and transfers it to the inter-chip routing module of the block of the second chip where the destination core is located;
- that module receives the data and transfers it to the network-on-chip node in the destination core's block;
- that node receives the data and transfers it to the destination core of the second chip.
The specific route may be:
- the source core of the first chip (Chip0) passes the data to the network-on-chip node in its block;
- network-on-chip nodes in other blocks of Chip0 carry the data to the inter-chip routing module of the Chip0 block that is connected to the block of the second chip (Chip1) where the destination core is located;
- the data passes through the inter-chip routing module of the destination core's block on Chip1;
- finally, the network-on-chip node in that block delivers the data to the destination core for computation.
When blocks of different chips are connected, data can first be passed to the inter-chip routing module of the block of the first chip that is connected to the second chip (CR2, provided for bank2 of Chip0). It is then passed to the inter-chip routing module of the block of the second chip that is connected to the first chip (CR3, provided for bank3 of Chip1), and finally to the destination core of the second chip.
以上两种数据路由的方案,都可以将数据从源核发送到目的核,这样可以解决数据拥堵带来的性能下降的问题,减少数据传输的时间。The above two data routing schemes can send data from the source core to the destination core, which can solve the problem of performance degradation caused by data congestion and reduce the time of data transmission.
In the specific example of Figure 3, the block of the second chip that contains the destination core is also the block of the second chip that is connected to the first chip (bank3 of Chip1). The inter-chip routing module for the destination-core block and the inter-chip routing module for the block connected to the first chip are therefore one and the same (CR3 of Chip1). It should be understood, however, that if the destination core is in another block of the second chip, then after the data reaches the inter-chip routing module of the second-chip block connected to the first chip, it can be carried on to the destination core in the manner described above for transmission within a single chip (either indirectly through inter-chip routing modules and the NoC, or directly through the NoC). This is not described again in detail here.
It should likewise be understood that the first chip containing the source core and the second chip containing the destination core need not be directly connected; they may instead be connected indirectly through other chips. In that case the data passes through those other chips in transit, and within each of them it may be transmitted either through the inter-chip routing modules or through the NoC.
In a second aspect, embodiments of the present disclosure provide a data transmission method applied to a many-core system. The many-core system includes at least one chip; each chip integrates multiple cores and is divided into at least two blocks; each block includes at least one core and has a corresponding inter-chip routing module. The data transmission method includes:
implementing data transmission between blocks through the inter-chip routing modules, and implementing data transmission to and from the inter-chip routing modules, as well as between cores, through the network on chip.
Corresponding to the network-on-chip interconnection structure described above, during data transmission the inter-chip routing modules carry data between blocks (that is, between cores in different blocks), while the NoC carries data to and from the inter-chip routing modules and between cores. This addresses the performance degradation caused by data congestion and reduces data transmission time.
In some embodiments, implementing data transmission between blocks through the inter-chip routing modules includes: implementing data transmission between adjacent blocks through the inter-chip routing modules.
In the embodiments of the present disclosure, every two adjacent blocks necessarily correspond to a "pair" (two) of inter-chip routing modules, and among these pairs, the two modules of at least some of the pairs can exchange data with each other.
In some embodiments, at least some of the blocks are connected to a data interface for transferring external data. Implementing data transmission between blocks through the inter-chip routing modules, and implementing data transmission to and from the inter-chip routing modules and between cores through the NoC, includes:
receiving and processing external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the destination core through the NoC and/or other inter-chip routing modules, thereby receiving and processing the external data and transmitting the processed data between the cores of a single chip.
In this way, at least two data routes are available during data transmission. To decide which route each actual transfer uses, a default route may be set in advance (for example, the optimal route obtained by training an AI model), and another route is switched in as a backup only when the data path of the default route becomes congested.
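The default-plus-backup policy described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `Route` type, the hop labels (such as `CR0`, `NoC1`, `Core5`), and representing congestion as a set of congested hop labels are all hypothetical assumptions. The behaviour when every candidate is congested (keep the default and wait) is also an assumption, since the text does not specify it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """A candidate data route, listed as the hops it traverses (hypothetical labels)."""
    name: str
    hops: tuple[str, ...]


def select_route(default: Route, backups: list[Route], congested: set[str]) -> Route:
    """Use the pre-set default route unless a hop on its path is congested;
    otherwise switch to the first backup route whose path is clear."""
    for route in (default, *backups):
        if not any(hop in congested for hop in route.hops):
            return route
    # Assumption: if every candidate is congested, keep the default route and wait.
    return default


# Hypothetical routes to core 5 in block 1, with external data entering at CR0 (block 0).
route_noc = Route("CR->NoC->Core", ("CR0", "NoC0", "NoC1", "Core5"))
route_loop = Route("CR->CRx->NoC->Core", ("CR0", "CR1", "NoC1", "Core5"))
```

For example, `select_route(route_noc, [route_loop], {"NoC0"})` returns `route_loop`, because only the default route's path passes through the congested node.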
For example, the data routes used when transmitting data between the cores of a single chip may include:
The first data route:
CR->NoC (within the bank)->Core: this route uses the CR to send data onto the NoC within the bank, and the data is then passed through NoCs at different levels to the destination core for computation.
In some embodiments, as shown in Figure 1, when the block connected to the data interface is the block where the destination core is located, receiving and processing external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the destination core through the NoC and/or other inter-chip routing modules, includes:
receiving and processing external data through the inter-chip routing module corresponding to the block connected to the data interface; transferring the processed data, through that inter-chip routing module, to the NoC node in the block where the destination core is located; and transferring the processed data, through the NoC node in the block where the destination core is located, to the destination core.
Concretely, the data route may be as follows: the data first passes through the inter-chip routing module corresponding to the block where the destination core is located, then to the NoC node in that block, and finally to the destination core for computation.
In some embodiments, as shown in Figure 2, when the block connected to the data interface is not the block where the destination core is located, receiving and processing external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the destination core through the NoC and/or other inter-chip routing modules, includes: receiving and processing external data through the inter-chip routing module corresponding to the block connected to the data interface; transferring the processed data, through that inter-chip routing module, to the NoC node in the block connected to the data interface; transferring the processed data, through that NoC node, to the NoC node in the block where the destination core is located; and transferring the processed data, through the NoC node in the block where the destination core is located, to the destination core.
Concretely, the data route may be as follows: the data first passes through the inter-chip routing module corresponding to the block connected to the data interface, then through the NoC node in that block to the NoC node in the block where the destination core is located, and finally to the destination core for computation.
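The two cases of the first data route (Figure 1 and Figure 2) can be summarised by a small path builder. This is a sketch under stated assumptions: the function name and the hop labels are hypothetical, and the NoC node of the interface block is assumed to reach the NoC node of the destination block in a single listed hop, since the text does not enumerate intermediate NoC levels.

```python
def route_via_noc(iface_block: int, dst_block: int, dst_core: int) -> list[str]:
    """First data route: CR -> NoC (within the bank) -> Core.

    iface_block: block whose CR is connected to the external data interface.
    dst_block:   block containing the destination core.
    """
    # External data always enters through the CR of the interface block
    # and is handed to that block's NoC node.
    hops = [f"CR{iface_block}", f"NoC{iface_block}"]
    if dst_block != iface_block:
        # Figure 2 case: the NoC carries the data on to the destination block.
        hops.append(f"NoC{dst_block}")
    hops.append(f"Core{dst_core}")
    return hops
```

With the interface in the destination block, `route_via_noc(0, 0, 5)` yields `["CR0", "NoC0", "Core5"]`, matching the Figure 1 description.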
The second data route:
CR->CRx->NoC (within the bank)->Core: this route uses a loop CR arrangement (that is, the data passes through the inter-chip routing modules in sequence). The loop CR carries the data to the bank where the destination core is located, and the NoC within that bank completes the transfer. Here, CRx denotes one or more inter-chip routing modules.
In some embodiments, as shown in Figure 1, when the block connected to the data interface is the block where the destination core is located, receiving and processing external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the destination core through the NoC and/or other inter-chip routing modules, so as to receive and process the external data and transmit the processed data between the cores of a single chip, includes:
receiving and processing external data through the inter-chip routing module corresponding to the block connected to the data interface; transferring the processed data, through that inter-chip routing module, to other inter-chip routing modules in turn until it reaches the inter-chip routing module corresponding to a block adjacent to the block where the destination core is located; transferring the processed data, through that inter-chip routing module, to the NoC node in the adjacent block; transferring the processed data, through the NoC node in the adjacent block, to the NoC node in the block where the destination core is located; and transferring the processed data, through the NoC node in the block where the destination core is located, to the destination core.
Concretely, the data route may be as follows: the data first passes through the inter-chip routing module corresponding to the block where the destination core is located (in this case, the same block as the one connected to the data interface), then through the other inter-chip routing modules in turn until it reaches the inter-chip routing module of a block adjacent to the destination-core block. It is then passed through the NoC node in that adjacent block to the NoC node in the destination-core block, and finally to the destination core for computation.
In some embodiments, as shown in Figure 2, when the block connected to the data interface is not the block where the destination core is located, receiving and processing external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the destination core through the NoC and/or other inter-chip routing modules, includes:
receiving and processing external data through the inter-chip routing module corresponding to the block connected to the data interface; transferring the processed data, through that inter-chip routing module, to other inter-chip routing modules in turn until it reaches the inter-chip routing module corresponding to the block where the destination core is located; transferring the processed data, through that inter-chip routing module, to the NoC node in the block where the destination core is located; and transferring the processed data, through that NoC node, to the destination core.
Concretely, the data route may be as follows: the data first passes through the inter-chip routing module corresponding to the block connected to the data interface, then through the other inter-chip routing modules in turn until it reaches the inter-chip routing module of the block where the destination core is located, and is finally delivered through the NoC node in that block to the destination core for computation.
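A sketch of the second data route (loop CR) for both figure cases follows. It assumes four blocks whose CRs form a clockwise ring CR0 -> CR1 -> CR2 -> CR3 -> CR0, as described for Figures 1 and 2. In the Figure 1 case, the text only says the data continues "until" a CR of an adjacent block. This sketch assumes the first adjacent block reached clockwise, `(dst_block + 1) % n`. That choice, the function name, and the hop labels are illustrative assumptions.

```python
def route_via_cr_loop(iface_block: int, dst_block: int, dst_core: int,
                      num_blocks: int = 4) -> list[str]:
    """Second data route: CR -> CRx -> NoC (within the bank) -> Core,
    moving clockwise (block i -> block (i + 1) % num_blocks) around the CR ring."""
    hops = [f"CR{iface_block}"]
    if iface_block == dst_block:
        # Figure 1 case: hand off to the CR of an adjacent block (assumed to be
        # the clockwise neighbour), then re-enter the destination bank via the NoC.
        entry = (dst_block + 1) % num_blocks
        hops += [f"CR{entry}", f"NoC{entry}", f"NoC{dst_block}"]
    else:
        # Figure 2 case: walk the CR ring clockwise until the destination block's CR.
        block = iface_block
        while block != dst_block:
            block = (block + 1) % num_blocks
            hops.append(f"CR{block}")
        hops.append(f"NoC{dst_block}")
    hops.append(f"Core{dst_core}")
    return hops
```

For instance, with the interface at block 3 and the destination in block 1, the ring wraps around: `["CR3", "CR0", "CR1", "NoC1", "Core4"]` for core 4.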
In some embodiments, the data transmission method further includes: a routing receiving module in the destination core receives the data, and a computing module in the destination core performs computation based on the received data.
In some embodiments, implementing data transmission between blocks through the inter-chip routing modules, and implementing data transmission to and from the inter-chip routing modules and between cores through the NoC, includes:
transmitting data from the source core of one chip to the destination core of another chip through the inter-chip routing modules and the NoC, thereby transmitting data between the cores of multiple chips.
Because the data transmission method of the embodiments of the present disclosure targets a many-core system, it can also transmit data between cores on different chips; that is, data is transmitted from the source core of one chip to the destination core of another chip through the inter-chip routing modules and the NoC.
Similarly, in this way at least two data routes are available during data transmission. To decide which route each actual transfer uses, a default route may be set in advance (for example, the optimal route obtained by training an AI model), and another route is switched in as a backup only when the data path of the default route becomes congested.
In some embodiments, the inter-chip routing module corresponding to a block of the first chip is connected to the inter-chip routing module corresponding to a block of the second chip, thereby connecting that block of the first chip to that block of the second chip.
For example, the data routes used when transmitting data in the above many-core system may include:
The first data route:
Core(Chip0 src)->NoC(within Chip0 bank)->CRx(Chip0)->…CRx(Chip0)->CRx(Chip1)->NoC(within Chip1 bank)->Core(Chip1 dst). Here, CRx denotes one or more inter-chip routing modules, Chip0 is the first chip, Chip1 is the second chip, Core(src) denotes the source core, and Core(dst) denotes the destination core.
In some embodiments, transmitting data from the source core of one chip to the destination core of another chip through the inter-chip routing modules and the NoC includes:
transferring data, through the source core of the first chip, to the NoC node in the block where that source core is located; transferring the data, through that NoC node, to the inter-chip routing module corresponding to the block where the source core of the first chip is located; transferring the data, through that inter-chip routing module, to other inter-chip routing modules of the first chip; transferring the data, through those other inter-chip routing modules, to the inter-chip routing module corresponding to the block of the first chip that is connected to the second chip; transferring the data, through that inter-chip routing module, to the inter-chip routing module corresponding to the block where the destination core of the second chip is located; transferring the data, through that inter-chip routing module, to the NoC node in the block where the destination core of the second chip is located; and transferring the data, through that NoC node, to the destination core of the second chip.
Concretely, the data route may be as follows: the source core of the first chip (Chip0) first transfers the data to the NoC node in the block where the source core is located. The data then passes through the inter-chip routing module corresponding to the source-core block, and through the other inter-chip routing modules of Chip0 in turn, until it reaches the inter-chip routing module of the Chip0 block connected to the block containing the destination core of the second chip (Chip1). It then passes through the inter-chip routing module of the Chip1 block containing the destination core, and is finally delivered through the NoC node in that block to the destination core.
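The first cross-chip route can be sketched as a hop list. The sketch makes these assumptions: Chip0 has four blocks whose CRs form a clockwise ring; the destination core lies in the Chip1 block that receives the inter-chip link (as in Figure 3, where CR2 of Chip0 connects to CR3 of Chip1); and the function name and `Chip.Node` labels are hypothetical.

```python
def cross_chip_route_cr(src_block: int, src_core: int, exit_block: int,
                        entry_block: int, dst_core: int,
                        num_blocks: int = 4) -> list[str]:
    """First cross-chip route:
    Core(Chip0 src) -> NoC(Chip0) -> CRx(Chip0) ... -> CRx(Chip1) -> NoC(Chip1) -> Core(Chip1 dst).

    exit_block:  Chip0 block whose CR is wired to Chip1.
    entry_block: Chip1 block whose CR receives the link (assumed to hold the destination core).
    """
    hops = [f"Chip0.Core{src_core}", f"Chip0.NoC{src_block}", f"Chip0.CR{src_block}"]
    # Follow the Chip0 CR ring clockwise until the CR that is wired to Chip1.
    block = src_block
    while block != exit_block:
        block = (block + 1) % num_blocks
        hops.append(f"Chip0.CR{block}")
    # Cross to Chip1, then enter the destination bank through its NoC node.
    hops += [f"Chip1.CR{entry_block}", f"Chip1.NoC{entry_block}", f"Chip1.Core{dst_core}"]
    return hops
```

Using the Figure 3 wiring with a hypothetical source core 1 in bank0 of Chip0 and destination core 9 in bank3 of Chip1, the data visits Chip0.CR0, CR1 and CR2 before crossing to Chip1.CR3.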
The second data route:
Core(Chip0 src)->NoC(within Chip0 bank)->…->NoC(within Chip0 bank)->CRx(Chip0)->CRx(Chip1)->NoC(within Chip1 bank)->Core(Chip1 dst). Here, CRx denotes one or more inter-chip routing modules, Chip0 is the first chip, Chip1 is the second chip, Core(src) denotes the source core, and Core(dst) denotes the destination core.
In some embodiments, transmitting data from the source core of one chip to the destination core of another chip through the inter-chip routing modules and the NoC includes:
transferring data, through the source core of the first chip, to the NoC node in the block where that source core is located; transferring the data, through that NoC node, to the NoC nodes in other blocks of the first chip; transferring the data, through those NoC nodes, to the inter-chip routing module corresponding to the block of the first chip that is connected to the second chip; transferring the data, through that inter-chip routing module, to the inter-chip routing module corresponding to the block where the destination core of the second chip is located; transferring the data, through that inter-chip routing module, to the NoC node in the block where the destination core of the second chip is located; and transferring the data, through that NoC node, to the destination core of the second chip.
Concretely, the data route may be as follows: the source core of the first chip (Chip0) first transfers the data to the NoC node in the block where the source core is located. The data then travels through the NoC nodes in other blocks of Chip0 to the inter-chip routing module of the Chip0 block that is connected to the block containing the destination core of the second chip (Chip1). It then passes through the inter-chip routing module of the Chip1 block containing the destination core, and is finally delivered through the NoC node in that block to the destination core for computation.
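The second cross-chip route differs from the first only in how the data crosses Chip0. It travels over the NoC rather than around the CR ring, and uses a CR only at the chip boundary. The sketch below assumes that the Chip0 NoC reaches the exit block's NoC node in one listed hop and that the destination core sits in the Chip1 entry block. The function name and labels are hypothetical.

```python
def cross_chip_route_noc(src_block: int, src_core: int, exit_block: int,
                         entry_block: int, dst_core: int) -> list[str]:
    """Second cross-chip route:
    Core(Chip0 src) -> NoC(Chip0) -> ... -> NoC(Chip0) -> CRx(Chip0) -> CRx(Chip1) -> NoC(Chip1) -> Core(Chip1 dst)."""
    hops = [f"Chip0.Core{src_core}", f"Chip0.NoC{src_block}"]
    if exit_block != src_block:
        # Cross Chip0 on the NoC to the block whose CR is wired to Chip1.
        hops.append(f"Chip0.NoC{exit_block}")
    # Only the boundary CRs are used: Chip0 exit CR, then Chip1 entry CR.
    hops += [f"Chip0.CR{exit_block}", f"Chip1.CR{entry_block}",
             f"Chip1.NoC{entry_block}", f"Chip1.Core{dst_core}"]
    return hops
```

With the same Figure 3 wiring, this route leaves Chip0 through CR2 without touching CR0 or CR1. That gives the system an alternative path when the CR ring is congested.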
In the specific example of Figure 3, the block of the second chip that contains the destination core is also the block of the second chip that is connected to the first chip (bank3 of Chip1). The inter-chip routing module for the destination-core block and the inter-chip routing module for the block connected to the first chip are therefore one and the same (CR3 of Chip1). It should be understood, however, that if the destination core is in another block of the second chip, then after the data reaches the inter-chip routing module of the second-chip block connected to the first chip, it can be carried on to the destination core in the manner described above for transmission within a single chip (either indirectly through inter-chip routing modules and the NoC, or directly through the NoC). This is not described again in detail here.
It should likewise be understood that the first chip containing the source core and the second chip containing the destination core need not be directly connected; they may instead be connected indirectly through other chips. In that case the data passes through those other chips in transit, and within each of them it may be transmitted either through the inter-chip routing modules or through the NoC.
In some embodiments, each chip includes a plurality of inter-chip routing modules arranged along its periphery. Implementing data transmission between blocks through the inter-chip routing modules includes:
during transmission, passing data among the plurality of inter-chip routing modules in a predetermined rotational direction.
In the embodiments of the present disclosure, each chip includes multiple (for example, three or more) inter-chip routing modules, distributed in a ring along the periphery of the chip. Every two circumferentially adjacent inter-chip routing modules are joined by a data connection with a fixed direction, so data travels among these modules in a predetermined rotational direction (clockwise or counterclockwise); in other words, the modules form a "loop".
It should be understood that "clockwise" and "counterclockwise" depend on the side from which the chip is viewed; whichever side it is viewed from, the actual direction of data transmission between the inter-chip routing modules does not change.
For example, referring to Figures 1 and 2, each chip includes four inter-chip routing modules, and in both figures data passes through the inter-chip routing modules clockwise in sequence: from CR0 to CR1, from CR1 to CR2, from CR2 to CR3, and from CR3 to CR0 (this last transfer is not shown by an arrow in the figures). Data cannot travel directly in the reverse direction; for example, it cannot be transmitted directly from CR0 to CR3.
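The consequence of the one-way ring can be made concrete with a short helper that lists the CRs visited when moving clockwise from one module to another. It is a sketch assuming n modules numbered in clockwise order, as in Figures 1 and 2. The function name is hypothetical.

```python
def clockwise_path(src: int, dst: int, n: int = 4) -> list[int]:
    """CR indices visited when moving clockwise from CR<src> to CR<dst>
    on a one-way ring CR0 -> CR1 -> ... -> CR<n-1> -> CR0."""
    if not (0 <= src < n and 0 <= dst < n):
        raise ValueError("CR index out of range")
    path = [src]
    while path[-1] != dst:
        path.append((path[-1] + 1) % n)  # only the clockwise link is available
    return path
```

Because the reverse link CR0 -> CR3 does not exist, `clockwise_path(0, 3)` returns `[0, 1, 2, 3]` (three hops), while `clockwise_path(3, 0)` returns `[3, 0]` (one hop). This asymmetry is the trade-off accepted in exchange for single-direction wiring.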
The above limitation only describes the direction of data transmission among the inter-chip routing modules within a single chip. Referring to Figure 3, an inter-chip routing module on one chip can also be connected to the inter-chip routing modules of other chips to pass data to those chips, and data transmission between inter-chip routing modules on different chips does not necessarily follow a particular rotational direction.
In this way, the transmission lines between the inter-chip routing modules only need to be laid out in a single direction, which effectively saves chip area and helps overcome manufacturing-process limitations.
In particular, data transmission between inter-chip routing modules usually has high bandwidth and low latency, so restricting it to a predetermined rotational direction does not significantly increase transmission time.
It is also feasible not to restrict the direction of data transmission (for example, allowing data to flow in both directions between adjacent inter-chip routing modules), or to connect the inter-chip routing modules in a mesh-like network. Doing so can further improve data transmission speed and the performance of the many-core system.
In one data transmission method of the embodiments of the present disclosure, the overall data transmission process can be divided into two parts: an AI model part and a board (many-core system) data processing part.
AI model part: the AI model runs on the server side (such as a server or computer) and is trained according to certain rules to obtain data routing information (that is, the optimal data route). Based on the training result, the server side packages the data to be sent to the board over the PCIE interface, according to the compilation rules of the tool chain, into the form "data part (128 bits) + data header (128 bits)", and sends it to the board via the PCIE interface. The "data part" contains the content of the data actually to be transmitted, and the "data header" contains the trained data routing information, from which the many-core system selects the default data route.
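The 256-bit frame described above ("data part (128 bits) + data header (128 bits)") can be sketched with Python's standard `struct` module. Only the two 128-bit sizes and their order come from the text. The header fields (`route_id`, `dst_chip`, `dst_block`, `dst_core`), their 16-bit big-endian encoding, and the zero padding are hypothetical assumptions, not the tool chain's actual compilation rules.

```python
import struct

DATA_BYTES = 16    # 128-bit data part (from the text)
HEADER_BYTES = 16  # 128-bit data header (from the text)
_HEADER_FIELDS = ">4H"  # hypothetical: four big-endian uint16 routing fields


def pack_frame(payload: bytes, route_id: int, dst_chip: int,
               dst_block: int, dst_core: int) -> bytes:
    """Concatenate the 128-bit data part with a 128-bit routing header."""
    if len(payload) != DATA_BYTES:
        raise ValueError("data part must be exactly 128 bits")
    header = struct.pack(_HEADER_FIELDS, route_id, dst_chip, dst_block, dst_core)
    return payload + header.ljust(HEADER_BYTES, b"\x00")  # zero-pad header to 16 bytes


def unpack_frame(frame: bytes) -> tuple[bytes, dict]:
    """Split a 256-bit frame back into the data part and the decoded routing fields."""
    if len(frame) != DATA_BYTES + HEADER_BYTES:
        raise ValueError("frame must be exactly 256 bits")
    payload, header = frame[:DATA_BYTES], frame[DATA_BYTES:]
    route_id, dst_chip, dst_block, dst_core = struct.unpack(
        _HEADER_FIELDS, header[:struct.calcsize(_HEADER_FIELDS)])
    return payload, {"route_id": route_id, "dst_chip": dst_chip,
                     "dst_block": dst_block, "dst_core": dst_core}
```

A fixed-width header keeps the CR-side unpacking to a constant slice, which suits the unpack/repack step the board performs on every frame.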
Board data processing part: the PCIE interface of the board receives the data (external data) sent by the server and converts it into data that the inter-chip routing module (CR) can receive and process. The inter-chip routing module receives the data, unpacks and repacks it, and forwards it onward, for example to a network-on-chip node (NoC) or another inter-chip routing module, until it finally reaches the destination core for processing. The data route actually used can follow the approaches described above. For example, the best route in the routing information carried in the "data header" is used by default, and another route is selected when congestion occurs.
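The per-hop decision a CR makes after unpacking a frame can be sketched as follows. This is a simplified illustration, not the disclosed logic. The sketch assumes the decoded header tells the CR the destination block and whether the preferred route is the CR loop, and that congestion is reported as a single flag for the outgoing clockwise CR link. The Figure 1 hand-off to an adjacent block is deliberately omitted. The function name and labels are hypothetical.

```python
def next_hop(cur_block: int, dst_block: int, prefer_cr_loop: bool,
             cr_link_congested: bool, num_blocks: int = 4) -> str:
    """Choose where a CR forwards a frame next (simplified sketch)."""
    if cur_block == dst_block:
        # Already at the destination bank: deliver through its NoC node.
        return f"NoC{cur_block}"
    if prefer_cr_loop and not cr_link_congested:
        # Default (second) route: continue clockwise around the CR loop.
        return f"CR{(cur_block + 1) % num_blocks}"
    # First route, or fallback when the CR link is congested: use the local NoC.
    return f"NoC{cur_block}"
```

For example, a frame at block 0 headed for block 2 goes to `CR1` while the ring is clear, and drops to `NoC0` once that link is flagged as congested.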
In a third aspect, embodiments of the present disclosure provide a board (such as a printed circuit board) on which any one of the network-on-chip interconnection structures of the embodiments of the present disclosure is integrated.
In a fourth aspect, embodiments of the present disclosure provide an electronic device including a memory and a processor, where the memory stores one or more computer instructions that can be executed by the processor to implement any one of the data transmission methods of the embodiments of the present disclosure.
In some embodiments, the electronic device may be a server, a terminal, or the like.
In some embodiments, the electronic device includes: at least one processor; a memory communicatively connected to the at least one processor; and a communication component communicatively connected to other external storage media, the communication component receiving and sending data under the control of the processor. The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to implement the data transmission method of the many-core system in the foregoing embodiments.
In some embodiments, the memory, as a non-volatile computer-readable storage medium, can store non-volatile software programs, non-volatile computer-executable programs, and modules. By running the non-volatile software programs, non-volatile computer-executable programs, and modules stored in the memory, the processor executes the various functional applications and data processing of the electronic device, that is, implements the data transmission method of the many-core system described above.
In some embodiments, the memory may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store a list of options and the like.
In some embodiments, the memory may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
In some embodiments, the memory may optionally include memory located remotely from the processor, and such remote memory may be connected to an external device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
In some embodiments, one or more modules are stored in the memory, and when executed by the processor, the one or more modules perform the data transmission method of the many-core system in any of the foregoing method embodiments.
The above electronic device can execute the data transmission method of the many-core system provided in the embodiments of the present disclosure and has the functional modules and beneficial effects corresponding to that method. For details, refer to the data transmission method of the many-core system provided in the embodiments of the present disclosure.
In a fifth aspect, embodiments of the present disclosure provide a computer-readable storage medium storing computer program instructions that, when executed by a processor, implement any one of the data transmission methods of the embodiments of the present disclosure.
In some embodiments, the computer-readable storage medium stores a computer-readable program that causes a computer to execute some or all of the above embodiments of the data transmission method of the many-core system.
In some embodiments, the computer-readable program to be executed may be written in any combination of one or more programming languages, including object-oriented programming languages such as C++ and conventional procedural programming languages such as the "C" programming language or similar assembly languages.
Those skilled in the art will understand that all or some of the steps of the data transmission method in the above embodiments may be implemented by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or some of the steps of the data transmission method of the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as on-chip memory or Flash.
The embodiments of the present disclosure are further described below through three specific embodiments with reference to the accompanying drawings.
Embodiment 1: The on-chip network interconnection structure of the embodiments of the present disclosure is used to implement communication between the cores of a single chip. As shown in FIG. 1, data can be transmitted to the target core over either of two data routes.
First route: PCIE->CR0->NoC(bank0)->target Core. Specifically, the data packet (a PCIe protocol data packet) from the Server passes through the inter-chip routing module CR0 corresponding to block bank0, and is then delivered through the on-chip network node in block bank0 to the target Core for computation.
Second route: PCIE->CR0->CR1->CR2->CR3->NoC(bank3)->NoC(bank0)->target Core. Specifically, the data packet (PCIe protocol data packet) from the Server first passes through the inter-chip routing module CR0 corresponding to block bank0. It then passes in turn through the inter-chip routing modules CR1, CR2, and CR3 corresponding to blocks bank1, bank2, and bank3, respectively. It is then delivered from the on-chip network node in block bank3 to the on-chip network node in block bank0, and finally to the target Core for computation.
As described above, the server selects one of the two data routes according to the results of AI model training and sends the data to the board through the PCIe interface. For example, the AI model training results select the first data route by default. However, when data at some on-chip network node on the first route has to wait before it can be transmitted, the data is transmitted to the target core over the second data route instead.
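The fallback in Embodiment 1 can be illustrated with a short Python sketch. Because `NoC(bank0)` appears on both routes, the sketch assumes that "data at an on-chip network node has to wait" can be modeled as a busy link between two consecutive hops (here `CR0 -> NoC(bank0)`). The helpers `links` and `select_route` are hypothetical. In the disclosure the default route comes from AI model training on the server; the sketch simply treats it as the first entry in the candidate list.

```python
from typing import List, Sequence, Set, Tuple

Link = Tuple[str, str]

# The two routes of Embodiment 1; the default (first) route is listed first.
ROUTE_1 = ["PCIE", "CR0", "NoC(bank0)", "Core"]
ROUTE_2 = ["PCIE", "CR0", "CR1", "CR2", "CR3", "NoC(bank3)", "NoC(bank0)", "Core"]


def links(route: Sequence[str]) -> List[Link]:
    """Consecutive (from, to) hop pairs along a route."""
    return list(zip(route, route[1:]))


def select_route(routes: Sequence[Sequence[str]], busy: Set[Link]) -> Sequence[str]:
    """Pick the first candidate route that has no busy link.

    If every candidate has a busy link, keep the default route and let the
    data wait there (an assumed policy, not stated in the source).
    """
    for route in routes:
        if not any(link in busy for link in links(route)):
            return route
    return routes[0]


if __name__ == "__main__":
    candidates = [ROUTE_1, ROUTE_2]
    print(select_route(candidates, busy=set()))                    # default route
    print(select_route(candidates, busy={("CR0", "NoC(bank0)")}))  # alternate route
```

Modeling congestion per link rather than per node keeps the two routes distinguishable even though both end in the same block's on-chip network.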
Embodiment 2: The on-chip network interconnection structure of the embodiments of the present disclosure is used to implement communication between the cores of a single chip. As shown in FIG. 2, data can be transmitted to the target core over either of two data routes.
First route: PCIE->CR0->NoC(bank0)->NoC(bank3)->target Core. Specifically, the data packet (PCIe protocol data packet) from the Server passes through the inter-chip routing module CR0 corresponding to block bank0. It is then delivered from the on-chip network node in block bank0 to the on-chip network node in block bank3, and finally to the target Core for computation.
Second route: PCIE->CR0->CR1->CR2->CR3->NoC(bank3)->target Core. Specifically, the data packet (PCIe protocol data packet) from the Server first passes through the inter-chip routing module CR0 corresponding to block bank0. It then passes in turn through the inter-chip routing modules CR1, CR2, and CR3 corresponding to blocks bank1, bank2, and bank3, respectively, and is finally delivered through the on-chip network node in block bank3 to the target Core for computation.
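Embodiments 1 and 2 together exercise four single-chip delivery patterns that appear to correspond to claims 4 to 7: the two routes of Embodiment 1 to claims 4 and 5, and the two routes of Embodiment 2 to claims 6 and 7. The Python sketch below classifies a hop sequence accordingly. It is only an illustration: the hop-label syntax is taken from the route strings above, CRk serving bank k follows the embodiments, and `parse_route`, `classify`, and the rule that the first CR after PCIE belongs to the interface block are assumptions.

```python
import re
from typing import List, Sequence, Tuple


def parse_route(route: Sequence[str]) -> Tuple[List[int], List[int]]:
    """Return, in order, the bank indices of the CR hops and of the NoC hops."""
    crs: List[int] = []
    nocs: List[int] = []
    for hop in route:
        m = re.fullmatch(r"CR(\d+)", hop)
        if m:
            crs.append(int(m.group(1)))
            continue
        m = re.fullmatch(r"NoC\(bank(\d+)\)", hop)
        if m:
            nocs.append(int(m.group(1)))
    return crs, nocs


def classify(route: Sequence[str], target_bank: int) -> str:
    """Name the claim (4-7) whose single-chip delivery pattern the route follows.

    Assumes the first CR after PCIE belongs to the block connected to the
    data interface, and that CRk corresponds to block bank k.
    """
    crs, nocs = parse_route(route)
    if not crs or not nocs or nocs[-1] != target_bank:
        raise ValueError("route must pass through a CR and end in the target block's NoC")
    interface_bank = crs[0]
    if interface_bank == target_bank:
        if len(crs) == 1 and nocs == [target_bank]:
            return "claim 4"  # interface CR -> target block NoC
        if len(crs) > 1 and len(nocs) == 2 and nocs[0] == crs[-1]:
            return "claim 5"  # CR chain -> neighboring block NoC -> target block NoC
    else:
        if len(crs) == 1 and nocs == [interface_bank, target_bank]:
            return "claim 6"  # interface CR -> interface block NoC -> target block NoC
        if len(crs) > 1 and crs[-1] == target_bank and nocs == [target_bank]:
            return "claim 7"  # CR chain up to the target block's CR -> its NoC
    return "other"
```

The sketch does not check that the neighboring block in the claim-5 pattern is physically adjacent to the target block, since the disclosure does not give a block adjacency map.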
Embodiment 3: The on-chip network interconnection structure of the embodiments of the present disclosure is used to implement communication between cores of different chips. As shown in FIG. 3, data can be transmitted to the target core over either of two data routes.
First route: Core(Chip0 src)->NoC(Chip0 bank0)->CR0(Chip0 bank0)->CR1(Chip0 bank1)->CR2(Chip0 bank2)->CR3(Chip1 bank3)->NoC(Chip1 bank3)->Core(Chip1 dst). Specifically, the data sent by the source core Core(Chip0 src) of the first chip Chip0 is first delivered through the on-chip network node in block bank0 of Chip0 to the inter-chip routing module CR0 corresponding to block bank0 of Chip0. It then passes in turn through the inter-chip routing modules CR1 and CR2 corresponding to blocks bank1 and bank2 of Chip0, then through the inter-chip routing module CR3 corresponding to block bank3 of the second chip Chip1, and is finally delivered through the on-chip network node in block bank3 of Chip1 to the target core Core(Chip1 dst) of Chip1 for computation.
Second route: Core(Chip0 src)->NoC(Chip0 bank0)->NoC(Chip0 bank3)->NoC(Chip0 bank2)->CR2(Chip0 bank2)->CR3(Chip1 bank3)->NoC(Chip1 bank3)->Core(Chip1 dst). Specifically, the data sent by the source core Core(Chip0 src) of the first chip Chip0 is first delivered from the on-chip network node in block bank0 of Chip0 to the on-chip network node in block bank3 of Chip0, and then to the on-chip network node in block bank2 of Chip0. From there it passes to the inter-chip routing module CR2 corresponding to block bank2 of Chip0, then to the inter-chip routing module CR3 corresponding to block bank3 of the second chip Chip1, and is finally delivered through the on-chip network node in block bank3 of Chip1 to the target core Core(Chip1 dst) of Chip1 for computation.
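Both routes of Embodiment 3 leave Chip0 through the same CR pair, CR2(Chip0 bank2)->CR3(Chip1 bank3). They differ only in how Chip0 is traversed: over the CR chain in the first route (the pattern of claim 10) or over on-chip network nodes in the second (the pattern of claim 11). The Python sketch below encodes the two routes as (kind, chip, bank) tuples and checks this. The tuple encoding and the helpers `chip_crossings` and `transit_kinds` are illustrative assumptions. Treating a CR-to-CR link as the only legal chip crossing is also an assumption, drawn from claims 10 and 11, where the two chips are connected through their inter-chip routing modules.

```python
from typing import List, Tuple

# A hop is (kind, chip, bank); kind is "Core", "NoC", or "CR".
Hop = Tuple[str, int, int]

# Embodiment 3: source core in Chip0 bank0, target core in Chip1 bank3.
ROUTE_1: List[Hop] = [
    ("Core", 0, 0), ("NoC", 0, 0), ("CR", 0, 0), ("CR", 0, 1), ("CR", 0, 2),
    ("CR", 1, 3), ("NoC", 1, 3), ("Core", 1, 3),
]
ROUTE_2: List[Hop] = [
    ("Core", 0, 0), ("NoC", 0, 0), ("NoC", 0, 3), ("NoC", 0, 2), ("CR", 0, 2),
    ("CR", 1, 3), ("NoC", 1, 3), ("Core", 1, 3),
]


def chip_crossings(route: List[Hop]) -> List[Tuple[Hop, Hop]]:
    """Return the consecutive hop pairs that lie on different chips.

    Raises ValueError if a chip boundary is crossed anywhere other than on a
    CR-to-CR link (an assumption of this sketch).
    """
    crossings = []
    for a, b in zip(route, route[1:]):
        if a[1] != b[1]:
            if a[0] != "CR" or b[0] != "CR":
                raise ValueError(f"chip boundary crossed outside the CRs: {a} -> {b}")
            crossings.append((a, b))
    return crossings


def transit_kinds(route: List[Hop], chip: int) -> List[str]:
    """Kinds of the non-core hops a route uses on the given chip, in order."""
    return [kind for kind, c, _ in route if c == chip and kind != "Core"]
```

For example, `transit_kinds(ROUTE_1, 0)` yields one NoC hop followed by three CR hops, while `transit_kinds(ROUTE_2, 0)` yields three NoC hops followed by one CR hop.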
Numerous specific details are set forth in the description provided here. However, it is understood that the embodiments of the present disclosure may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
In addition, those of ordinary skill in the art will understand that although some embodiments described herein include certain features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the present disclosure and to form different embodiments. For example, in the claims, any one of the claimed embodiments may be used in any combination.
Those skilled in the art should understand that although the present disclosure has been described with reference to exemplary embodiments, various changes may be made and equivalents may be substituted for its elements without departing from the scope of the embodiments of the present disclosure. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the present disclosure without departing from the essential scope of its embodiments. Therefore, the present disclosure is not limited to the specific embodiments disclosed, but includes all embodiments falling within the scope of the appended claims.

Claims (26)

  1. An on-chip network interconnection structure of a many-core system, wherein the many-core system includes at least one chip, each chip integrating multiple cores, and the on-chip network interconnection structure includes:
    at least two blocks located on the chip, each block including at least one core;
    an inter-chip routing module provided for each block, each inter-chip routing module being configured to interact with at least one other inter-chip routing module; and
    an on-chip network configured to interact with each inter-chip routing module and to exchange data between the cores.
  2. The on-chip network interconnection structure according to claim 1, wherein
    the inter-chip routing modules corresponding to at least some adjacent blocks are configured to interact with each other.
  3. The on-chip network interconnection structure according to claim 2, wherein at least some of the blocks are connected to a data interface for transferring external data; and
    the on-chip network interconnection structure is configured to receive and process external data and to transmit the processed data between the cores of a single chip.
  4. The on-chip network interconnection structure according to claim 3, wherein the inter-chip routing module corresponding to the block connected to the data interface is configured to receive and process the external data transferred by the data interface and to transfer the processed data to the on-chip network node in the block where the target core is located;
    the on-chip network node in the block where the target core is located is configured to receive the processed data and transfer it to the target core;
    wherein the block connected to the data interface is the block where the target core is located.
  5. The on-chip network interconnection structure according to claim 3, wherein the inter-chip routing module corresponding to the block connected to the data interface is configured to receive and process the external data transferred by the data interface and to transfer the processed data to other inter-chip routing modules until it reaches the inter-chip routing module corresponding to a block adjacent to the block where the target core is located;
    the inter-chip routing module corresponding to the block adjacent to the block where the target core is located is configured to receive the processed data and transfer it to the on-chip network node in that adjacent block;
    the on-chip network node in the block adjacent to the block where the target core is located is configured to receive the processed data and transfer it to the on-chip network node in the block where the target core is located;
    the on-chip network node in the block where the target core is located is configured to receive the processed data and transfer it to the target core;
    wherein the block connected to the data interface is the block where the target core is located.
  6. The on-chip network interconnection structure according to claim 3, wherein the inter-chip routing module corresponding to the block connected to the data interface is configured to receive and process the external data transferred by the data interface and to transfer the processed data to the on-chip network node in the block connected to the data interface;
    the on-chip network node in the block connected to the data interface is configured to receive the processed data and transfer it to the on-chip network node in the block where the target core is located;
    the on-chip network node in the block where the target core is located is configured to receive the processed data and transfer it to the target core;
    wherein the block connected to the data interface is not the block where the target core is located.
  7. The on-chip network interconnection structure according to claim 3, wherein the inter-chip routing module corresponding to the block connected to the data interface is configured to receive and process the external data transferred by the data interface and to transfer the processed data to other inter-chip routing modules until it reaches the inter-chip routing module corresponding to the block where the target core is located;
    the inter-chip routing module corresponding to the block where the target core is located is configured to receive the processed data and transfer it to the on-chip network node in the block where the target core is located;
    the on-chip network node in the block where the target core is located is configured to receive the processed data and transfer it to the target core;
    wherein the block connected to the data interface is not the block where the target core is located.
  8. The on-chip network interconnection structure according to any one of claims 4-7, wherein the target core includes:
    a routing receiving module configured to receive data; and
    a computing module configured to perform computation based on the received data.
  9. The on-chip network interconnection structure according to claim 1, wherein the on-chip network interconnection structure is configured to transmit data between the cores of multiple chips.
  10. The on-chip network interconnection structure according to claim 9, wherein the inter-chip routing module corresponding to a block of a first chip is connected to the inter-chip routing module corresponding to a block of a second chip, so that the block of the first chip is connected to the block of the second chip;
    a source core of the first chip is configured to transfer data to the on-chip network node in the block where the source core of the first chip is located;
    the on-chip network node in the block where the source core of the first chip is located is configured to receive the data and transfer it to the inter-chip routing module corresponding to the block where the source core of the first chip is located;
    the inter-chip routing module corresponding to the block where the source core of the first chip is located is configured to receive the data and transfer it to other inter-chip routing modules of the first chip until it reaches the inter-chip routing module corresponding to the block of the first chip that is connected to the second chip;
    the inter-chip routing module corresponding to the block of the first chip connected to the second chip is configured to receive the data and transfer it to the inter-chip routing module corresponding to the block where the target core of the second chip is located;
    the inter-chip routing module corresponding to the block where the target core of the second chip is located is configured to receive the data and transfer it to the on-chip network node in the block where the target core of the second chip is located; and
    the on-chip network node in the block where the target core of the second chip is located is configured to receive the data and transfer it to the target core of the second chip.
  11. The on-chip network interconnection structure according to claim 9, wherein the inter-chip routing module corresponding to a block of a first chip is connected to the inter-chip routing module corresponding to a block of a second chip, so that the block of the first chip is connected to the block of the second chip;
    a source core of the first chip is configured to transfer data to the on-chip network node in the block where the source core of the first chip is located;
    the on-chip network node in the block where the source core of the first chip is located is configured to receive the data and transfer it to the on-chip network nodes in other blocks of the first chip;
    the on-chip network nodes in the other blocks of the first chip are configured to receive the data and transfer it to the inter-chip routing module corresponding to the block of the first chip connected to the second chip;
    the inter-chip routing module corresponding to the block of the first chip connected to the second chip is configured to receive the data and transfer it to the inter-chip routing module corresponding to the block where the target core of the second chip is located;
    the inter-chip routing module corresponding to the block where the target core of the second chip is located is configured to receive the data and transfer it to the on-chip network node in the block where the target core of the second chip is located; and
    the on-chip network node in the block where the target core of the second chip is located is configured to receive the data and transfer it to the target core of the second chip.
  12. A data transmission method applied to a many-core system, wherein the many-core system includes at least one chip, each chip integrates multiple cores, each chip is provided with at least two blocks, each block includes at least one core, and an inter-chip routing module is provided for each block, the data transmission method including:
    transmitting data between blocks through the inter-chip routing modules, and transmitting data to and from the inter-chip routing modules and between the cores through an on-chip network.
  13. The data transmission method according to claim 12, wherein transmitting data between blocks through the inter-chip routing modules includes:
    transmitting data between adjacent blocks through the inter-chip routing modules.
  14. The data transmission method according to claim 13, wherein at least some of the blocks are connected to a data interface for transferring external data; and transmitting data between blocks through the inter-chip routing modules, and transmitting data to and from the inter-chip routing modules and between the cores through the on-chip network, includes:
    receiving and processing external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the target core through the on-chip network and/or other inter-chip routing modules, so as to receive and process the external data and transmit the processed data between the cores of a single chip.
  15. The data transmission method according to claim 14, wherein receiving and processing external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the target core through the on-chip network and/or other inter-chip routing modules, includes:
    receiving and processing the external data through the inter-chip routing module corresponding to the block connected to the data interface;
    transferring, by the inter-chip routing module corresponding to the block connected to the data interface, the processed data to the on-chip network node in the block where the target core is located;
    transferring, by the on-chip network node in the block where the target core is located, the processed data to the target core;
    wherein the block connected to the data interface is the block where the target core is located.
  16. The data transmission method according to claim 14, wherein receiving and processing external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the target core through the on-chip network and/or other inter-chip routing modules, includes:
    receiving and processing the external data through the inter-chip routing module corresponding to the block connected to the data interface;
    transferring, by the inter-chip routing module corresponding to the block connected to the data interface, the processed data to other inter-chip routing modules until it reaches the inter-chip routing module corresponding to a block adjacent to the block where the target core is located;
    transferring, by the inter-chip routing module corresponding to the block adjacent to the block where the target core is located, the processed data to the on-chip network node in that adjacent block;
    transferring, by the on-chip network node in the block adjacent to the block where the target core is located, the processed data to the on-chip network node in the block where the target core is located;
    transferring, by the on-chip network node in the block where the target core is located, the processed data to the target core;
    wherein the block connected to the data interface is the block where the target core is located.
  17. The data transmission method according to claim 14, wherein receiving and processing external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the target core through the on-chip network and/or other inter-chip routing modules, includes:
    receiving and processing the external data through the inter-chip routing module corresponding to the block connected to the data interface;
    transferring, by the inter-chip routing module corresponding to the block connected to the data interface, the processed data to the on-chip network node in the block connected to the data interface;
    transferring, by the on-chip network node in the block connected to the data interface, the processed data to the on-chip network node in the block where the target core is located;
    transferring, by the on-chip network node in the block where the target core is located, the processed data to the target core;
    wherein the block connected to the data interface is not the block where the target core is located.
  18. The data transmission method according to claim 14, wherein receiving and processing external data through the inter-chip routing module corresponding to the block connected to the data interface, and transmitting the processed data to the target core through the on-chip network and/or other inter-chip routing modules, includes:
    receiving and processing the external data through the inter-chip routing module corresponding to the block connected to the data interface;
    transferring, by the inter-chip routing module corresponding to the block connected to the data interface, the processed data to other inter-chip routing modules until it reaches the inter-chip routing module corresponding to the block where the target core is located;
    transferring, by the inter-chip routing module corresponding to the block where the target core is located, the processed data to the on-chip network node in the block where the target core is located;
    transferring, by the on-chip network node in the block where the target core is located, the processed data to the target core;
    wherein the block connected to the data interface is not the block where the target core is located.
  19. The data transmission method according to any one of claims 14-18, further comprising:
    receiving, by a routing receiving module in the destination core, the data; and performing, by a computing module in the destination core, a computation based on the received data.
  20. The data transmission method according to claim 12, wherein implementing data transmission between blocks through the inter-chip routing modules, and implementing data transmission to and from the inter-chip routing modules and among the cores through the on-chip network, comprises:
    transmitting data from a source core of one chip to a destination core of another chip through the inter-chip routing modules and the on-chip network, thereby transmitting data among the cores of multiple chips.
  21. The data transmission method according to claim 20, wherein the inter-chip routing module corresponding to a block of a first chip is connected to the inter-chip routing module corresponding to a block of a second chip, so that said block of the first chip is connected to said block of the second chip; and transmitting data from the source core of one chip to the destination core of another chip through the inter-chip routing modules and the on-chip network comprises:
    transferring, by the source core of the first chip, the data to the on-chip network node in the block where the source core of the first chip is located;
    transferring, by the on-chip network node in the block where the source core of the first chip is located, the data to the inter-chip routing module corresponding to that block;
    transferring, by the inter-chip routing module corresponding to the block where the source core of the first chip is located, the data to other inter-chip routing modules of the first chip;
    transferring, by the other inter-chip routing modules of the first chip, the data to the inter-chip routing module corresponding to the block of the first chip that is connected to the second chip;
    transferring, by the inter-chip routing module corresponding to the block of the first chip that is connected to the second chip, the data to the inter-chip routing module corresponding to the block where the destination core of the second chip is located;
    transferring, by the inter-chip routing module corresponding to the block where the destination core of the second chip is located, the data to the on-chip network node in that block;
    transferring, by the on-chip network node in the block where the destination core of the second chip is located, the data to the destination core of the second chip.
  22. The data transmission method according to claim 20, wherein the inter-chip routing module corresponding to a block of a first chip is connected to the inter-chip routing module corresponding to a block of a second chip, so that said block of the first chip is connected to said block of the second chip; and transmitting data from the source core of one chip to the destination core of another chip through the inter-chip routing modules and the on-chip network comprises:
    transferring, by the source core of the first chip, the data to the on-chip network node in the block where the source core of the first chip is located;
    transferring, by the on-chip network node in the block where the source core of the first chip is located, the data to on-chip network nodes in other blocks of the first chip;
    transferring, by the on-chip network nodes in the other blocks of the first chip, the data to the inter-chip routing module corresponding to the block of the first chip that is connected to the second chip;
    transferring, by the inter-chip routing module corresponding to the block of the first chip that is connected to the second chip, the data to the inter-chip routing module corresponding to the block where the destination core of the second chip is located;
    transferring, by the inter-chip routing module corresponding to the block where the destination core of the second chip is located, the data to the on-chip network node in that block;
    transferring, by the on-chip network node in the block where the destination core of the second chip is located, the data to the destination core of the second chip.
  23. The data transmission method according to claim 12, wherein each chip comprises a plurality of inter-chip routing modules arranged along its periphery, and implementing data transmission between blocks through the inter-chip routing modules comprises:
    during transmission, transmitting data among the plurality of inter-chip routing modules of the same chip in a predetermined rotational direction (clockwise or counterclockwise).
  24. A board card, on which the network-on-chip interconnection structure according to any one of claims 1-11 is integrated.
  25. An electronic device, comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, and the one or more computer instructions are executable by the processor to implement the data transmission method according to any one of claims 12-23.
  26. A computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the data transmission method according to any one of claims 12-23.
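The hop order of claim 17 (external data entering through the interface block's inter-chip routing module and then crossing the on-chip network) can be illustrated with a minimal sketch. The function name `claim17_route` and the hop labels such as `"blk0.router"` are hypothetical notation, not part of the patent; the claim does not say how many intermediate on-chip network nodes lie between the two blocks, so the sketch collapses that traversal into a single NoC-to-NoC hop.

```python
from typing import List


def claim17_route(interface_block: int, dest_block: int, dest_core: int) -> List[str]:
    """Hop sequence for claim 17 (hypothetical labels).

    Intermediate NoC nodes between the two blocks are not specified by the
    claim and are collapsed into one hop here.
    """
    if interface_block == dest_block:
        # Claim 17 only covers the case where the two blocks differ.
        raise ValueError("interface block must differ from destination block")
    return [
        f"blk{interface_block}.router",    # receives and processes external data
        f"blk{interface_block}.noc",       # NoC node inside the interface block
        f"blk{dest_block}.noc",            # NoC node inside the destination block
        f"blk{dest_block}.core{dest_core}",  # destination core
    ]
```

For example, `claim17_route(0, 2, 5)` yields `["blk0.router", "blk0.noc", "blk2.noc", "blk2.core5"]`.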
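Claim 18 differs from claim 17 in that the processed data is relayed from one inter-chip routing module to the next until it reaches the destination block's module, and only then enters the on-chip network. The sketch below assumes, beyond what claim 18 states, that the modules form a ring indexed `0..num_blocks-1` and that relaying always moves to the next higher index (wrapping around). `claim18_route` and its hop labels are hypothetical.

```python
from typing import List


def claim18_route(num_blocks: int, interface_block: int,
                  dest_block: int, dest_core: int) -> List[str]:
    """Hop sequence for claim 18 (hypothetical labels).

    Assumption: routing modules form a ring and each relay step moves to the
    next higher block index, wrapping from num_blocks-1 back to 0.
    """
    if not (0 <= interface_block < num_blocks and 0 <= dest_block < num_blocks):
        raise ValueError("block index out of range")
    if interface_block == dest_block:
        raise ValueError("interface block must differ from destination block")
    hops = [f"blk{interface_block}.router"]  # receives and processes external data
    block = interface_block
    while block != dest_block:
        block = (block + 1) % num_blocks     # relay to the next routing module
        hops.append(f"blk{block}.router")
    hops.append(f"blk{dest_block}.noc")      # hand off to the NoC node in that block
    hops.append(f"blk{dest_block}.core{dest_core}")
    return hops
```

With four blocks, `claim18_route(4, 3, 1, 0)` wraps from block 3 through block 0 to block 1 before handing the data to the destination block's NoC node.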
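Claim 19 splits the destination core into a routing receiving module and a computing module. A minimal sketch of that split follows. The FIFO receive buffer, the `DestinationCore` class, and the pluggable `compute` function are illustrative assumptions; the claim does not specify how received data is buffered or what computation is performed.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class DestinationCore:
    """Hypothetical model of a destination core per claim 19."""

    def __init__(self, compute: Callable[[List[int]], int]) -> None:
        self._rx: Deque[List[int]] = deque()  # routing receiving module (FIFO, assumed)
        self._compute = compute               # computing module (assumed pluggable)

    def receive(self, packet: List[int]) -> None:
        """Called when the on-chip network node delivers data to this core."""
        self._rx.append(packet)

    def step(self) -> Optional[int]:
        """Run the computing module on the oldest received packet.

        Returns None when nothing has been received yet.
        """
        if not self._rx:
            return None
        return self._compute(self._rx.popleft())
```

Using Python's built-in `sum` as a stand-in computation, a core that receives `[1, 2, 3]` and then `[10]` produces `6` and then `10`, and afterwards returns `None`.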
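In claim 21 the data leaves the first chip entirely through its inter-chip routing modules: source core, then the NoC node of the source block, then the source block's routing module, then the other routing modules of the first chip up to the block wired to the second chip, and finally across the link into the second chip. The sketch assumes, as an illustration only, that the first chip's routing modules form a ring traversed toward higher block indices. `claim21_route`, `link_block`, and the `"chip1.blk0.noc"` style labels are hypothetical.

```python
from typing import List


def claim21_route(num_blocks: int, src_block: int, src_core: int,
                  link_block: int, dest_block: int, dest_core: int) -> List[str]:
    """Hop sequence for claim 21 (hypothetical labels).

    link_block: the block of chip 1 whose routing module is wired to the
    routing module of dest_block on chip 2.
    Assumption: chip 1's routing modules form a ring traversed toward higher
    block indices.
    """
    if not (0 <= src_block < num_blocks and 0 <= link_block < num_blocks):
        raise ValueError("block index out of range")
    hops = [
        f"chip1.blk{src_block}.core{src_core}",  # source core
        f"chip1.blk{src_block}.noc",             # NoC node in the source block
        f"chip1.blk{src_block}.router",          # source block's routing module
    ]
    block = src_block
    while block != link_block:                   # relay among chip 1's routing modules
        block = (block + 1) % num_blocks
        hops.append(f"chip1.blk{block}.router")
    hops += [
        f"chip2.blk{dest_block}.router",         # cross the chip-to-chip link
        f"chip2.blk{dest_block}.noc",            # NoC node in the destination block
        f"chip2.blk{dest_block}.core{dest_core}",  # destination core
    ]
    return hops
```

For instance, with four blocks per chip, a source in block 1 and a link in block 3, the data visits the routing modules of blocks 1, 2 and 3 on the first chip before crossing over.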
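Claim 22 reaches the same cross-chip link as claim 21 but moves the data across the first chip through on-chip network nodes rather than through the chain of inter-chip routing modules. The claim does not fix a NoC topology or routing algorithm. Purely for illustration, the sketch assumes the blocks sit on a 2D grid of width `width` and that the NoC uses dimension-order (X-then-Y) routing, a common mesh routing scheme that is swapped in here and not taken from the patent. `xy_route`, `claim22_route`, and the hop labels are hypothetical.

```python
from typing import List


def xy_route(src: int, dst: int, width: int) -> List[int]:
    """Blocks visited by X-then-Y dimension-order routing on a width-wide grid.

    Block i is assumed to sit at column i % width, row i // width.
    """
    x, y = src % width, src // width
    dx, dy = dst % width, dst // width
    path = [src]
    while x != dx:                    # first correct the column
        x += 1 if dx > x else -1
        path.append(y * width + x)
    while y != dy:                    # then correct the row
        y += 1 if dy > y else -1
        path.append(y * width + x)
    return path


def claim22_route(width: int, src_block: int, src_core: int,
                  link_block: int, dest_block: int, dest_core: int) -> List[str]:
    """Hop sequence for claim 22 (hypothetical labels, assumed XY NoC)."""
    hops = [f"chip1.blk{src_block}.core{src_core}"]           # source core
    hops += [f"chip1.blk{b}.noc"                              # NoC nodes on chip 1
             for b in xy_route(src_block, link_block, width)]
    hops += [
        f"chip1.blk{link_block}.router",        # routing module wired to chip 2
        f"chip2.blk{dest_block}.router",        # destination block's routing module
        f"chip2.blk{dest_block}.noc",
        f"chip2.blk{dest_block}.core{dest_core}",
    ]
    return hops
```

On a 2×2 grid, `xy_route(0, 3, 2)` returns `[0, 1, 3]`: one step along the row, then one step down the column, after which the link block's routing module carries the data to the second chip.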
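Claim 23 places the inter-chip routing modules around the periphery of each chip and has data circulate among them in one predetermined direction. A minimal sketch of such one-way ring forwarding follows, assuming (as an illustration) that the `n` modules are numbered `0..n-1` in clockwise order around the chip edge. `ring_route` is a hypothetical helper, not an interface defined in the patent.

```python
from typing import List


def ring_route(src: int, dst: int, n: int, clockwise: bool = True) -> List[int]:
    """Routing modules visited when data may only travel in one fixed
    direction around a ring of n peripheral inter-chip routing modules.

    Assumption: modules are numbered 0..n-1 in clockwise order.
    """
    if not (0 <= src < n and 0 <= dst < n):
        raise ValueError("module index out of range")
    step = 1 if clockwise else -1
    path = [src]
    cur = src
    while cur != dst:
        cur = (cur + step) % n        # Python's % keeps the index in 0..n-1
        path.append(cur)
    return path
```

With eight modules, `ring_route(6, 1, 8)` passes through modules 7 and 0 before reaching 1, while `ring_route(1, 0, 8)` needs seven hops even though module 0 is adjacent to module 1. In this sketch, fixing the direction means each module only has to forward to its single successor, at the cost of longer paths to destinations that lie "behind" the source.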
PCT/CN2021/071449 2020-01-20 2021-01-13 Network-on-chip interconnection structure of many-core system, data transmission method, board card, electronic device, and computer-readable storage medium WO2021147721A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010067456.5 2020-01-20
CN202010067456.5A CN113138955B (en) 2020-01-20 2020-01-20 Network-on-chip interconnection structure of many-core system and data transmission method

Publications (1)

Publication Number Publication Date
WO2021147721A1 2021-07-29

Family

ID=76809165

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/071449 WO2021147721A1 (en) 2020-01-20 2021-01-13 Network-on-chip interconnection structure of many-core system, data transmission method, board card, electronic device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN113138955B (en)
WO (1) WO2021147721A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114968903A (en) * 2022-04-21 2022-08-30 清华大学 Many external control circuit of nuclear chip

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN115794732B (en) * 2023-01-29 2023-07-04 北京超摩科技有限公司 Network-on-chip and network-on-package layered interconnection system based on core particles

Citations (7)

Publication number Priority date Publication date Assignee Title
CN101739241A (en) * 2008-11-12 2010-06-16 中国科学院微电子研究所 On-chip multi-core DSP cluster and application extension method
CN101753388A (en) * 2008-11-28 2010-06-23 中国科学院微电子研究所 Router and interface device suitable for the extending on and among sheets of polycaryon processor
US20140119363A1 (en) * 2012-10-30 2014-05-01 Empire Technology Development Llc Waved Time Multiplexing
CN104008084A (en) * 2014-06-02 2014-08-27 复旦大学 Extensible 2.5-dimensional multi-core processor architecture
CN104077138A (en) * 2014-06-27 2014-10-01 中国科学院计算技术研究所 Multiple core processor system for integrating network router, and integrated method and implement method thereof
CN107005477A (en) * 2014-12-22 2017-08-01 英特尔公司 The route device based on link delay for network-on-chip
CN107807901A (en) * 2017-09-14 2018-03-16 武汉科技大学 A kind of expansible restructural polycaryon processor connection method

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN114968903A (en) * 2022-04-21 2022-08-30 清华大学 Many external control circuit of nuclear chip
CN114968903B (en) * 2022-04-21 2024-04-19 清华大学 External control circuit of many-core chip

Also Published As

Publication number Publication date
CN113138955A (en) 2021-07-20
CN113138955B (en) 2024-04-02

Similar Documents

Publication Publication Date Title
US11755504B2 (en) Multiprocessor system with improved secondary interconnection network
WO2021147721A1 (en) Network-on-chip interconnection structure of many-core system, data transmission method, board card, electronic device, and computer-readable storage medium
CN1934831B (en) Integrated circuit and method of communication service mapping
WO2020093887A1 (en) Data transmission method and device for network-on-chip (noc) and electronic device
JP3739798B2 (en) System and method for dynamic network topology exploration
US8417778B2 (en) Collective acceleration unit tree flow control and retransmit
US8473818B2 (en) Reliable communications in on-chip networks
CN100583819C (en) Integrated circuit and method for packet switching control
US8751655B2 (en) Collective acceleration unit tree structure
TWI236251B (en) A switching I/O node for connection in a multiprocessor computer system
US20120185633A1 (en) On-chip router and multi-core system using the same
CN100592711C (en) Integrated circuit and method for packet switching control
US20110010522A1 (en) Multiprocessor communication protocol bridge between scalar and vector compute nodes
CN114647602A (en) Cross-chip access control method, device, equipment and medium
KR20150118170A (en) Enhanced 3d torus
CN116383114B (en) Chip, chip interconnection system, data transmission method, electronic device and medium
Schelten et al. A high-throughput, resource-efficient implementation of the RoCEv2 remote DMA protocol and its application
US8645557B2 (en) System of interconnections for external functional blocks on a chip provided with a single configurable communication protocol
WO2022105325A1 (en) Rerouting method, communication apparatus and storage medium
EP4283479A1 (en) Interconnection system, data transmission method, and chip
EP3229145A1 (en) Parallel processing apparatus and communication control method
CN116821044B (en) Processing system, access method and computer readable storage medium
WO2024021878A1 (en) Method for sending load information, method for sending message, and apparatus
US20240020261A1 (en) Peer-to-peer route through in a reconfigurable computing system
TW202002576A (en) Network interface cards, fabric cards, and line cards for loop avoidance in a chassis switch

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21744936

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21744936

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 100223)
