WO2021109698A1 - Chip and inter-core data transmission method - Google Patents

Publication number
WO2021109698A1
Authority
WO
WIPO (PCT)
Prior art keywords
processing core
processing
data transmission
chip
core
Prior art date
Application number
PCT/CN2020/118709
Other languages
French (fr)
Chinese (zh)
Inventor
罗飞
王维伟
Original Assignee
北京希姆计算科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京希姆计算科技有限公司
Publication of WO2021109698A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7825 Globally asynchronous, locally synchronous, e.g. network on chip
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4004 Coupling between buses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4004 Coupling between buses
    • G06F13/4022 Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present invention relates to the technical field of chips, and in particular to a chip and an inter-core data transmission method.
  • The chip is the cornerstone of data processing and fundamentally determines people's ability to process data. From the perspective of application fields, chips follow two main routes. One is the general-purpose chip route, such as the central processing unit (CPU): such chips provide great flexibility, but their effective computing power when processing algorithms in specific fields is relatively low. The other is the dedicated chip route, such as the Tensor Processing Unit (TPU): such chips deliver higher effective computing power in certain specific fields, but in more general fields that are flexible and changeable, their processing capability is relatively poor, or they cannot handle the workload at all.
  • Due to the wide variety and huge amount of data in the intelligent era, the chip is required to have extremely high flexibility, so that it can process rapidly changing algorithms from different fields, and extremely strong processing capability, so that it can quickly process extremely large and rapidly increasing amounts of data.
  • The purpose of the present invention is to provide a chip and an inter-core data transmission method.
  • The chip provided by the embodiment of the present invention includes multiple interconnection structures. The processing cores in the same processing core group transmit data through the interconnection structure connected to that group, so that data transmission between cores of different groups can proceed in parallel, which greatly improves the total data bandwidth and the performance of the chip.
  • The first aspect of the present invention provides a chip including a plurality of interconnection structures, where each interconnection structure is connected to one processing core group, and all the processing cores connected to one interconnection structure form one processing core group; the processing cores belonging to the same processing core group transmit data through the interconnection structure connected to that group; and the interconnection structures are not directly connected to one another.
  • Because the chip includes a plurality of interconnection structures and the processing cores in the same processing core group transmit data through the interconnection structure connected to that group, data transmission between cores in different processing core groups can proceed in parallel, which greatly improves the total data bandwidth and the performance of the chip.
  • data transmission can be performed between processing cores belonging to different processing core groups.
  • The amount of data transmitted between any two processing cores belonging to the same processing core group is greater than the amount of data transmitted between any two processing cores belonging to different processing core groups.
  • The plurality of interconnection structures includes a first interconnection structure and a second interconnection structure; the first interconnection structure is connected to a first processing core; the first processing core is also connected to the second interconnection structure.
  • The first interconnection structure is also connected to a second processing core; the second interconnection structure is connected to a third processing core; the second processing core and the third processing core transmit data via the first processing core.
  • The interconnection structures include a third interconnection structure; none of the processing cores connected to the third interconnection structure is connected to any other interconnection structure.
  • The processing cores connected to each interconnection structure are not connected to other interconnection structures.
  • each of the processing cores includes at least one transmission unit, and the bandwidth of the interconnection structure meets the bandwidth requirement of the transmission unit.
  • the bandwidth of the interconnect structure meets the bandwidth requirements of each of the processing cores in the processing core group connected to the interconnect structure.
  • the bandwidth of the interconnection structure is greater than the bandwidth requirements of each of the processing cores in the processing core group connected to the interconnection structure.
  • At least two clock frequencies are provided for the plurality of interconnect structures.
  • At least two bit width values are set for the plurality of interconnect structures.
  • At least two processing core groups have different bandwidth requirements.
  • A second aspect provides a card board, which includes one or more chips provided in the first aspect.
  • A third aspect provides an electronic device, including one or more card boards provided in the second aspect.
  • A fourth aspect provides an inter-core data transmission method used in the chip provided in the first aspect, including: the processing cores belonging to the same processing core group transmit data through the interconnection structure connected to that group; and two processing cores respectively belonging to two different processing core groups transmit data through at least two of the interconnection structures.
  • Two processing cores respectively belonging to two different processing core groups transmitting data through at least two interconnection structures includes: the two processing cores transmit data through the two interconnection structures via one processing core that belongs to both of the two different processing core groups.
  • A computer storage medium is also provided, having a computer program stored on it; when the program is executed by a processor, the steps of the inter-core data transmission method of the fourth aspect are implemented.
  • An electronic device is also provided, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor; when executing the program, the processor implements the steps of the inter-core data transmission method of the fourth aspect.
  • A computer program product is also provided, which includes computer instructions; when the computer instructions are executed by a computing device, the computing device can execute the steps of the inter-core data transmission method of the fourth aspect.
  • The chip provided by the embodiment of the present invention is provided with multiple interconnection structures. The processing cores of the same processing core group transmit data through the interconnection structure connected to that group, so that the data transmission of different processing core groups can proceed in parallel.
  • On the one hand, this greatly improves the total data bandwidth, improves the efficiency of data transmission between cores, reduces power consumption, and improves the performance of the chip; on the other hand, because the data transmission of different groups proceeds in parallel, the groups do not interfere with each other, which avoids data congestion and improves the performance of the chip.
  • Since each processing core group uses its own interconnection structure for data transmission, different interconnection structures can be set with different clock frequencies and different bit widths according to the needs of each group's data transmission. Compared with having all cores in a chip share one interconnection structure for data transmission, this reduces power consumption, improves the performance of the entire chip and increases the flexibility of chip design.
  • Determining the appropriate bit width and frequency of each interconnection structure according to the actual bandwidth requirements of the different processing core groups can save chip area.
  • FIG. 1 is a schematic diagram of the structure of a chip.
  • FIG. 2 is a structural diagram of a chip according to an embodiment of the present invention.
  • FIG. 3 is a structural diagram of a chip according to an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a method for transmitting data between cores according to an embodiment of the present invention.
  • In a multi-core chip, all cores may be homogeneous, that is, every core has the same structure; alternatively, the chip may contain heterogeneous cores, that is, at least two cores with different structures.
  • In a multi-core architecture, multiple cores often participate in the execution of a given task at the same time, which requires data transmission between the cores. Therefore, for a multi-core chip, the choice of architecture and of data transmission scheme, so that homogeneous or heterogeneous cores can form a chip with excellent performance, is very important.
  • Figure 1 is a schematic diagram of the structure of a chip.
  • As shown in FIG. 1, the chip has only one network on chip (NoC, Network On Chip), and all cores are connected to each other through this NoC to exchange data.
  • The chip is equipped with cores of various structures, including K1_C1, K2_C1 and KM_C1, where M is the number of core types, and there are multiple cores of each type.
  • The first type of core K1 includes K1_C1, K1_C2 and K1_CNM, where NM denotes the number of cores of each type. That is, KM_CNM denotes the NM-th core of the M-th structure.
  • Each core needs to know the data format (for example, the bit width) of all other cores with which it exchanges data. Since all cores share the same NoC, there are stricter requirements on the format of the data packets sent or received: either the sending core encodes the data into the data format of the receiving core, or the receiving core decodes the data according to the type of the sending core and re-encodes it into its own data format. This makes the packet routing algorithm highly complex and places a heavy burden on each core.
  • In addition, while one pair of cores is transmitting data, the other cores cannot use the NoC for data transmission and must wait until that pair has finished. This causes data transmission delays and reduces the performance of the chip. For example, suppose the data transmission bandwidth between the first core and the second core is very large, and during the data transmission between the first core and the second core, the third core generates data that needs to be sent to the fourth core.
  • This data is relatively small in volume, but its real-time requirements are very high. Because all cores share one NoC, the third core must wait until the data transmission between the first core and the second core is completed before it can transmit. Data with high real-time requirements is therefore not transmitted in time, which affects the performance of the chip.
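  • The waiting problem described above can be illustrated with a minimal timing sketch. It assumes, as a simplification of the description, that an interconnect carries one transfer at a time; the function name, the core labels and the time units are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def completion_times(transfers):
    """Return {transfer name: finish time}.

    transfers: list of (name, interconnect, ready_time, duration), served in
    list order. Simplifying assumption: each interconnect carries one
    transfer at a time.
    """
    free_at = defaultdict(int)  # time at which each interconnect becomes idle
    finished = {}
    for name, interconnect, ready_time, duration in transfers:
        start = max(ready_time, free_at[interconnect])
        free_at[interconnect] = start + duration
        finished[name] = start + duration
    return finished


# One shared NoC: the small urgent transfer queues behind the bulk transfer.
shared = completion_times([
    ("core1->core2 (bulk)", "NoC", 0, 100),
    ("core3->core4 (urgent)", "NoC", 10, 1),
])

# Two independent NoCs: the urgent transfer starts as soon as it is ready.
split = completion_times([
    ("core1->core2 (bulk)", "NoC1", 0, 100),
    ("core3->core4 (urgent)", "NoC2", 10, 1),
])

print(shared["core3->core4 (urgent)"])  # 101
print(split["core3->core4 (urgent)"])   # 11
```

  • In this toy model the urgent transfer finishes at time 101 on the shared NoC but at time 11 when it has its own interconnect, which is the effect the multi-interconnect chip described below is aiming for.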
  • Fig. 2 is a structural diagram of a chip according to an embodiment of the present invention.
  • The chip includes a plurality of interconnection structures, where each interconnection structure is connected to one processing core group, and all the processing cores connected to one interconnection structure form one processing core group; each processing core in a processing core group transmits data through the interconnection structure connected to that group; and the interconnection structures are not directly connected to one another.
  • The chip includes o interconnection structures, where an interconnection structure may be an inter-core fabric, a network on chip (Network On Chip, NoC), a bus, or a switch.
  • The present description uses the network on chip (NoC) as the example interconnection structure, but is not limited to this.
  • The o interconnection structures are NoC1, NoC2, ..., NoCo, and each NoC is connected to one processing core group.
  • Each processing core in the processing core group performs data transmission through the NoC connected to the processing core group.
  • The chip includes M types of cores, and the number of cores of each type is N1, N2, ..., NM, respectively.
  • K2_CN2 denotes the N2-th core of the second type of core.
  • KM_CNM denotes the NM-th core of the M-th type of core.
  • The first processing core group includes core K1_C1, core K1_C2, ..., core K1_CN1, core K2_C1 and core K2_C2.
  • The first network on chip NoC1 is connected to the first processing core group, that is, each processing core of the first processing core group is connected to NoC1.
  • Data transmission between the cores of the first processing core group is achieved through the first network on chip NoC1. For example, when core K1_C1 has data to be transmitted to core K1_C2, the data is sent to NoC1, and NoC1 transmits the data to core K1_C2.
  • The second processing core group includes core K1_C1, core K1_CN1, core K2_C2, ..., core K2_CN2, core KM_C1 and core KM_C2. Each processing core of the second processing core group is connected to the second network on chip NoC2, and data transmission between the cores of the second processing core group is achieved through NoC2.
  • The o-th processing core group includes core K1_CN1, core K2_C2, core KM_C2, ..., core KM_CNM.
  • Each core of the o-th processing core group is connected to the o-th network on chip NoCo.
  • Data transmission between the cores of the o-th processing core group is achieved through the o-th interconnection structure NoCo.
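  • As a minimal software sketch of the group structure described above, an interconnection structure can be modeled as an object that only delivers data between the cores attached to it. The class name `Interconnect`, the `inbox` dictionary and the concrete core list are assumptions made for illustration; the description itself concerns hardware and defines no such API.

```python
class Interconnect:
    """Toy model of one interconnection structure and its processing core group."""

    def __init__(self, name, cores):
        self.name = name
        self.cores = set(cores)
        # One receive queue per attached core (hypothetical bookkeeping).
        self.inbox = {core: [] for core in self.cores}

    def send(self, src, dst, payload):
        """Deliver payload from src to dst; both must be attached to this interconnect."""
        if src not in self.cores or dst not in self.cores:
            raise ValueError(f"{src} and {dst} are not both attached to {self.name}")
        self.inbox[dst].append((src, payload))


# Illustrative subset of the first processing core group.
noc1 = Interconnect("NoC1", ["K1_C1", "K1_C2", "K2_C1", "K2_C2"])

# K1_C1 hands the data to NoC1, and NoC1 delivers it to K1_C2.
noc1.send("K1_C1", "K1_C2", "payload")
print(noc1.inbox["K1_C2"])  # [('K1_C1', 'payload')]
```

  • A core that is not attached to `noc1` cannot be reached through it directly, which mirrors the statement that the interconnection structures are not directly connected to one another.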
  • The chip provided by the embodiment of the present invention is provided with a plurality of interconnection structures, and each interconnection structure is connected to one processing core group.
  • Each core of the same processing core group transmits data through the interconnection structure connected to that group, so that data transmission between cores of different groups can proceed in parallel.
  • On the one hand, when transmitting data between cores belonging to the same group, only the data formats of the cores in that group need to be considered, and the data formats of cores in other groups of the chip can be ignored. This reduces the complexity of the circuit in the chip, improves the efficiency of data transmission between cores, reduces power consumption, and improves the performance of the chip. On the other hand, because the data transmission of different groups proceeds in parallel, the groups do not interfere with each other, which avoids data congestion and improves the performance of the chip.
  • Data transmission can be performed between processing cores belonging to different processing core groups.
  • The plurality of interconnection structures includes a first interconnection structure and a second interconnection structure; the first interconnection structure is connected to a first processing core; and the first processing core is also connected to the second interconnection structure.
  • The same processing core can be placed in two different processing core groups at the same time. For example, the processing core K1_C1 is placed both in the first processing core group and in the second processing core group, that is, K1_C1 is connected to both the first network on chip NoC1 and the second network on chip NoC2.
  • The amount of data transmitted between any two processing cores belonging to the same processing core group is greater than the amount of data transmitted between any two processing cores belonging to different groups.
  • The first interconnection structure is further connected to a second processing core; the second interconnection structure is connected to a third processing core; the second processing core and the third processing core transmit data via the first processing core.
  • For example, when the processing core K1_C2 of the first processing core group needs to send data to the processing core KM_C2 of the second processing core group, the transfer can be relayed through processing core K1_C1, K1_CN1 or K2_C2.
  • For example, the processing core K1_C2 sends the data to the processing core K1_C1 through the first network on chip NoC1, and the processing core K1_C1 then sends the data to the processing core KM_C2 through the second network on chip NoC2.
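  • The relay scheme above can be sketched as a set intersection: a core can act as a relay between two groups if it is attached to both interconnection structures. The sketch below uses the symbolic core names of FIG. 2 as plain strings; the function names and the choice of the first candidate as relay are assumptions for illustration, not part of the described chip.

```python
# Symbolic membership of the first two processing core groups (names used as plain labels).
GROUPS = {
    "NoC1": {"K1_C1", "K1_C2", "K1_CN1", "K2_C1", "K2_C2"},
    "NoC2": {"K1_C1", "K1_CN1", "K2_C2", "K2_CN2", "KM_C1", "KM_C2"},
}


def relay_candidates(src_noc, dst_noc, groups=GROUPS):
    """Cores attached to both interconnects, in a deterministic order."""
    return sorted(groups[src_noc] & groups[dst_noc])


def two_hop_route(src, dst, src_noc, dst_noc, groups=GROUPS):
    """Return [(sender, receiver, interconnect), ...] via the first relay candidate."""
    if src not in groups[src_noc] or dst not in groups[dst_noc]:
        raise ValueError("source or destination is not attached to the named interconnect")
    candidates = relay_candidates(src_noc, dst_noc, groups)
    if not candidates:
        return None  # no core belongs to both groups
    relay = candidates[0]
    return [(src, relay, src_noc), (relay, dst, dst_noc)]


print(relay_candidates("NoC1", "NoC2"))
# ['K1_C1', 'K1_CN1', 'K2_C2']
print(two_hop_route("K1_C2", "KM_C2", "NoC1", "NoC2"))
# [('K1_C2', 'K1_C1', 'NoC1'), ('K1_C1', 'KM_C2', 'NoC2')]
```

  • The three candidates are the same three relay cores named in the example, and the route through K1_C1 matches the two transfers described there.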
  • the multiple interconnection structures include a third interconnection structure; all the processing cores connected by the third interconnection structure are not connected to other interconnection structures.
  • The chip further includes a third network on chip, which is connected to a third processing core group; the third processing core group includes core K1_C3 and core K3_C1, and these two cores are not connected to any other interconnection structure.
  • the processing core connected to each interconnect structure in the chip is not connected to other interconnect structures.
  • When the processing cores connected to each interconnection structure are not connected to other interconnection structures, there is no data transmission between two processing cores that belong to two different processing core groups.
  • each of the processing cores includes at least one transmission unit, and the bandwidth of the interconnect structure meets the bandwidth requirement of the transmission unit.
  • the bandwidth of the interconnect structure is equal to the bandwidth requirement of the transmission unit, so that the processing core can send data to the interconnect structure.
  • the bandwidth requirement of the transmission unit may be the bandwidth required by the transmission unit to send a certain amount of data within a certain period of time.
  • the bandwidth of the interconnect structure meets the bandwidth requirements of each of the processing cores in the processing core group connected to the interconnect structure.
  • the bandwidth requirement of the processing core may be the bandwidth required by the processing core to send a certain amount of data within a certain period of time.
  • the bandwidth of the interconnect structure is greater than the bandwidth requirements of each of the processing cores in the processing core group connected to the interconnect structure.
  • Each interconnection structure can thus be matched as closely as possible to the processing cores in the processing core group connected to it, which on the one hand saves area to the greatest extent and on the other hand gives the chip better performance.
  • At least two processing core groups in the chip have different bandwidth requirements.
  • For example, the bandwidth requirement of the first group is 1 MBps and that of the second group is 10 MBps. Setting the bandwidth of each NoC according to the requirement of its own group saves NoC area to the greatest extent.
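  • The bandwidth conditions above can be checked with a small sketch. It assumes that the requirement of a group is the largest requirement among its cores; the per-core figures, the helper names and the byte-per-second unit are hypothetical, and only the 1 MBps and 10 MBps group figures come from the example above.

```python
def group_requirement(core_requirements):
    """Bandwidth an interconnect must offer so that every attached core is satisfied."""
    return max(core_requirements.values())


def meets_requirements(interconnect_bandwidth, core_requirements):
    """True if the interconnect bandwidth is at least each core's requirement."""
    return all(interconnect_bandwidth >= need for need in core_requirements.values())


MB = 1_000_000  # bytes per second, assuming MBps means 10**6 bytes per second

# Hypothetical per-core requirements for two processing core groups.
group1 = {"K1_C1": 1 * MB, "K1_C2": 1 * MB}
group2 = {"K3_C1": 10 * MB, "K3_C2": 4 * MB}

print(group_requirement(group1))  # 1000000
print(group_requirement(group2))  # 10000000
print(meets_requirements(1 * MB, group1))  # True
print(meets_requirements(1 * MB, group2))  # False
```

  • With separate interconnects, only the second NoC has to be built for 10 MBps; a single shared interconnect would have to be sized for the most demanding core on the whole chip.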
  • FIG. 3 is a schematic diagram of a chip structure provided by an embodiment of the present invention.
  • The chip includes 6 cores of three different structures, namely the first type of core K1, the second type of core K2 and the third type of core K3, with two cores of each type.
  • The two cores of the first type K1 are core K1_C1 and core K1_C2.
  • The two cores of the second type K2 are core K2_C1 and core K2_C2.
  • The two cores of the third type K3 are core K3_C1 and core K3_C2.
  • The 6 cores in this chip are divided into three groups.
  • The first processing core group is composed of K1_C1, K1_C2, K2_C1 and K2_C2, and each core in the first processing core group is connected to the first interconnection structure NoC1.
  • The second processing core group is composed of K1_C1, K2_C2, K3_C1 and K3_C2, and each core in the second processing core group is connected to the second interconnection structure NoC2.
  • The third processing core group is composed of K1_C2, K2_C2 and K3_C2, and each core in the third processing core group is connected to the third interconnection structure NoC3.
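  • The grouping of FIG. 3 can be written down as a small table and inverted to see which interconnection structures each core is attached to. This is only an illustrative sketch: the dictionary layout and the function name `attachments` are assumptions, while the group memberships are taken from the three groups listed above.

```python
# Processing core groups of the six-core example, keyed by interconnection structure.
NOCS = {
    "NoC1": {"K1_C1", "K1_C2", "K2_C1", "K2_C2"},
    "NoC2": {"K1_C1", "K2_C2", "K3_C1", "K3_C2"},
    "NoC3": {"K1_C2", "K2_C2", "K3_C2"},
}


def attachments(nocs):
    """Invert the group table: core -> sorted list of interconnects it is attached to."""
    result = {}
    for noc in sorted(nocs):
        for core in sorted(nocs[noc]):
            result.setdefault(core, []).append(noc)
    return result


for core, attached in sorted(attachments(NOCS).items()):
    print(core, attached)
# K1_C1 ['NoC1', 'NoC2']
# K1_C2 ['NoC1', 'NoC3']
# K2_C1 ['NoC1']
# K2_C2 ['NoC1', 'NoC2', 'NoC3']
# K3_C1 ['NoC2']
# K3_C2 ['NoC2', 'NoC3']
```

  • Cores attached to more than one interconnection structure (here K1_C1, K1_C2, K2_C2 and K3_C2) are the ones that can relay data between groups.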
  • each core in the chip includes at least one computing unit.
  • The computing unit may be an execution unit (Execution Unit, EU) or a processing unit (Processing Unit, PU).
  • The chip includes multiple interconnection structures, each connected to one processing core group. The processing cores of the same group transmit data through the interconnection structure connected to that group. Each interconnection structure is designed according to the bandwidth required by its group of cores, the data bit width used by those cores, and the data packets and corresponding protocol dedicated to that group of cores.
  • At least two clock frequencies are set in the plurality of interconnect structures in the chip. Since multiple interconnect structures are provided in the chip, each interconnect structure is connected to a group of processing cores, and data transmission between the groups of processing cores can be processed in parallel, so multiple interconnect structures in the chip can be set with multiple different clock frequencies.
  • multiple interconnect structures in the chip can also be set to the same clock frequency.
  • At least two bit width values are set in the plurality of interconnect structures. Since multiple interconnection structures are provided in the chip, each interconnection structure is connected to a group of processing cores, and the data transmission between each group of processing cores can be processed in parallel, so multiple bit width values can be set in the interconnection structure.
  • multiple interconnect structures in the chip may also be set to the same bit width value.
  • Bandwidth = bit width × clock frequency. If data transmission between the processing cores of a certain processing core group requires a higher bandwidth, the bandwidth of the NoC can be determined according to the bandwidth requirements of that group (for example, by choosing an appropriate clock frequency and bit width), so that the bandwidth of the NoC meets the bandwidth requirements of each processing core in the group; for example, a larger NoC bit width can be set. In addition, since data transmission in each processing core group is independent of the others, if other processing core groups do not require a higher bandwidth, a suitable NoC bit width can be set according to their needs, which saves chip area and thereby power consumption.
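  • The relation between bandwidth, bit width and clock frequency can be worked through with a short sketch. The 1 MHz clock, the list of candidate bit widths and the reading of 1 MBps as 8,000,000 bits per second are assumptions chosen for illustration; only the relation itself comes from the description.

```python
def bandwidth_bits_per_second(bit_width, clock_hz):
    """Bandwidth = bit width x clock frequency."""
    return bit_width * clock_hz


def smallest_bit_width(required_bits_per_second, clock_hz,
                       candidates=(8, 16, 32, 64, 128, 256)):
    """Smallest candidate bit width whose bandwidth meets the requirement, or None."""
    for width in sorted(candidates):
        if bandwidth_bits_per_second(width, clock_hz) >= required_bits_per_second:
            return width
    return None


CLOCK_HZ = 1_000_000             # assumed 1 MHz interconnect clock
GROUP1_REQUIREMENT = 8_000_000   # 1 MBps expressed in bits per second
GROUP2_REQUIREMENT = 80_000_000  # 10 MBps expressed in bits per second

print(smallest_bit_width(GROUP1_REQUIREMENT, CLOCK_HZ))  # 8
print(smallest_bit_width(GROUP2_REQUIREMENT, CLOCK_HZ))  # 128
```

  • Under these assumptions the first group's NoC needs only an 8-bit data path while the second needs 128 bits, which illustrates why per-group sizing can save area compared with building every interconnect for the most demanding group.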
  • In the chip of the embodiment of the present invention, since each processing core group uses its own NoC for data transmission, different NoCs can be set with different clock frequencies and different bit widths according to the needs of each group's data transmission. Compared with having all cores in a chip share one interconnection structure for transmission, this reduces power consumption, improves the performance of the entire chip, increases the flexibility of chip design, and saves area.
  • The chip provided by the embodiment of the present invention is provided with a plurality of interconnection structures, and data transmission within a processing core group is realized through the interconnection structure connected to that group, so that the data transmission of different processing core groups can proceed in parallel.
  • On the one hand, this greatly increases the total data bandwidth, improves the efficiency of data transmission between cores, reduces power consumption, and improves the performance of the chip; on the other hand, because the data transmission of different groups proceeds in parallel, the groups do not interfere with each other, which avoids data congestion and improves the performance of the chip.
  • An embodiment of the present invention also provides a card board, which includes one or more chips provided in the foregoing embodiments.
  • An embodiment of the present invention also provides an electronic device, including one or more card boards provided in the foregoing embodiments.
  • FIG. 4 is a schematic flowchart of a method for transmitting data between cores according to an embodiment of the present invention.
  • the method includes steps S101 to S102:
  • Step S101: the processing cores belonging to the same processing core group transmit data through the interconnection structure connected to that group.
  • Step S102: two processing cores respectively belonging to two different processing core groups transmit data through at least two interconnection structures.
  • The step in which two processing cores respectively belonging to two different processing core groups transmit data through at least two interconnection structures includes:
  • the two processing cores transmit data through the two interconnection structures via one processing core that belongs to both of the two different processing core groups.
  • For example, K2_C1 transmits data to K1_C1 through NoC1, and K1_C1 then transmits the data to K3_C1 through NoC2, thereby completing the data transmission.
  • Two processing cores belonging to two different processing core groups can also transmit data through more than two interconnection structures.
  • For example, the processing core K2_C1 can send data to the processing core K3_C1 through NoC1, NoC3 and NoC2 in turn: the processing core K2_C1 sends the data to the processing core K1_C2 through NoC1, the processing core K1_C2 sends the data to the processing core K3_C2 through NoC3, and the processing core K3_C2 sends the data to the processing core K3_C1 through NoC2.
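  • The two routes in the examples above can be checked, and a shortest route found, with a minimal sketch over the six-core grouping of FIG. 3. Breadth-first search is used here only as one convenient way to pick a route; the description does not specify a routing algorithm, and the function names are hypothetical.

```python
from collections import deque

# Group membership of the six-core example, keyed by interconnection structure.
NOCS = {
    "NoC1": {"K1_C1", "K1_C2", "K2_C1", "K2_C2"},
    "NoC2": {"K1_C1", "K2_C2", "K3_C1", "K3_C2"},
    "NoC3": {"K1_C2", "K2_C2", "K3_C2"},
}


def is_valid_route(hops, nocs=NOCS):
    """Each hop (sender, receiver, noc) must stay inside one group, and hops must chain."""
    for i, (src, dst, noc) in enumerate(hops):
        if src not in nocs[noc] or dst not in nocs[noc]:
            return False
        if i > 0 and hops[i - 1][1] != src:
            return False
    return True


def find_route(src, dst, nocs=NOCS):
    """Breadth-first search for a route with the fewest hops; returns a hop list or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        core = queue.popleft()
        if core == dst:
            break
        for noc in sorted(nocs):
            if core not in nocs[noc]:
                continue
            for nxt in sorted(nocs[noc]):
                if nxt not in parent:
                    parent[nxt] = (core, noc)
                    queue.append(nxt)
    if dst not in parent:
        return None
    hops = []
    node = dst
    while parent[node] is not None:
        prev, noc = parent[node]
        hops.append((prev, node, noc))
        node = prev
    hops.reverse()
    return hops


three_noc_route = [
    ("K2_C1", "K1_C2", "NoC1"),
    ("K1_C2", "K3_C2", "NoC3"),
    ("K3_C2", "K3_C1", "NoC2"),
]
print(is_valid_route(three_noc_route))  # True
print(find_route("K2_C1", "K3_C1"))
# [('K2_C1', 'K1_C1', 'NoC1'), ('K1_C1', 'K3_C1', 'NoC2')]
```

  • The search returns the two-interconnect route via K1_C1, while the three-interconnect route via K1_C2 and K3_C2 is also valid; which one a real chip would prefer (for example, to avoid a busy NoC) is a design choice outside this sketch.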
  • each processing core belonging to the same processing core group realizes data transmission through an interconnect structure connected to the processing core group, so that data transmission between different groups of cores can be processed in parallel.
  • This greatly improves the total data bandwidth improves the efficiency of data transmission between cores, reduces power consumption, and improves the performance of the chip; on the other hand, because the data transmission of different groups can be processed in parallel, the data of different groups can be processed in parallel. The transmission will not interfere with each other, thereby avoiding data congestion and improving the performance of the chip.
  • a computer storage medium is provided, and a computer program is stored on the computer storage medium.
  • when the program is executed by a processor, the steps of the inter-core data transmission method provided in the foregoing embodiments are implemented.
  • an electronic device including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor, when executing the program, implements the steps of the inter-core data transmission method provided in the foregoing embodiments.
  • a computer program product which includes computer instructions, and when the computer instructions are executed by a computing device, the computing device can execute the steps of the inter-core data transmission method provided in the above embodiments.
  • each block in the flowchart or block diagram may represent a module, program segment, or part of code that contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the block may also occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and each combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure can be implemented in software or hardware. Among them, the name of the unit does not constitute a limitation on the unit itself under certain circumstances.
  • exemplary types of hardware logic components include: Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), Application Specific Standard Product (ASSP), System on Chip (SOC), Complex Programmable Logic Device (CPLD), and so on.
  • a machine-readable medium may be a tangible medium, which may contain or store a program for use by the instruction execution system, apparatus, or device or in combination with the instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • the machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Mathematical Physics (AREA)
  • Multi Processors (AREA)

Abstract

Provided is a chip and an inter-core data transmission method. The chip comprises: a plurality of interconnection structures, wherein each of the interconnection structures is connected to a processing core group, and the processing core group comprises at least two processing cores; data transmission between the processing cores in the processing core group is performed by means of the interconnection structure which is connected to the processing core group; and the plurality of interconnection structures are not directly connected to each other. In the chip, data transmission between cores in the same processing core group is realized by means of the interconnection structure corresponding to the group, and data transmission between cores in different processing core groups can be handled in parallel, such that a total data bandwidth is greatly increased, the efficiency of data transmission between cores is improved, the power consumption is reduced, and the performance of the chip is improved; moreover, since data transmission of different groups can be handled in parallel, the data transmission of different groups will not interfere with each other, thereby preventing the occurrence of a data congestion phenomenon and improving the performance of the chip.

Description

Chip and inter-core data transmission method
Technical field
The present invention relates to the technical field of chips, and in particular to a chip and an inter-core data transmission method.
Background
With the development of science and technology, human society is rapidly entering the intelligent era. A defining feature of this era is that the variety of data people obtain keeps growing, the volume of data keeps increasing, and the demands on data processing speed keep rising.
Chips are the cornerstone of data processing and fundamentally determine people's ability to process data. In terms of application fields, chips follow two main routes. One is the general-purpose route, for example the central processing unit (CPU); such chips offer great flexibility but deliver relatively low effective computing power on domain-specific algorithms. The other is the special-purpose route, for example the tensor processing unit (TPU); such chips deliver high effective computing power in certain specific fields, but in more general fields with flexible and changing workloads they perform poorly or cannot handle the task at all.
Because the data of the intelligent era is highly varied and enormous in volume, chips are required both to be extremely flexible, so that they can handle rapidly evolving algorithms from different fields, and to have very strong processing capability, so that they can quickly process extremely large and rapidly growing volumes of data.
Summary of the invention
(1) Purpose of the invention
The purpose of the present invention is to provide a chip and an inter-core data transmission method. The chip provided by embodiments of the present invention includes multiple interconnection structures. Processing cores within the same processing core group transmit data through the interconnection structure connected to that group, so that data transmission between cores of different groups can be processed in parallel, which greatly increases the total data bandwidth and improves chip performance.
(2) Technical solution
To solve the above problems, a first aspect of the present invention provides a chip including: multiple interconnection structures, each connected to one processing core group, where all the processing cores connected to one interconnection structure form one processing core group; the processing cores belonging to the same processing core group transmit data through the interconnection structure connected to that group; and the multiple interconnection structures are not directly connected to one another.
The chip provided by embodiments of the present invention includes multiple interconnection structures. Processing cores within the same processing core group transmit data through the interconnection structure connected to that group, so that core-to-core data transmission in different processing core groups can be processed in parallel, which greatly increases the total data bandwidth and improves chip performance.
Further, data can be transmitted between processing cores belonging to different processing core groups.
Further, the amount of data transmitted between any two processing cores belonging to one processing core group is greater than the amount of data transmitted between processing cores belonging to any two different processing core groups.
Further, the multiple interconnection structures include a first interconnection structure and a second interconnection structure; a first processing core is connected to the first interconnection structure; and the first processing core is also connected to the second interconnection structure.
Further, a second processing core is also connected to the first interconnection structure; a third processing core is connected to the second interconnection structure; and the second processing core and the third processing core transmit data via the first processing core.
Further, the interconnection structures include a third interconnection structure; none of the processing cores connected to the third interconnection structure is connected to any other interconnection structure.
Further, the processing cores connected to each interconnection structure are not connected to any other interconnection structure.
Further, each processing core includes at least one transmission unit, and the bandwidth of the interconnection structure meets the bandwidth requirement of the transmission unit.
Further, the bandwidth of the interconnection structure meets the bandwidth requirements of the processing cores in the processing core group connected to that interconnection structure.
Further, the bandwidth of the interconnection structure is greater than the bandwidth requirements of the processing cores in the processing core group connected to that interconnection structure.
Further, the multiple interconnection structures are configured with at least two different clock frequencies.
Further, the multiple interconnection structures are configured with at least two different bit widths.
Further, at least two processing core groups have different bandwidth requirements.
According to a second aspect of the present invention, a card board is provided, including one or more chips provided in the first aspect.
According to a third aspect of the present invention, an electronic device is also provided, including one or more card boards provided in the second aspect.
According to a fourth aspect of the present invention, an inter-core data transmission method is provided for use in the chip provided in the first aspect. The method includes: the processing cores belonging to the same processing core group transmit data through the interconnection structure connected to that group; and two processing cores belonging to two different processing core groups transmit data through at least two interconnection structures.
Further, the step in which two processing cores belonging to two different processing core groups transmit data through at least two interconnection structures includes: the two processing cores transmit data through two interconnection structures via one processing core that belongs to both of the two processing core groups.
According to a fifth aspect of the present invention, a computer storage medium is provided. A computer program is stored on the computer storage medium, and when the program is executed by a processor, the steps of the inter-core data transmission method of the fourth aspect are implemented.
According to a sixth aspect of the present invention, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor. When the processor executes the program, the steps of the inter-core data transmission method of the fourth aspect are implemented.
According to a seventh aspect of the present invention, a computer program product is provided, which includes computer instructions. When the computer instructions are executed by a computing device, the computing device can execute the steps of the inter-core data transmission method of the fourth aspect.
(3) Beneficial effects
The above technical solution of the present invention has the following beneficial technical effects:
(1) The chip provided by embodiments of the present invention has multiple interconnection structures. Processing cores of the same processing core group transmit data through the interconnection structure connected to that group, so that data transmission in different processing core groups can be processed in parallel. On the one hand, this greatly increases the total data bandwidth, improves the efficiency of data transmission between cores, reduces power consumption, and improves chip performance. On the other hand, because the data transmission of different groups can be processed in parallel, transmissions of different groups do not interfere with one another, which avoids data congestion and improves chip performance.
(2) In the chip provided by embodiments of the present invention, when cores belonging to the same group transmit data, only the data formats of the cores within that group need to be considered, not those of cores in other groups of the chip, which reduces the complexity of the circuits in the chip.
(3) Because each processing core group uses its own interconnection structure for data transmission, different interconnection structures can be given different clock frequencies and different bit widths, set according to the data transmission needs of each group. Compared with a chip in which all cores share a single interconnection structure, this reduces power consumption, improves overall chip performance, and increases the flexibility of chip design. In addition, choosing a suitable bit width and frequency for each interconnection structure according to the actual bandwidth demand of its processing core group saves chip area.
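The sizing idea in point (3) can be illustrated with a small sketch. It assumes that the peak bandwidth of an interconnection structure is roughly its bit width multiplied by its clock frequency (one transfer per cycle), which is a common simplification and is not stated in the patent. The function names, candidate widths, and demand figures are hypothetical.

```python
def peak_bandwidth_bps(bit_width, clock_hz):
    """Peak bandwidth in bits/s, assuming one bit_width-wide transfer per clock cycle."""
    return bit_width * clock_hz


def narrowest_sufficient_width(demand_bps, clock_hz, candidates=(32, 64, 128, 256, 512)):
    """Return the smallest candidate bit width whose peak bandwidth covers demand_bps."""
    for width in sorted(candidates):
        if peak_bandwidth_bps(width, clock_hz) >= demand_bps:
            return width
    return None  # no candidate is wide enough at this clock frequency


# Hypothetical demands for two processing core groups, in bits per second.
heavy_group = narrowest_sufficient_width(demand_bps=200e9, clock_hz=1e9)
light_group = narrowest_sufficient_width(demand_bps=10e9, clock_hz=500e6)
```

Under these made-up numbers the heavily loaded group gets a wide, fast interconnect while the lightly loaded group gets a narrow, slower one, which is the area and power saving the paragraph describes.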
Description of the drawings
Figure 1 is a schematic structural diagram of a chip;
Figure 2 is a structural diagram of a chip according to an embodiment of the present invention;
Figure 3 is a structural diagram of a chip according to an embodiment of the present invention;
Figure 4 is a schematic flowchart of an inter-core data transmission method according to an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings. It should be understood that these descriptions are only exemplary and are not intended to limit the scope of the present invention. In addition, descriptions of well-known structures and technologies are omitted below to avoid unnecessarily obscuring the concepts of the present invention.
In a multi-core (many-core) chip, all cores may be homogeneous, that is, every core has the same structure; alternatively, the cores may be heterogeneous, that is, at least two different kinds of core are present. In a multi-core (many-core) architecture, several cores often take part in the same task at once, and data is then transmitted between them. For a multi-core chip, the choice of architecture and of data transmission scheme, so that homogeneous or heterogeneous cores combine into a high-performance chip, is therefore critical.
Figure 1 is a schematic structural diagram of a chip.
In the chip structure shown in Figure 1, the chip has only one network on chip (NoC), and all cores are connected to one another and exchange data through this single NoC.
As shown in Figure 1, the chip contains cores of several structures, including K1_C1, K2_C1 and KM_C1, where M indicates that there are M types of core. Each type has multiple cores; for example, the first type K1 includes K1_C1, K1_C2 and K1_CNM, where NM indicates that each type has NM cores. That is, KM_CNM denotes the NM-th core of the M-th structure.
Because the chip contains many types of core (M types) and all types exchange data over the same NoC, every core needs to know the data format (bit width) of every other core it exchanges data with. Sharing one NoC therefore places high demands on the format of sent and received packets: either the sending core encodes the data into the receiving core's format, or the receiving core decodes the data according to the sending core's type and re-encodes it into its own format. As a result, the packet routing algorithm is complex and each core carries a heavy burden.
In addition, because all cores share one NoC and the amounts of data exchanged between cores differ, while one pair of cores is transmitting data the other cores cannot use the NoC and must wait until that pair has finished. This introduces transmission delay and degrades chip performance. For example, suppose the transfer between the first core and the second core requires a large bandwidth, and during that transfer the third core produces data that must be sent to the fourth core. This data is small in volume but has strict real-time requirements. Because all cores share one NoC, the third core must wait until the transfer between the first and second cores completes, so the latency-critical data is not delivered in time and chip performance suffers.
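The waiting problem described above can be sketched with a toy model. It assumes each NoC serves its transfers one at a time in arrival order and that all transfers are ready at time 0. The durations are hypothetical time units and the function name is illustrative only.

```python
def completion_times(transfers, noc_of):
    """Return the finish time of each transfer.

    transfers: list of (name, duration) in arrival order.
    noc_of:    maps a transfer name to the NoC that carries it.
    Each NoC is modeled as serving one transfer at a time, in order.
    """
    busy_until = {}
    done = {}
    for name, duration in transfers:
        noc = noc_of[name]
        start = busy_until.get(noc, 0)
        busy_until[noc] = start + duration
        done[name] = busy_until[noc]
    return done


# A large transfer (core 1 to core 2) and a small, latency-critical one (core 3 to core 4).
transfers = [("core1->core2", 100), ("core3->core4", 2)]

# Single shared NoC, as in Figure 1: the small transfer waits behind the large one.
shared = completion_times(transfers, {"core1->core2": "NoC", "core3->core4": "NoC"})

# Separate NoCs per group: the two transfers proceed in parallel.
split = completion_times(transfers, {"core1->core2": "NoC1", "core3->core4": "NoC2"})
```

In this model the small transfer finishes at time 102 on the shared NoC but at time 2 when it has its own interconnect, which is the delay the paragraph attributes to the single-NoC design.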
To solve the above problems, the technical solution of the present invention is proposed.
The chip provided by an embodiment of the present application is described in detail below. In the description of the present invention, note that the terms "first", "second", "third", and "fourth" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance. In addition, the technical features of the different embodiments described below may be combined with one another as long as they do not conflict.
Figure 2 is a structural diagram of a chip according to an embodiment of the present invention.
As shown in Figure 2, the chip includes multiple interconnection structures, each connected to one processing core group, where all the processing cores connected to one interconnection structure form one processing core group. The processing cores within a processing core group transmit data through the interconnection structure connected to that group, and the multiple interconnection structures are not directly connected to one another.
In the embodiment shown in Figure 2, the chip includes o interconnection structures, where an interconnection structure may be an inter-core fabric, a network on chip (NoC), a bus, or a switch. The present invention uses a network on chip as the example interconnection structure, but is not limited to it. The o interconnection structures are NoC1, NoC2, … NoCo, and each NoC is connected to one processing core group. The processing cores within a group transmit data through the NoC connected to that group.
The chip includes M types of core, and the M-th type has NM cores. For example, K2_CN2 denotes the N2-th core of the second type, and KM_CNM denotes the NM-th core of the M-th type.
In the embodiment shown in Figure 2, the first processing core group includes cores K1_C1, K1_C2, … K1_CN1, K2_C1, and K2_C2. The first network on chip NoC1 is connected to the first processing core group; that is, every processing core in the first group is connected to NoC1. Cores in the first group transmit data to one another through NoC1. For example, when core K1_C1 has data to transmit to core K1_C2, it sends the data to NoC1, and NoC1 delivers the data to core K1_C2.
The second processing core group includes cores K1_C1, K1_CN1, K2_C2, … K2_CN2, KM_C1, and KM_C2. Every processing core in the second group is connected to the second network on chip NoC2, and cores in the second group transmit data to one another through NoC2.
The o-th processing core group includes cores K1_CN1, K2_C2, KM_C2, … KM_CNM. Every core in the o-th group is connected to the o-th network on chip NoCo, and cores in the o-th group transmit data to one another through NoCo.
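The group membership described for Figure 2 can be written down as a small data structure. The sketch below lists only the cores named explicitly in the text (cores elided with "…" are omitted), writes core names without spaces (for example `K1_CN1`), and uses hypothetical function names. It is an illustrative model, not the patent's implementation.

```python
# Interconnection structure -> processing core group, as listed for Figure 2.
TOPOLOGY = {
    "NoC1": {"K1_C1", "K1_C2", "K1_CN1", "K2_C1", "K2_C2"},
    "NoC2": {"K1_C1", "K1_CN1", "K2_C2", "K2_CN2", "KM_C1", "KM_C2"},
    "NoCo": {"K1_CN1", "K2_C2", "KM_C2", "KM_CNM"},
}


def shared_nocs(src, dst):
    """Return the NoCs to which both cores are connected."""
    return {noc for noc, cores in TOPOLOGY.items() if src in cores and dst in cores}


def send_within_group(src, dst, payload):
    """Model an intra-group transfer: src hands the payload to a shared NoC, which delivers it."""
    nocs = shared_nocs(src, dst)
    if not nocs:
        raise ValueError(f"{src} and {dst} do not share an interconnection structure")
    noc = min(nocs)  # deterministic choice when several NoCs are shared
    return {"noc": noc, "src": src, "dst": dst, "payload": payload}


# The example from the text: K1_C1 sends data to K1_C2 through NoC1.
delivery = send_within_group("K1_C1", "K1_C2", b"data")
```

Note that a core such as `K1_C1` appears in more than one group; the NoCs themselves have no direct link between them.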
The chip provided by embodiments of the present invention has multiple interconnection structures, each connected to one processing core group. Cores of the same group transmit data through the interconnection structure connected to that group, and core-to-core data transmission in different groups can be processed in parallel. On the one hand, when cores belonging to the same group transmit data, only the data formats of the cores within that group need to be considered, not those of cores in other groups of the chip, which reduces circuit complexity, improves the efficiency of data transmission between cores, reduces power consumption, and improves chip performance. On the other hand, because the data transmission of different groups can be processed in parallel, transmissions of different groups do not interfere with one another, which avoids data congestion and improves chip performance.
In one embodiment, data can be transmitted between processing cores belonging to different processing core groups.
In a preferred embodiment, the multiple interconnection structures include a first interconnection structure and a second interconnection structure; a first processing core is connected to the first interconnection structure; and the first processing core is also connected to the second interconnection structure.
Specifically, in this embodiment, the same processing core can be placed in two different processing core groups at the same time. For example, in the embodiment shown in Figure 2, processing core K1_C1 is placed in both the first processing core group and the second processing core group; that is, K1_C1 is connected both to the first network on chip NoC1 and to the second network on chip NoC2.
In one embodiment, the amount of data transmitted between any two processing cores belonging to the same processing core group is greater than the amount of data transmitted between processing cores belonging to any two different groups.
Specifically, in this embodiment, if two cores exchange a relatively large amount of data, they are assigned to the same group.
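The grouping rule above can be checked mechanically. The sketch below tests whether a given grouping satisfies the stated property, namely that every pair of cores sharing a group exchanges more data than every pair that shares no group. The core names, group layout, and traffic volumes are hypothetical, and the patent does not prescribe how the grouping is computed.

```python
# Hypothetical grouping: core "C" belongs to both groups and can act as a bridge.
GROUPS = {
    "NoC1": {"A", "B", "C"},
    "NoC2": {"C", "D", "E"},
}

# Hypothetical traffic volume per core pair, in arbitrary units.
TRAFFIC = {
    ("A", "B"): 900, ("A", "C"): 800, ("B", "C"): 700,
    ("C", "D"): 850, ("C", "E"): 750, ("D", "E"): 950,
    ("A", "D"): 10, ("A", "E"): 5, ("B", "D"): 20, ("B", "E"): 0,
}


def same_group(a, b):
    """True if the two cores are connected to at least one common interconnection structure."""
    return any(a in cores and b in cores for cores in GROUPS.values())


def grouping_respects_traffic(traffic):
    """True if every intra-group pair carries more traffic than every inter-group pair."""
    intra = [vol for pair, vol in traffic.items() if same_group(*pair)]
    inter = [vol for pair, vol in traffic.items() if not same_group(*pair)]
    return not intra or not inter or min(intra) > max(inter)


ok = grouping_respects_traffic(TRAFFIC)
```

If a pair in different groups started exchanging heavy traffic, the check would fail, which suggests that those two cores should be placed in the same group.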
In one embodiment, a second processing core is also connected to the first interconnection structure; a third processing core is connected to the second interconnection structure; and the second processing core and the third processing core transmit data via the first processing core.
Specifically, in this embodiment, if two cores belonging to two different groups need to exchange data, the amount of data involved is generally small. When two cores have only a little data to exchange, the exchange can be relayed through a bridging core, that is, a core that belongs to both groups. For example, K2_C1, which is connected to NoC1, needs to send data to KM_C1, which is connected to NoC2. Because the two cores are not in the same processing core group, K2_C1 can transmit through a processing core that is connected to both NoC1 and NoC2, such as K1_C1 or K2_C2. That is, K2_C1 sends the data to K1_C1 through NoC1, and K1_C1 sends the data to KM_C1 through NoC2, thereby achieving data transmission.
As another example, in the embodiment shown in Figure 2, processing core K1_C2 of the first processing core group needs to send data to processing core KM_C2 of the second processing core group. This can be done via processing core K1_C1, K1_CN1, or K2_C2.
Specifically, processing core K1_C2 sends the data to processing core K1_C1 through the first network on chip NoC1, and processing core K1_C1 sends the data to processing core KM_C2 through the second network on chip NoC2.
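The relay through a bridging core can be sketched as a small routing function. The topology below lists only the cores named explicitly for Figure 2, with names written without spaces. The function names and the rule of picking the alphabetically first bridge are assumptions made for the illustration, since the patent does not say how a bridging core is selected.

```python
# Interconnection structure -> processing core group, as listed for Figure 2
# (cores elided with "…" in the text are omitted).
TOPOLOGY = {
    "NoC1": {"K1_C1", "K1_C2", "K1_CN1", "K2_C1", "K2_C2"},
    "NoC2": {"K1_C1", "K1_CN1", "K2_C2", "K2_CN2", "KM_C1", "KM_C2"},
    "NoCo": {"K1_CN1", "K2_C2", "KM_C2", "KM_CNM"},
}


def nocs_of(core):
    """Return the NoCs a core is connected to."""
    return {noc for noc, cores in TOPOLOGY.items() if core in cores}


def bridge_cores(noc_a, noc_b):
    """Return the cores connected to both NoCs."""
    return TOPOLOGY[noc_a] & TOPOLOGY[noc_b]


def route(src, dst):
    """Return a list of (sender, noc, receiver) hops using at most one bridging core."""
    common = nocs_of(src) & nocs_of(dst)
    if common:
        return [(src, min(common), dst)]  # same group: a single hop
    for noc_a in sorted(nocs_of(src)):
        for noc_b in sorted(nocs_of(dst)):
            bridges = bridge_cores(noc_a, noc_b)
            if bridges:
                bridge = min(bridges)  # hypothetical tie-break: first name alphabetically
                return [(src, noc_a, bridge), (bridge, noc_b, dst)]
    return None  # no single bridging core connects the two groups


# The example from the text: K1_C2 (first group) sends to KM_C2 (second group).
hops = route("K1_C2", "KM_C2")
```

For this example the candidate bridges between NoC1 and NoC2 are K1_C1, K1_CN1, and K2_C2, matching the text, and the sketch picks K1_C1, giving the two hops described above.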
In one embodiment, the plurality of interconnect structures include a third interconnect structure; none of the processing cores connected to the third interconnect structure is connected to any other interconnect structure.
For example, in the embodiment shown in FIG. 2, the chip further includes a third network-on-chip connected to a third processing core group; the third processing core group includes cores K1_C3 and K3_C1, and neither of these two cores is connected to any other interconnect structure.
In one embodiment of the present invention, the processing cores connected to each interconnect structure in the chip are not connected to any other interconnect structure. In this embodiment, because no processing core is attached to more than one interconnect structure, no data is transferred between two processing cores that belong to two different processing core groups.
In one embodiment, each processing core includes at least one transmission unit, and the bandwidth of the interconnect structure meets the bandwidth requirement of the transmission unit. For example, the bandwidth of the interconnect structure equals the bandwidth requirement of the transmission unit, so that the processing core can send data into the interconnect structure.
Here, the bandwidth requirement of a transmission unit may be the bandwidth that the transmission unit needs in order to send a given amount of data within a given time.
In one embodiment, the bandwidth of an interconnect structure meets the bandwidth requirement of each processing core in the processing core group connected to that interconnect structure.
Here, the bandwidth requirement of a processing core may be the bandwidth that the processing core needs in order to send a given amount of data within a given time.
Preferably, the bandwidth of the interconnect structure is greater than the bandwidth requirement of each processing core in the processing core group connected to it. In this way, each interconnect structure can be matched as closely as possible to the structure of all the processing cores in its group, which on the one hand saves as much area as possible and on the other hand gives the chip better performance.
In one embodiment, at least two processing core groups in the chip have different bandwidth requirements; for example, the first group requires 1 MBps and the second group requires 10 MBps. A NoC with a matching structure can then be provided for each group according to its requirement, which minimizes the area of the NoCs.
FIG. 3 is a schematic diagram of a chip structure provided by an embodiment of the present invention.
As shown in FIG. 3, the chip includes six cores of three structurally different types: a first type K1, a second type K2, and a third type K3, with two cores of each type.
The two cores of the first type K1 are K1_C1 and K1_C2; the two cores of the second type K2 are K2_C1 and K2_C2; the two cores of the third type K3 are K3_C1 and K3_C2.
The six cores in this chip are divided into three groups.
The first processing core group consists of cores K1_C1, K1_C2, K2_C1, and K2_C2; each core in the first processing core group is connected to the first interconnect structure NoC1.
The second processing core group consists of cores K1_C1, K2_C2, K3_C1, and K3_C2; each core in the second processing core group is connected to the second interconnect structure NoC2.
The third processing core group consists of cores K1_C2, K2_C2, and K3_C2; each core in the third processing core group is connected to the third interconnect structure NoC3.
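The grouping of FIG. 3 listed above can be written down as a small data structure. The following is only an illustrative sketch, not part of the disclosed chip: the names NOC_GROUPS, core_membership, and bridging_cores are hypothetical, and the core labels are written as K1_C1, NoC1, and so on. It inverts the grouping to show which interconnect structures each core is attached to and which cores are attached to more than one.

```python
# Grouping of FIG. 3 as listed in the text: interconnect -> attached cores.
# All identifiers below are hypothetical names chosen for illustration.
NOC_GROUPS = {
    "NoC1": ["K1_C1", "K1_C2", "K2_C1", "K2_C2"],
    "NoC2": ["K1_C1", "K2_C2", "K3_C1", "K3_C2"],
    "NoC3": ["K1_C2", "K2_C2", "K3_C2"],
}


def core_membership(groups):
    """Invert the grouping: core -> list of NoCs the core is attached to."""
    membership = {}
    for noc, cores in groups.items():
        for core in cores:
            membership.setdefault(core, []).append(noc)
    return membership


def bridging_cores(groups):
    """Cores attached to more than one NoC; these can relay between groups."""
    membership = core_membership(groups)
    return sorted(core for core, nocs in membership.items() if len(nocs) > 1)


MEMBERSHIP = core_membership(NOC_GROUPS)
BRIDGES = bridging_cores(NOC_GROUPS)

for core in sorted(MEMBERSHIP):
    print(core, "->", ", ".join(MEMBERSHIP[core]))
print("bridging cores:", ", ".join(BRIDGES))
```

In this grouping K2_C1 and K3_C1 are each attached to a single interconnect structure, so any transfer between them has to be relayed by one of the bridging cores.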
In a preferred embodiment, each core in the chip includes at least one computing unit.
The computing unit may be an execution unit (EU) or a processing unit (PU).
The chip includes a plurality of interconnect structures, each connected to one processing core group. Cores in the same group exchange data over the interconnect structure connected to that group. Based on the bandwidth required by the cores of its group and the data bit width those cores use, each interconnect structure is designed with data packets and a corresponding protocol dedicated to that group.
In one embodiment, at least two different clock frequencies are used among the plurality of interconnect structures in the chip. Because the chip has multiple interconnect structures, each connecting one group of processing cores, and data transfers within the different groups can proceed in parallel, the interconnect structures can be run at several different clock frequencies.
Optionally, the multiple interconnect structures in the chip may also be set to the same clock frequency.
In one embodiment, at least two different bit width values are used among the plurality of interconnect structures. Because the chip has multiple interconnect structures, each connecting one group of processing cores, and data transfers within the different groups can proceed in parallel, the interconnect structures can be given several different bit width values.
Optionally, the multiple interconnect structures in the chip may also be set to the same bit width value.
Note that bandwidth = bit width × clock frequency. If data transfer among the processing cores of a given group requires high bandwidth, the NoC bandwidth can be determined from that group's bandwidth requirement (for example, by choosing a suitable clock frequency and bit width) so that the NoC bandwidth meets the requirement of each processing core in the group, for instance by setting a larger NoC bit width. In addition, because data transfers within different processing core groups are independent of one another, a group that does not need high bandwidth can be given a NoC bit width suited to its own requirement, which saves chip area and therefore power.
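The relation bandwidth = bit width × clock frequency can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not a design rule from the disclosure: the function names, the 1 MHz clock, and the list of candidate bit widths are all hypothetical, and the 1 MBps and 10 MBps figures from the earlier example are read here as megabytes per second (8,000,000 and 80,000,000 bits per second).

```python
def noc_bandwidth_bps(bit_width, clock_hz):
    """Bandwidth in bits per second: bit width (bits per cycle) x clock (cycles per second)."""
    return bit_width * clock_hz


def min_bit_width(required_bps, clock_hz, allowed_widths=(8, 16, 32, 64, 128, 256, 512)):
    """Smallest candidate bit width whose bandwidth meets the requirement, or None.

    The candidate widths are an assumption made for this example only.
    """
    for width in sorted(allowed_widths):
        if noc_bandwidth_bps(width, clock_hz) >= required_bps:
            return width
    return None


CLOCK_HZ = 1_000_000          # hypothetical 1 MHz NoC clock
GROUP_1_BPS = 1 * 8_000_000   # 1 MBps, read as megabytes per second
GROUP_2_BPS = 10 * 8_000_000  # 10 MBps, read as megabytes per second

print("group 1 bit width:", min_bit_width(GROUP_1_BPS, CLOCK_HZ))
print("group 2 bit width:", min_bit_width(GROUP_2_BPS, CLOCK_HZ))
```

Under these assumptions the low-bandwidth group can be served by a much narrower NoC than the high-bandwidth group, which is the area saving the paragraph above describes. Raising the clock frequency of one NoC instead of widening it is the other knob the text mentions.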
Moreover, in the chip of this embodiment, each processing core group uses its own NoC for data transfer, so different NoCs can be given different clock frequencies and bit widths according to the data transfer needs of each group. Compared with routing all cores of a chip through a single interconnect structure, this reduces power consumption, improves overall chip performance, increases design flexibility, and saves area.
The above technical solution of the present invention has the following beneficial technical effects:
(1) The chip provided by the embodiments of the present invention has a plurality of interconnect structures, and cores of the same processing core group exchange data over the interconnect structure connected to that group, so that core-to-core transfers in different processing core groups can proceed in parallel. On the one hand, this greatly increases the total data bandwidth, improves core-to-core transfer efficiency, reduces power consumption, and improves chip performance; on the other hand, because transfers in different groups proceed in parallel and do not interfere with one another, data congestion is avoided and chip performance is improved.
(2) In the chip provided by the embodiments of the present invention, data transfer between cores of the same group only needs to take into account the data format of the cores in that group, not the data formats of cores in other groups of the chip, which reduces the complexity of the circuitry in the chip.
An embodiment of the present invention further provides a card board including one or more of the chips provided by the above embodiments.
An embodiment of the present invention further provides an electronic device including one or more of the card boards provided by the above embodiments.
FIG. 4 is a schematic flowchart of an inter-core data transmission method provided by an embodiment of the present invention.
As shown in FIG. 4, the method includes steps S101 to S102:
Step S101: processing cores belonging to the same processing core group exchange data over the interconnect structure connected to that processing core group.
Step S102: two processing cores belonging to two different processing core groups exchange data over at least two interconnect structures.
In one embodiment, the step in which two processing cores belonging to two different processing core groups exchange data over at least two interconnect structures includes:
the two processing cores belonging to the two different processing core groups exchange data over two interconnect structures via a processing core that belongs to both of the two processing core groups.
For example, in the embodiment shown in FIG. 3, K2_C1 on NoC1 needs to send data to K3_C1 on NoC2. Because the two cores are not in the same group, K2_C1 can transmit through a core that is on both NoC1 and NoC2, such as K1_C1 or K2_C2: K2_C1 sends the data to K1_C1 over NoC1, and K1_C1 then sends the data to K3_C1 over NoC2.
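The single-relay case just described amounts to picking a core that is attached to both interconnect structures. The sketch below is one possible way to express that selection in software for illustration only; the disclosure does not specify how the relay core is chosen, and the names NOC_GROUPS, relay_candidates, and two_hop_transfer are hypothetical. The grouping is the one listed for FIG. 3.

```python
# Grouping of FIG. 3 as listed in the text: interconnect -> attached cores.
NOC_GROUPS = {
    "NoC1": ["K1_C1", "K1_C2", "K2_C1", "K2_C2"],
    "NoC2": ["K1_C1", "K2_C2", "K3_C1", "K3_C2"],
    "NoC3": ["K1_C2", "K2_C2", "K3_C2"],
}


def relay_candidates(src_noc, dst_noc):
    """Cores attached to both NoCs, in the order they are listed for src_noc."""
    dst_members = set(NOC_GROUPS[dst_noc])
    return [core for core in NOC_GROUPS[src_noc] if core in dst_members]


def two_hop_transfer(src, src_noc, dst, dst_noc):
    """Return the two hops (sender, NoC, receiver) via the first usable relay, or None.

    Taking the first candidate is an arbitrary choice made for this sketch.
    """
    relays = [c for c in relay_candidates(src_noc, dst_noc) if c not in (src, dst)]
    if not relays:
        return None
    relay = relays[0]
    return [(src, src_noc, relay), (relay, dst_noc, dst)]


for sender, noc, receiver in two_hop_transfer("K2_C1", "NoC1", "K3_C1", "NoC2"):
    print(f"{sender} --{noc}--> {receiver}")
```

A real design might instead choose the relay by load or by physical distance; the first-candidate rule here simply reproduces the K1_C1 relay used in the example.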
Note that two processing cores belonging to two different processing core groups can also exchange data over more than two interconnect structures.
For example, in the embodiment shown in FIG. 3, processing core K2_C1 can send data to processing core K3_C1 through NoC1, NoC2, and NoC3. Specifically, processing core K2_C1 sends the data to processing core K1_C2 over NoC1, processing core K1_C2 sends the data to processing core K3_C2 over NoC3, and processing core K3_C2 sends the data to processing core K3_C1 over NoC2.
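Routes that cross several interconnect structures can be found with an ordinary breadth-first search over the cores, where two cores are neighbours if they share a NoC. The following is a hedged sketch of that idea, not the routing mechanism of the disclosure (which does not describe one); find_route and the excluded parameter are hypothetical. Excluding the two cores that bridge NoC1 and NoC2 directly yields the three-NoC path given in the example above.

```python
from collections import deque

# Grouping of FIG. 3 as listed in the text: interconnect -> attached cores.
NOC_GROUPS = {
    "NoC1": ["K1_C1", "K1_C2", "K2_C1", "K2_C2"],
    "NoC2": ["K1_C1", "K2_C2", "K3_C1", "K3_C2"],
    "NoC3": ["K1_C2", "K2_C2", "K3_C2"],
}


def find_route(src, dst, excluded=frozenset()):
    """Breadth-first search for a route with the fewest hops.

    Returns a list of (noc, receiving_core) hops, or None if no route exists.
    Cores in `excluded` are never used as relays (hypothetical option, for
    example to model a relay core that is busy).
    """
    prev = {src: None}  # core -> (previous core, NoC used to reach it)
    queue = deque([src])
    while queue:
        core = queue.popleft()
        if core == dst:
            hops = []
            while prev[core] is not None:
                parent, noc = prev[core]
                hops.append((noc, core))
                core = parent
            return hops[::-1]
        for noc, members in NOC_GROUPS.items():
            if core not in members:
                continue
            for nxt in members:
                if nxt not in prev and nxt not in excluded:
                    prev[nxt] = (core, noc)
                    queue.append(nxt)
    return None


print(find_route("K2_C1", "K3_C1"))
print(find_route("K2_C1", "K3_C1", excluded={"K1_C1", "K2_C2"}))
```

The first call returns the two-hop route through K1_C1; the second, with K1_C1 and K2_C2 excluded, returns the route K2_C1 to K1_C2 over NoC1, K1_C2 to K3_C2 over NoC3, and K3_C2 to K3_C1 over NoC2.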
In the inter-core data transmission method provided by the embodiments of the present invention, processing cores belonging to the same processing core group exchange data over the interconnect structure connected to that group, so that data transfers in different groups can proceed in parallel. On the one hand, this greatly increases the total data bandwidth, improves core-to-core transfer efficiency, reduces power consumption, and improves chip performance; on the other hand, because transfers in different groups proceed in parallel and do not interfere with one another, data congestion is avoided and chip performance is improved.
According to one embodiment of the present invention, a computer storage medium is provided. A computer program is stored on the computer storage medium, and when the program is executed by a processor, the steps of the inter-core data transmission method provided by the above embodiments are implemented.
According to one embodiment of the present invention, an electronic device is provided, including a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the program, the steps of the inter-core data transmission method provided by the above embodiments are implemented.
According to one embodiment of the present invention, a computer program product is provided, including computer instructions; when the computer instructions are executed by a computing device, the computing device can perform the steps of the inter-core data transmission method provided by the above embodiments.
The flowcharts and block diagrams in the drawings of the present disclosure illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.
The functions described above may be performed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in combination with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It should be understood that the above specific embodiments of the present invention are used only to illustrate or explain the principles of the present invention and do not limit the present invention. Therefore, any modification, equivalent substitution, improvement, and the like made without departing from the spirit and scope of the present invention shall fall within the protection scope of the present invention. In addition, the appended claims of the present invention are intended to cover all changes and modifications that fall within the scope and boundaries of the appended claims, or equivalents of such scope and boundaries.
The present invention has been described above with reference to its embodiments. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present invention. The scope of the present invention is defined by the appended claims and their equivalents. Those skilled in the art can make various substitutions and modifications without departing from the scope of the present invention, and all such substitutions and modifications shall fall within the scope of the present invention.
Obviously, the above embodiments are merely examples given for clarity of description and do not limit the implementations. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Obvious changes or variations derived from the above remain within the protection scope of the present invention.

Claims (16)

  1. A chip, characterized by comprising: a plurality of interconnect structures, each of the interconnect structures being connected to at least two processing cores, all of the processing cores connected to one interconnect structure forming one processing core group;
    the processing cores belonging to the same processing core group perform data transmission through the interconnect structure connected to that processing core group;
    wherein the plurality of interconnect structures are not directly connected to one another.
  2. The chip according to claim 1, characterized in that data transmission can be performed between processing cores belonging to different processing core groups.
  3. The chip according to claim 2, characterized in that
    the amount of data transmitted between any two processing cores belonging to one processing core group is greater than the amount of data transmitted between processing cores belonging respectively to any two different processing core groups.
  4. The chip according to any one of claims 1-3, characterized in that
    the plurality of interconnect structures include a first interconnect structure and a second interconnect structure;
    the first interconnect structure is connected to a first processing core;
    the first processing core is connected to the second interconnect structure.
  5. The chip according to claim 4, characterized in that
    the first interconnect structure is further connected to a second processing core; the second interconnect structure is connected to a third processing core;
    the second processing core implements data transmission with the third processing core via the first processing core.
  6. The chip according to claim 4 or 5, characterized in that
    the interconnect structures include a third interconnect structure;
    none of the processing cores connected to the third interconnect structure is connected to any other interconnect structure.
  7. The chip according to any one of claims 1-6, characterized in that the bandwidth of the interconnect structure meets the bandwidth requirement of each of the processing cores in the processing core group connected to the interconnect structure.
  8. The chip according to claim 7, characterized in that each of the processing cores includes at least one transmission unit, and the bandwidth of the interconnect structure meets the bandwidth requirement of the transmission unit of each processing core connected to the interconnect structure.
  9. The chip according to any one of claims 1-8, characterized in that the plurality of interconnect structures are provided with at least two clock frequencies; and/or
    the plurality of interconnect structures are provided with at least two bit width values.
  10. A card board, characterized by comprising one or more chips according to any one of claims 1-9.
  11. An electronic device, characterized by comprising one or more card boards according to claim 10.
  12. An inter-core data transmission method for use in the chip according to any one of claims 1-9, characterized by comprising:
    the processing cores belonging to the same processing core group implement data transmission through the interconnect structure connected to that processing core group;
    two of the processing cores belonging respectively to two different processing core groups implement data transmission through at least two of the interconnect structures.
  13. The method according to claim 12, characterized in that the two processing cores belonging respectively to two different processing core groups implementing data transmission through at least two of the interconnect structures comprises:
    the two processing cores belonging respectively to the two different processing core groups implement data transmission through two of the interconnect structures via a processing core that belongs to both of the two different processing core groups.
  14. A computer storage medium, characterized in that a computer program is stored on the computer storage medium, and when the program is executed by a processor, the steps of the inter-core data transmission method according to claim 12 or 13 are implemented.
  15. An electronic device, characterized by comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein when the processor executes the program, the steps of the inter-core data transmission method according to claim 12 or 13 are implemented.
  16. A computer program product, characterized by comprising computer instructions, wherein when the computer instructions are executed by a computing device, the computing device can perform the steps of the inter-core data transmission method according to claim 12 or 13.
PCT/CN2020/118709 2019-12-04 2020-09-29 Chip and inter-core data transmission method WO2021109698A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911230193.9 2019-12-04
CN201911230193.9A CN112905523B (en) 2019-12-04 2019-12-04 Chip and inter-core data transmission method

Publications (1)

Publication Number Publication Date
WO2021109698A1 true WO2021109698A1 (en) 2021-06-10

Family

ID=76110785

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118709 WO2021109698A1 (en) 2019-12-04 2020-09-29 Chip and inter-core data transmission method

Country Status (2)

Country Link
CN (1) CN112905523B (en)
WO (1) WO2021109698A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113778938B (en) * 2021-08-31 2024-03-12 上海阵量智能科技有限公司 Method, device and chip for determining network-on-chip topology structure

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101546302A (en) * 2009-05-07 2009-09-30 复旦大学 Interconnection structure of multicore processor and hierarchical interconnection design method based on interconnection structure
CN103336756A (en) * 2013-07-19 2013-10-02 中国人民解放军信息工程大学 Generating device for data computational node
CN205540720U (en) * 2016-04-06 2016-08-31 龙芯中科技术有限公司 Treater interconnection structure and mainboard
CN106528052A (en) * 2016-12-26 2017-03-22 北京海嘉科技有限公司 Microprocessor architecture based on distributed function units

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9063730B2 (en) * 2010-12-20 2015-06-23 Intel Corporation Performing variation-aware profiling and dynamic core allocation for a many-core processor

Also Published As

Publication number Publication date
CN112905523A (en) 2021-06-04
CN112905523B (en) 2023-11-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20896325

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20896325

Country of ref document: EP

Kind code of ref document: A1