CN112905523A - Chip and inter-core data transmission method - Google Patents

Publication number
CN112905523A
Authority
CN
China
Prior art keywords
processing core
chip
processing
data transmission
cores
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911230193.9A
Other languages
Chinese (zh)
Other versions
CN112905523B (en)
Inventor
Not disclosed (inventor name withheld from publication)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Simm Computing Technology Co ltd
Original Assignee
Beijing Simm Computing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Simm Computing Technology Co ltd filed Critical Beijing Simm Computing Technology Co ltd
Priority to CN201911230193.9A priority Critical patent/CN112905523B/en
Priority to PCT/CN2020/118709 priority patent/WO2021109698A1/en
Publication of CN112905523A publication Critical patent/CN112905523A/en
Application granted granted Critical
Publication of CN112905523B publication Critical patent/CN112905523B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7825 Globally asynchronous, locally synchronous, e.g. network on chip
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4004 Coupling between buses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4004 Coupling between buses
    • G06F13/4022 Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Mathematical Physics (AREA)
  • Multi Processors (AREA)

Abstract

The invention discloses a chip and an inter-core data transmission method. The chip comprises a plurality of interconnection structures, a plurality of storage units and a plurality of processing units, wherein each interconnection structure is connected to a processing core group that comprises at least two processing cores; the processing cores in a processing core group transmit data through the interconnection structure connected to that group; and the interconnection structures are not directly connected to one another. In the chip provided by the embodiment of the invention, cores of the same processing core group transmit data through the interconnection structure corresponding to that group, so data transmissions within different processing core groups can proceed in parallel. On the one hand, this greatly increases the total data bandwidth, improves inter-core data transmission efficiency, reduces power consumption, and improves chip performance. On the other hand, because the transmissions of different groups proceed in parallel, they do not interfere with one another, which avoids data congestion and further improves chip performance.

Description

Chip and inter-core data transmission method
Technical Field
The invention relates to the technical field of chips, in particular to a chip and an inter-core data transmission method.
Background
With the development of science and technology, human society is rapidly entering the intelligent era. An important characteristic of this era is that people acquire more and more kinds of data in ever larger quantities, and demand ever higher data processing speeds.
Chips are the cornerstone of data processing and fundamentally determine people's ability to process data. In terms of application field, chips follow two main routes. One is the general-purpose route, such as the central processing unit (CPU), which offers great flexibility but lower effective computing power when processing domain-specific algorithms. The other is the special-purpose route, such as the tensor processing unit (TPU), which delivers higher effective computing power in certain specific fields but has poor or even no processing capability in more flexible, general-purpose fields.
Because the data of the intelligent era is diverse and enormous in volume, a chip is required to be extremely flexible, so that it can process algorithms from different fields that change by the day, and to have extremely high processing capacity, so that it can rapidly process extremely large and sharply growing volumes of data.
Disclosure of Invention
(I) Objects of the invention
The invention aims to provide a chip and an inter-core data transmission method. The chip provided by the embodiment of the invention comprises a plurality of interconnection structures, and data transmission among the processing cores in the same processing core group is realized through the interconnection structures connected with the group, so that the data transmission among the cores in different groups can be processed in parallel, the total data bandwidth is greatly improved, and the performance of the chip is improved.
(II) Technical solution
To solve the above problem, a first aspect of the present invention provides a chip comprising a plurality of interconnection structures, wherein each interconnection structure is connected to one processing core group, and all the processing cores connected to one interconnection structure form one processing core group; the processing cores belonging to the same processing core group transmit data through the interconnection structure connected to that group; and the plurality of interconnection structures are not directly connected to one another.
The chip provided by the embodiment of the invention comprises a plurality of interconnection structures, and data transmission among the processing cores in the same processing core group is realized through the interconnection structures connected with the group, so that the data transmission among the cores of different processing groups can be processed in parallel, the total data bandwidth is greatly improved, and the performance of the chip is improved.
Further, data transmission can be performed between the processing cores belonging to different processing core groups.
Further, the amount of data transferred between any two processing cores belonging to one processing core group is larger than the amount of data transferred between processing cores respectively belonging to any two different processing core groups.
Further, the plurality of interconnection structures comprises a first interconnection structure and a second interconnection structure; the first interconnection structure is connected to a first processing core; and the first processing core is also connected to the second interconnection structure.
Further, the first interconnection structure is also connected to a second processing core; the second interconnection structure is connected to a third processing core; and the second processing core exchanges data with the third processing core via the first processing core.
Further, the plurality of interconnection structures comprises a third interconnection structure; none of the processing cores connected to the third interconnection structure is connected to any other interconnection structure.
Further, the processing cores connected to each interconnection structure are not connected to any other interconnection structure.
Further, each processing core includes at least one transmission unit, and the bandwidth of the interconnect structure meets the bandwidth requirement of the transmission unit.
Further, the bandwidth of the interconnect structure meets the bandwidth requirement of each processing core in the processing core group connected to the interconnect structure.
Further, the bandwidth of the interconnect structure is greater than the bandwidth requirement of each processing core in the processing core group connected to the interconnect structure.
Further, a plurality of the interconnection structures are provided with at least two clock frequencies.
Further, the plurality of interconnection structures are provided with at least two bit width values.
Further, at least two groups of processing cores have different bandwidth requirements.
According to a second aspect of the present invention, there is provided a card comprising one or more chips as provided in the first aspect.
According to a third aspect of the present invention, there is also provided an electronic apparatus including one or more cards provided in the second aspect.
According to a fourth aspect of the present invention, there is provided an inter-core data transmission method, used in the chip provided in the first aspect, the method including: each processing core belonging to the same processing core group realizes data transmission through an interconnection structure connected with the processing core group; and the two processing cores which respectively belong to two different processing core groups realize data transmission at least through the two interconnection structures.
Further, the two processing cores respectively belonging to two different processing core groups implement data transmission at least through the two interconnection structures, including: the two processing cores respectively belonging to the two different processing core groups realize data transmission through the two interconnection structures via one processing core simultaneously belonging to the two different processing core groups.
According to a fifth aspect of the present invention, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the inter-core data transfer method of the fourth aspect.
According to a sixth aspect of the present invention, there is provided an electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for transmitting data between cores of the fourth aspect when executing the program.
According to a seventh aspect of the present invention, there is provided a computer program product comprising computer instructions which, when executed by a computing device, may perform the steps of the method of the fourth aspect for inter-core data transfer.
(III) Advantageous effects
The technical solution of the invention has the following beneficial technical effects:
(1) The chip provided by the embodiment of the invention has a plurality of interconnection structures, and the processing cores of the same processing core group transmit data through the interconnection structure connected to that group, so data transmissions within different processing core groups can proceed in parallel. On the one hand, this greatly increases the total data bandwidth, improves inter-core data transmission efficiency, reduces power consumption, and improves chip performance. On the other hand, because the transmissions of different groups proceed in parallel, they do not interfere with one another, which avoids data congestion and further improves chip performance.
(2) When data is transmitted between cores of the same group, the chip provided by the embodiment of the invention only needs to consider the data format of the cores in that group, not the data formats of cores in other groups, which reduces the complexity of the circuits in the chip.
(3) Because each processing core group uses its own interconnection structure for data transmission, different interconnection structures can be given different clock frequencies and different bit widths, chosen according to the data transmission requirements of each group.
Drawings
FIG. 1 is a schematic diagram of a chip;
FIG. 2 is a block diagram of a chip according to an embodiment of the invention;
FIG. 3 is a block diagram of a chip according to an embodiment of the invention;
FIG. 4 is a flowchart of an inter-core data transmission method according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
In a multi-core (many-core) chip, all cores may be homogeneous, i.e., every core has the same structure, or the cores may be heterogeneous, i.e., cores of at least two different structures exist. In a multi-core (many-core) architecture, several cores often take part in executing one task at the same time, and data must then be transmitted between the cores. It is therefore important to decide which architecture a multi-core chip adopts and how it transmits data, so that homogeneous or heterogeneous cores together form a chip with excellent performance.
Fig. 1 is a schematic structural diagram of a chip.
In the chip structure shown in FIG. 1, the chip has only one network on chip (NoC), and all cores are interconnected through that single NoC to exchange data.
As shown in FIG. 1, the chip is provided with cores of various structures, including K1_C1, K2_C1, and KM_C1, where M denotes the number of core types. There are several cores of each type; for example, the first type K1 comprises K1_C1, K1_C2, …, K1_CN1, where NM denotes the number of cores of the M-th type. That is, KM_CNM denotes the NM-th core of the M-th structure.
Because the chip contains many types of cores (M types) and the different types share the same NoC for data exchange, each core needs to know the data formats (bit widths) of all the other cores with which it exchanges data. Sharing one NoC also places strict requirements on the format of transmitted and received data packets: either the sending core encodes the data into the data format of the receiving core, or the receiving core decodes the data according to the type of the sending core and re-encodes it into its own data format. As a result, the packet routing algorithm is complex and every core carries a heavy burden.
In addition, because all cores share one NoC and the amounts of data exchanged between cores differ, while one pair of cores is transmitting data the other cores cannot use the NoC and must wait for that transmission to finish, which delays data transmission and reduces chip performance. For example, suppose the transfer between the first core and the second core occupies a large bandwidth, and during that transfer the third core generates data to send to the fourth core. This data is small but has strict real-time requirements. Since all cores share one NoC, the third core must wait until the transfer between the first and second cores completes before it can transmit, so the time-critical data is not delivered in time and chip performance suffers.
The technical solution of the invention is proposed to solve the above problems.
The chip provided by one embodiment of the present application will be described in detail below. In the description of the present invention, it should be noted that the terms "first", "second", "third", and "fourth" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Fig. 2 is a block diagram of a chip according to an embodiment of the invention.
As shown in FIG. 2, the chip includes a plurality of interconnection structures. Each interconnection structure is connected to one processing core group, and all the processing cores connected to one interconnection structure form one processing core group; the processing cores in a processing core group transmit data through the interconnection structure connected to that group; and the interconnection structures are not directly connected to one another.
In the embodiment shown in FIG. 2, the chip includes o interconnection structures, each of which may be an interconnect fabric, a network on chip (NoC), a bus, or a switch. The description below uses a NoC as the example interconnection structure, but the invention is not limited thereto. The o interconnection structures are NoC1, NoC2, …, NoCo, and each NoC is connected to one processing core group. The processing cores within a processing core group transmit data via the NoC connected to that group.
The chip comprises M kinds of cores, and the number of cores of the M-th kind is NM. For example, K2_CN2 denotes the N2-th core of the second kind, and KM_CNM denotes the NM-th core of the M-th kind.
In the embodiment shown in FIG. 2, the first processing core group includes cores K1_C1, K1_C2, …, K1_CN1, K2_C1, and K2_C2. The first network on chip NoC1 is connected to the first processing core group, i.e., each processing core of the first processing core group is connected to NoC1. The cores of the first processing core group transmit data to one another through NoC1. For example, when core K1_C1 has data to transmit to core K1_C2, it sends the data to NoC1, and NoC1 delivers the data to core K1_C2.
The second processing core group includes cores K1_C1, K1_CN1, K2_C2, …, K2_CN2, KM_C1, and KM_C2. Each processing core of the second processing core group is connected to the second network on chip NoC2, and the cores of the second processing core group transmit data through NoC2.
The o-th processing core group includes cores K1_CN1, K2_C2, KM_C2, …, KM_CNM. Each core of the o-th processing core group is connected to the o-th network on chip NoCo, and the cores of the o-th processing core group transmit data through the o-th interconnection structure NoCo.
The chip provided by the embodiment of the invention has a plurality of interconnection structures, each connected to one processing core group. Cores of the same processing core group transmit data through the interconnection structure connected to that group, so data transmissions within different groups can proceed in parallel. Because they proceed in parallel, the transmissions of different groups do not interfere with one another, which avoids data congestion and improves chip performance.
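The parallelism argument can be made concrete with a minimal Python sketch. It is not part of the patent: the transfer sizes, the bandwidth figure, and the function name completion_times are illustrative assumptions, and each interconnection structure is modeled as a simple first-come-first-served link.

```python
from collections import defaultdict

def completion_times(transfers, bandwidth, shared):
    """Return {name: finish_time} for transfers served first-come-first-served.

    transfers: list of (name, group, size) in arrival order (all arrive at t = 0).
    bandwidth: data units per time unit of every interconnect (assumed equal).
    shared:    True  -> one interconnect serves all groups (single-NoC chip);
               False -> each group has its own interconnect (multi-NoC chip).
    """
    busy_until = defaultdict(float)  # time at which each interconnect becomes free
    finish = {}
    for name, group, size in transfers:
        link = "shared" if shared else group
        busy_until[link] += size / bandwidth
        finish[name] = busy_until[link]
    return finish

# Hypothetical workload: a bulk transfer in one group, an urgent one in another.
transfers = [
    ("bulk", "group1", 1000),
    ("urgent", "group2", 10),
]
single = completion_times(transfers, bandwidth=10, shared=True)
split = completion_times(transfers, bandwidth=10, shared=False)
print(single["urgent"], split["urgent"])
```

Under these assumptions the urgent transfer has to wait behind the bulk transfer on the shared interconnect, but completes immediately when each group has its own interconnect.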
In one embodiment, data transfer is enabled between the processing cores belonging to different ones of the processing core groups.
In a preferred embodiment, the plurality of interconnect structures includes a first interconnect structure and a second interconnect structure; the first interconnection structure is connected with a first processing core; the first processing core is connected with the second interconnection structure.
Specifically, in this embodiment, the same processing core may be placed in two different processing core groups at once. For example, in the embodiment shown in FIG. 2, processing core K1_C1 is in both the first processing core group and the second processing core group, i.e., K1_C1 is connected to both the first network on chip NoC1 and the second network on chip NoC2.
In one embodiment, the amount of data transferred between any two processing cores belonging to the same processing core group is greater than the amount of data transferred between processing cores belonging to any two different groups, respectively.
Specifically, in this embodiment, if there is a relatively large amount of data transfer between two cores, they are allocated in the same group.
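The grouping rule can be checked mechanically. The following Python sketch is an illustration only: the patent does not prescribe a grouping algorithm, and the function name, the core labels A to D, and the traffic figures are all hypothetical. It tests whether a given grouping has the stated property, namely that every pair of cores sharing a group exchanges more data than any pair that shares no group.

```python
from itertools import combinations

def grouping_respects_traffic(groups, traffic):
    """True if every same-group pair exchanges more data than any cross-group pair.

    groups:  {group_name: set of core names}; a core may appear in several groups.
    traffic: {frozenset({core_a, core_b}): data volume}; missing pairs count as 0.
    """
    cores = sorted(set().union(*groups.values()))

    def share_group(a, b):
        return any(a in members and b in members for members in groups.values())

    intra, inter = [], []
    for a, b in combinations(cores, 2):
        volume = traffic.get(frozenset((a, b)), 0)
        (intra if share_group(a, b) else inter).append(volume)
    if not intra or not inter:
        return True  # nothing to compare
    return min(intra) > max(inter)

# Hypothetical traffic volumes between four cores.
traffic = {
    frozenset(("A", "B")): 100,
    frozenset(("C", "D")): 80,
    frozenset(("A", "C")): 5,
}
good = {"g1": {"A", "B"}, "g2": {"C", "D"}}
bad = {"g1": {"A", "C"}, "g2": {"B", "D"}}
print(grouping_respects_traffic(good, traffic), grouping_respects_traffic(bad, traffic))
```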
In one embodiment, a second processing core is further connected to the first interconnection structure; a third processing core is connected to the second interconnection structure; and the second processing core exchanges data with the third processing core via the first processing core.
Specifically, in this embodiment, when data must be transmitted between two cores that belong to two different groups, the amount of data is generally small. If only a small amount of data needs to be exchanged between two cores, it can be relayed through a bridging core, i.e., a core that belongs to both groups at once. For example, K2_C1, which is connected to NoC1, needs to send data to KM_C1, which is connected to NoC2. Since they are not in the same processing core group, K2_C1 can relay the data through a processing core connected to both NoC1 and NoC2, e.g., K1_C1 or K2_C2. That is, K2_C1 sends the data through NoC1 to K1_C1, and K1_C1 sends the data through NoC2 to KM_C1, thereby completing the transmission.
For example, in the embodiment shown in FIG. 2, when processing core K1_C2 of the first processing core group needs to send data to processing core KM_C2 of the second processing core group, the transfer can be relayed through processing core K1_C1, K1_CN1, or K2_C2.
Specifically, processing core K1_C2 sends the data through the first network on chip NoC1 to processing core K1_C1, and processing core K1_C1 sends the data through the second network on chip NoC2 to processing core KM_C2.
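The relay step can be sketched as a path search over group membership. The Python example below is an assumption-laden illustration, not the patent's method: it uses a breadth-first search, the function name find_route is hypothetical, and the two groups are a simplified concrete instance of the FIG. 2 groups (the elided cores are left out).

```python
from collections import deque

def find_route(groups, src, dst):
    """Breadth-first search for a shortest list of hops [(noc, next_core), ...]."""
    queue = deque([src])
    came_from = {src: None}  # core -> (previous core, NoC used to reach it)
    while queue:
        core = queue.popleft()
        if core == dst:
            break
        for noc, members in groups.items():
            if core not in members:
                continue
            for nxt in sorted(members):
                if nxt not in came_from:
                    came_from[nxt] = (core, noc)
                    queue.append(nxt)
    if dst not in came_from:
        return None  # no chain of shared cores links the two groups
    hops = []
    core = dst
    while came_from[core] is not None:
        prev, noc = came_from[core]
        hops.append((noc, core))
        core = prev
    return hops[::-1]

# Simplified instance of the first and second processing core groups.
groups = {
    "NoC1": {"K1_C1", "K1_C2", "K2_C1", "K2_C2"},
    "NoC2": {"K1_C1", "K2_C2", "KM_C1", "KM_C2"},
}
print(find_route(groups, "K1_C2", "KM_C2"))
# [('NoC1', 'K1_C1'), ('NoC2', 'KM_C2')]
```

With this membership the search picks K1_C1 as the bridging core, matching the example in the text; K2_C2 would be an equally short alternative. If a group shares no core with the others, the function returns None.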
In one embodiment, the plurality of interconnect structures includes a third interconnect structure; all of the processing cores connected by the third interconnect structure are not connected to other of the interconnect structures.
For example, in the embodiment shown in FIG. 2, the chip further includes a third network on chip connected to a third processing core group. The third processing core group includes cores K1_C3 and K3_C1, and neither core is connected to any other interconnection structure.
In one embodiment of the invention, the processing cores connected to each interconnection structure in the chip are not connected to any other interconnection structure. In this embodiment, therefore, no data is transmitted between two processing cores that belong to two different processing core groups.
In one embodiment, each processing core includes at least one transmission unit, and the bandwidth of the interconnection structure meets the bandwidth requirement of the transmission unit. For example, the bandwidth of the interconnection structure equals the bandwidth requirement of the transmission unit, so that the processing core can send data into the interconnection structure.
Wherein the bandwidth requirement of the transmission unit may be a bandwidth required by the transmission unit to transmit a certain amount of data within a certain time.
In one embodiment, the bandwidth of the interconnect fabric satisfies the bandwidth requirements of each of the processing cores in the group of processing cores connected to the interconnect fabric.
The bandwidth requirement of a processing core may be a bandwidth required by the processing core to transmit a certain amount of data in a certain time.
Preferably, the bandwidth of the interconnection structure is greater than the bandwidth requirement of each processing core in the processing core group connected to it. Each interconnection structure can thus be closely matched to all the processing cores in its processing core group, which saves as much area as possible while giving the chip better performance.
In one embodiment, at least two processing core groups in the chip have different bandwidth requirements; for example, the first group requires 1 MBps and the second group requires 10 MBps. A NoC with a correspondingly sized structure can then be provided for each group according to its requirement, which minimizes NoC area.
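As a rough illustration of sizing each interconnection structure to its group, the Python sketch below checks a candidate bandwidth against every core's requirement and picks the smallest candidate that suffices. The function names, the per-core figures, and the list of candidate bandwidths are hypothetical; only the 1 MBps and 10 MBps group requirements come from the text.

```python
def meets_group_requirement(noc_bandwidth, core_requirements):
    """True if the interconnect bandwidth covers every core's requirement (MBps)."""
    return all(noc_bandwidth >= need for need in core_requirements.values())

def smallest_sufficient(options, core_requirements):
    """Smallest candidate bandwidth that meets every core's requirement, or None."""
    for bandwidth in sorted(options):
        if meets_group_requirement(bandwidth, core_requirements):
            return bandwidth
    return None

# Hypothetical per-core requirements in MBps for two groups.
group1 = {"K1_C1": 1, "K1_C2": 1}
group2 = {"K3_C1": 10, "K3_C2": 8}
options = [1, 2, 10, 20]  # assumed available interconnect bandwidths, MBps

print(smallest_sufficient(options, group1), smallest_sufficient(options, group2))
```

Here the low-demand group gets the 1 MBps option and the high-demand group the 10 MBps option, rather than both being provisioned for the larger figure.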
Fig. 3 is a schematic diagram of a chip structure according to an embodiment of the invention.
As shown in FIG. 3, the chip includes 6 cores of 3 different structures, namely a first kind K1, a second kind K2, and a third kind K3, with 2 cores of each kind.
The two cores of the first kind K1 are K1_C1 and K1_C2; the two cores of the second kind K2 are K2_C1 and K2_C2; and the two cores of the third kind K3 are K3_C1 and K3_C2.
The 6 cores in the chip are divided into three groups.
The first processing core group consists of cores K1_C1, K1_C2, K2_C1, and K2_C2, each of which is connected to the first interconnection structure NoC1.
The second processing core group consists of cores K1_C1, K2_C2, K3_C1, and K3_C2, each of which is connected to the second interconnection structure NoC2.
The third processing core group consists of cores K1_C2, K2_C2, and K3_C2, each of which is connected to the third interconnection structure NoC3.
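The FIG. 3 grouping can be written down as plain data to see which cores can act as bridges between groups. In this Python sketch the group membership is taken from the description above; the names FIG3_GROUPS and bridge_cores are illustrative assumptions.

```python
from itertools import combinations

# Group membership as described for FIG. 3.
FIG3_GROUPS = {
    "NoC1": {"K1_C1", "K1_C2", "K2_C1", "K2_C2"},
    "NoC2": {"K1_C1", "K2_C2", "K3_C1", "K3_C2"},
    "NoC3": {"K1_C2", "K2_C2", "K3_C2"},
}

def bridge_cores(groups):
    """Map each pair of interconnects to the sorted list of cores attached to both."""
    return {
        (a, b): sorted(groups[a] & groups[b])
        for a, b in combinations(sorted(groups), 2)
    }

for pair, shared in bridge_cores(FIG3_GROUPS).items():
    print(pair, shared)
# ('NoC1', 'NoC2') ['K1_C1', 'K2_C2']
# ('NoC1', 'NoC3') ['K1_C2', 'K2_C2']
# ('NoC2', 'NoC3') ['K2_C2', 'K3_C2']
```

Every pair of groups shares at least one core, so any core can reach any other with at most one relay; K2_C2 belongs to all three groups.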
In a preferred embodiment, each core in the chip includes at least one computational unit.
The computing unit may be an execution unit (EU) or an arithmetic unit (processing unit, PU).
The chip comprises a plurality of interconnection structures, each connected to one processing core group, and the processing cores of a group transmit data through the interconnection structure connected to that group. Each interconnection structure can therefore use a data packet format and a corresponding protocol designed specifically for its group of cores, based on the bandwidth those cores require and the data bit width they use.
In one embodiment, at least two clock frequencies are provided in a plurality of said interconnect structures in a chip. Because a plurality of interconnection structures are arranged in the chip, each interconnection structure is connected with one group of processing cores, and data transmission among the groups of processing cores can be processed in parallel, the plurality of interconnection structures in the chip can be provided with various different clock frequencies.
Optionally, the plurality of interconnect structures in the chip may also be set to the same clock frequency.
In one embodiment, at least two bit width values are provided in a plurality of the interconnect structures. Because a plurality of interconnection structures are arranged in the chip, each interconnection structure is connected with one group of processing cores, and data transmission among the groups of processing cores can be processed in parallel, various bit width values can be arranged in the interconnection structures.
Optionally, a plurality of interconnect structures in the chip may also be set to the same bit width value.
If data transmission between the processing cores of a certain processing core group requires a higher bandwidth, the bandwidth of the NoC connected with that group may be determined according to the bandwidth requirement of the group (for example, by selecting an appropriate clock frequency and bit width), so that the bandwidth of the NoC satisfies the bandwidth requirement of each processing core in the group. For example, a larger bit width may be set for that NoC. In addition, because data transmission within each processing core group is independent, a processing core group that does not need a higher bandwidth can be given a NoC bit width adapted to its own requirement, which saves chip area and power consumption.
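The sizing step can be illustrated with a small calculation. The sketch below assumes the common approximation that the peak bandwidth of a NoC equals its clock frequency multiplied by its bit width (one transfer per clock cycle); this formula, the function names, the candidate bit widths and the numbers are assumptions for illustration and are not taken from the embodiment.

```python
def noc_bandwidth_bits_per_s(clock_hz, bit_width):
    """Peak bandwidth, assuming one bit_width-wide transfer per clock cycle."""
    return clock_hz * bit_width


def smallest_sufficient_width(required_bits_per_s, clock_hz,
                              candidates=(32, 64, 128, 256, 512)):
    """Return the narrowest candidate bit width that meets the requirement, or None."""
    for width in sorted(candidates):
        if noc_bandwidth_bits_per_s(clock_hz, width) >= required_bits_per_s:
            return width
    return None
```

Under these assumptions, a group that needs 100 Gbit/s at a 1 GHz clock would be given a 128-bit NoC, while a group that needs only 20 Gbit/s at the same clock could use a 32-bit NoC, which is the area and power saving the paragraph describes.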
In addition, in the chip of the embodiment of the invention, each processing core group uses its own NoC to realize data transmission. Different NoCs can be set to different clock frequencies and different bit widths, so the bit width of each NoC can be set according to the data transmission requirement of its group.
The technical scheme of the invention has the following beneficial technical effects:
(1) The chip provided by the embodiment of the invention is provided with a plurality of interconnection structures, and the cores of one processing core group realize data transmission through the interconnection structure connected with that group, so that data transmission within different processing core groups can be processed in parallel. On one hand, the total data bandwidth is greatly improved, the efficiency of data transmission between the cores is improved, the power consumption is reduced, and the performance of the chip is improved. On the other hand, because the data transmission of different groups is processed in parallel, the transmissions of different groups do not interfere with each other, which avoids data congestion and further improves the performance of the chip.
(2) The chip provided by the embodiment of the invention only needs to consider the data format of the cores in the same group and does not need to consider the data formats of other cores of other groups of the chip when data transmission is carried out between the cores belonging to the same group, thereby reducing the complexity of circuits in the chip.
An embodiment of the present invention further provides a board card, which includes one or more chips provided in the above embodiments.
An embodiment of the present invention further provides an electronic device, including one or more of the board cards provided in the above embodiment.
Fig. 4 is a flowchart illustrating an inter-core data transmission method according to an embodiment of the present invention.
As shown in fig. 4, the method includes steps S101 to S102:
Step S101, each processing core belonging to the same processing core group realizes data transmission through the interconnection structure connected with the processing core group.
Step S102, two processing cores respectively belonging to two different processing core groups realize data transmission through at least two of the interconnection structures.
In an embodiment, the step in which the two processing cores respectively belonging to two different processing core groups realize data transmission through at least two of the interconnection structures includes:
the two processing cores respectively belonging to the two different processing core groups realize data transmission through the two interconnection structures via one processing core that belongs to both of the two different processing core groups.
For example, in the embodiment shown in FIG. 3, K2_C1 in NoC1 is to send data to K3_C1 in NoC2. Since the two cores are not in the same group, K2_C1 can forward the data through a core that is in both NoC1 and NoC2, e.g. K1_C1 or K2_C2. That is, K2_C1 sends the data through NoC1 to K1_C1, and K1_C1 sends the data through NoC2 to K3_C1, thereby realizing the data transmission.
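The selection of a relay core in this example can be sketched as a set intersection. The code below is an illustrative assumption, not the method of the embodiment: the NoC1 membership is inferred from the FIG. 3 examples, and GROUPS and relay_candidates are hypothetical names.

```python
# Hypothetical model of the FIG. 3 grouping.
GROUPS = {
    "NoC1": {"K1_C1", "K1_C2", "K2_C1", "K2_C2"},  # assumed; inferred from the routing examples
    "NoC2": {"K1_C1", "K2_C2", "K3_C1", "K3_C2"},
    "NoC3": {"K1_C2", "K2_C2", "K3_C2"},
}


def relay_candidates(src, dst, src_noc, dst_noc):
    """Return the cores connected with both interconnects that could forward data from src to dst."""
    return sorted((GROUPS[src_noc] & GROUPS[dst_noc]) - {src, dst})
```

With the assumed grouping, relay_candidates("K2_C1", "K3_C1", "NoC1", "NoC2") yields K1_C1 and K2_C2, the two relay cores named in the example.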
It should be noted that, the two processing cores respectively belonging to two different processing core groups may also implement data transmission through a plurality of interconnection structures.
For example, in the embodiment shown in FIG. 3, processing core K2_C1 can send data to processing core K3_C1 through NoC1, NoC3 and NoC2. Specifically, processing core K2_C1 sends the data through NoC1 to processing core K1_C2, processing core K1_C2 sends the data through NoC3 to processing core K3_C2, and processing core K3_C2 sends the data through NoC2 to processing core K3_C1.
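A multi-hop route of this kind can be checked, and a shortest route can be found, with a short sketch. The breadth-first search used here is a generic technique chosen for illustration and is not stated in the embodiment; the NoC1 membership is again an assumption inferred from the FIG. 3 examples, and all names are hypothetical.

```python
from collections import deque

# Hypothetical model of the FIG. 3 grouping.
GROUPS = {
    "NoC1": {"K1_C1", "K1_C2", "K2_C1", "K2_C2"},  # assumed; inferred from the routing examples
    "NoC2": {"K1_C1", "K2_C2", "K3_C1", "K3_C2"},
    "NoC3": {"K1_C2", "K2_C2", "K3_C2"},
}


def is_valid_route(hops):
    """Check that each hop (sender, noc, receiver) stays inside one group and that hops chain."""
    chained = all(a[2] == b[0] for a, b in zip(hops, hops[1:]))
    in_group = all(s in GROUPS[noc] and r in GROUPS[noc] for s, noc, r in hops)
    return chained and in_group


def shortest_route(src, dst):
    """Breadth-first search for a route that traverses the fewest interconnects."""
    queue = deque([(src, [])])
    visited = {src}
    while queue:
        core, hops = queue.popleft()
        if core == dst:
            return hops
        for noc in sorted(GROUPS):
            if core not in GROUPS[noc]:
                continue
            # sorted() builds a list first, so mutating visited inside the loop is safe
            for nxt in sorted(GROUPS[noc] - visited):
                visited.add(nxt)
                queue.append((nxt, hops + [(core, noc, nxt)]))
    return None
```

Under the assumed grouping, the three-hop route of this example passes is_valid_route, while shortest_route("K2_C1", "K3_C1") returns the two-hop route through K1_C1 described in the preceding example.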
According to the inter-core data transmission method provided by the embodiment of the invention, each processing core belonging to the same processing core group realizes data transmission through the interconnection structure connected with the processing core group, so that data transmission within different groups of cores can be processed in parallel. On one hand, the total data bandwidth is greatly improved, the efficiency of data transmission between the cores is improved, the power consumption is reduced, and the performance of the chip is improved. On the other hand, because the data transmission of different groups is processed in parallel, the transmissions of different groups do not interfere with each other, which avoids data congestion and further improves the performance of the chip.
According to an embodiment of the present invention, a computer storage medium is provided, on which a computer program is stored, which when executed by a processor implements the steps of the inter-core data transmission method provided by the above-described embodiment.
According to an embodiment of the present invention, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the steps of the inter-core data transmission method provided in the foregoing embodiment.
According to an embodiment of the present invention, a computer program product is provided, which includes computer instructions, and when the computer instructions are executed by a computing device, the computing device may execute the steps of the inter-core data transmission method provided in the above embodiment.
The flowchart and block diagrams in the figures of the present disclosure illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It is to be understood that the above-described embodiments of the present invention are merely illustrative of or explaining the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.
The invention has been described above with reference to embodiments thereof. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the invention, and these alternatives and modifications are intended to be within the scope of the invention.
It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to list all embodiments exhaustively here, and obvious variations or modifications derived therefrom remain within the scope of the invention.

Claims (10)

1. A chip, comprising: a plurality of interconnection structures, wherein each interconnection structure is connected with at least two processing cores, and all the processing cores connected with one interconnection structure form a processing core group;
each processing core belonging to the same processing core group performs data transmission through the interconnection structure connected with the processing core group;
and the plurality of interconnection structures are not directly connected with each other.
2. The chip of claim 1, wherein data transfers are enabled between processing cores belonging to different ones of the groups of processing cores.
3. The chip of claim 2,
the amount of data transferred between any two processing cores belonging to one of the processing core groups is greater than the amount of data transferred between processing cores respectively belonging to any two different processing core groups.
4. The chip according to any of claims 1 to 3,
the plurality of interconnection structures comprise a first interconnection structure and a second interconnection structure;
the first interconnection structure is connected with a first processing core;
the first processing core is connected with the second interconnection structure.
5. The chip of claim 4,
the first interconnection structure is also connected with a second processing core; the second interconnection structure is connected with a third processing core;
and the second processing core realizes data transmission with the third processing core via the first processing core.
6. The chip of claim 4 or 5,
the plurality of interconnection structures further comprise a third interconnection structure;
none of the processing cores connected with the third interconnection structure is connected with any other interconnection structure.
7. The chip of any of claims 1-6, wherein the interconnect fabric has a bandwidth that meets bandwidth requirements of each of the processing cores within the group of processing cores to which the interconnect fabric is connected.
8. The chip of any of claims 1-7, wherein a plurality of said interconnect structures are provided with at least two clock frequencies; and/or
The plurality of interconnection structures are provided with at least two bit width values.
9. An inter-core data transmission method used in the chip according to any one of claims 1 to 8, comprising:
each processing core belonging to the same processing core group realizes data transmission through an interconnection structure connected with the processing core group;
and the two processing cores which respectively belong to two different processing core groups realize data transmission at least through the two interconnection structures.
10. The method according to claim 9, wherein the two processing cores belonging to two different processing core groups respectively implement data transmission through at least two of the interconnect structures, and the method comprises:
the two processing cores respectively belonging to the two different processing core groups realize data transmission through the two interconnection structures via one processing core simultaneously belonging to the two different processing core groups.
CN201911230193.9A 2019-12-04 2019-12-04 Chip and inter-core data transmission method Active CN112905523B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911230193.9A CN112905523B (en) 2019-12-04 2019-12-04 Chip and inter-core data transmission method
PCT/CN2020/118709 WO2021109698A1 (en) 2019-12-04 2020-09-29 Chip and inter-core data transmission method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911230193.9A CN112905523B (en) 2019-12-04 2019-12-04 Chip and inter-core data transmission method

Publications (2)

Publication Number Publication Date
CN112905523A true CN112905523A (en) 2021-06-04
CN112905523B CN112905523B (en) 2023-11-17

Family

ID=76110785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911230193.9A Active CN112905523B (en) 2019-12-04 2019-12-04 Chip and inter-core data transmission method

Country Status (2)

Country Link
CN (1) CN112905523B (en)
WO (1) WO2021109698A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023029487A1 (en) * 2021-08-31 2023-03-09 上海商汤智能科技有限公司 Method and apparatus for determining topological structure of network-on-chip, and chip

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101546302A (en) * 2009-05-07 2009-09-30 复旦大学 Interconnection structure of multicore processor and hierarchical interconnection design method based on interconnection structure
US20120159496A1 (en) * 2010-12-20 2012-06-21 Saurabh Dighe Performing Variation-Aware Profiling And Dynamic Core Allocation For A Many-Core Processor
CN106528052A (en) * 2016-12-26 2017-03-22 北京海嘉科技有限公司 Microprocessor architecture based on distributed function units

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103336756B (en) * 2013-07-19 2016-01-27 中国人民解放军信息工程大学 A kind of generating apparatus of data computational node
CN205540720U (en) * 2016-04-06 2016-08-31 龙芯中科技术有限公司 Treater interconnection structure and mainboard

Also Published As

Publication number Publication date
CN112905523B (en) 2023-11-17
WO2021109698A1 (en) 2021-06-10

Similar Documents

Publication Publication Date Title
US11677662B2 (en) FPGA-efficient directional two-dimensional router
US8811422B2 (en) Single chip protocol converter
EP3298740B1 (en) Directional two-dimensional router and interconnection network for field programmable gate arrays
US8867559B2 (en) Managing starvation and congestion in a two-dimensional network having flow control
Ahmed et al. Architecture and design of efficient 3D network-on-chip (3D NoC) for custom multicore SoC
JP2002508100A (en) Packet routing switch to control access to shared memory at different data rates
EP3364625B1 (en) Device, system and method for adaptive payload compression in a network fabric
CN105740199A (en) Time sequence power estimation device and method of network on chip
CN111630487A (en) Centralized-distributed hybrid organization of shared memory for neural network processing
US20190251048A1 (en) Accelerating distributed stream processing
CN112905523A (en) Chip and inter-core data transmission method
CN114185840A (en) Three-dimensional multi-bare-chip interconnection network structure
TW200540644A (en) A single chip protocol converter
CN108429938A (en) In reconfigurable arrays processor optical interconnection network is communicated between cluster
Wang et al. A dynamic priority arbiter for Network-on-Chip
CN107220209B (en) Three-dimensional optical network-on-chip architecture based on faults, communication method and optical router
US11636061B2 (en) On-demand packetization for a chip-to-chip interface
US8788737B2 (en) Transport of PCI-ordered traffic over independent networks
Ueno et al. VCSN: Virtual circuit-switching network for flexible and simple-to-operate communication in HPC FPGA cluster
CN106933663B (en) A kind of multithread scheduling method and system towards many-core system
CN117221212B (en) Optical network on chip low congestion routing method and related equipment
CN112437032B (en) Data transmitting/receiving device and method, storage medium, and electronic apparatus
CN107205152A (en) H.265 encoder modeling method based on the network-on-chip traffic
Lu et al. Permutation on the mesh with reconfigurable bus: Algorithms and practical considerations
WO2020087249A1 (en) Multi-core chip structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant