CN112905523A

CN112905523A - Chip and inter-core data transmission method

Info

Publication number: CN112905523A
Application number: CN201911230193.9A
Authority: CN
Inventors: Not disclosed
Original assignee: Beijing Simm Computing Technology Co ltd
Current assignee: Beijing Simm Computing Technology Co ltd
Priority date: 2019-12-04
Filing date: 2019-12-04
Publication date: 2021-06-04
Anticipated expiration: 2039-12-04
Also published as: CN112905523B; WO2021109698A1

Abstract

The invention discloses a chip and an inter-core data transmission method. The chip includes a plurality of interconnection structures. Each interconnection structure is connected to one processing core group, which comprises at least two processing cores. The processing cores in a processing core group exchange data through the interconnection structure connected to that group, and the interconnection structures are not directly connected to one another. Because cores in the same group exchange data through that group's interconnection structure, data transmissions in different processing core groups can proceed in parallel. On the one hand, this greatly increases the total data bandwidth, improves inter-core transmission efficiency, reduces power consumption, and improves chip performance. On the other hand, because transmissions in different groups proceed in parallel, they do not interfere with one another, which avoids data congestion and further improves chip performance.

Description

Chip and inter-core data transmission method

Technical Field

The invention relates to the technical field of chips, in particular to a chip and an inter-core data transmission method.

Background

With the development of science and technology, human society is rapidly entering the intelligent era. A key characteristic of this era is that the volume of data people acquire keeps growing, and so does the required speed of processing it.

Chips are the cornerstone of data processing and fundamentally determine people's ability to process data. In terms of application, chips follow two main routes. One is the general-purpose chip route, such as the Central Processing Unit (CPU), which offers great flexibility but is computationally inefficient when processing domain-specific algorithms. The other is the special-purpose chip route, such as the Tensor Processing Unit (TPU), which delivers higher effective computing power in certain specific fields but has poor or even no processing capability in more versatile, general-purpose fields.

Because data in the intelligent era is diverse and enormous in volume, a chip must be highly flexible, able to handle rapidly evolving algorithms from many different fields. It must also have extremely high processing capacity so that it can rapidly process extremely large and sharply growing volumes of data.

Disclosure of Invention

(I) Objects of the invention

The invention aims to provide a chip and an inter-core data transmission method. The chip provided by the embodiments of the invention comprises a plurality of interconnection structures. Processing cores in the same processing core group exchange data through the interconnection structure connected to that group, so that data transmissions in different groups can proceed in parallel. This greatly increases the total data bandwidth and improves the performance of the chip.

(II) Technical scheme

To solve the above problems, a first aspect of the present invention provides a chip comprising a plurality of interconnection structures. Each interconnection structure is connected to one processing core group, and all the processing cores connected to one interconnection structure form one processing core group. The processing cores belonging to the same processing core group exchange data through the interconnection structure connected to that group. The plurality of interconnection structures are not directly connected to one another.

The chip provided by the embodiment of the invention comprises a plurality of interconnection structures. Processing cores in the same processing core group exchange data through the interconnection structure connected to that group, so that data transmissions in different processing core groups can proceed in parallel. This greatly increases the total data bandwidth and improves the performance of the chip.

Further, data transmission can be performed between the processing cores belonging to different processing core groups.

Further, the amount of data transferred between any two processing cores belonging to the same processing core group is greater than the amount of data transferred between any two processing cores that belong to two different processing core groups.

Further, the plurality of interconnection structures comprise a first interconnection structure and a second interconnection structure; the first interconnection structure is connected with a first processing core; the first processing core is connected with the second interconnection structure.

Further, the first interconnection structure is also connected to a second processing core, and the second interconnection structure is connected to a third processing core. The second processing core exchanges data with the third processing core via the first processing core.

Further, the plurality of interconnection structures comprise a third interconnection structure, and none of the processing cores connected to the third interconnection structure is connected to any other interconnection structure.

Further, the processing cores connected to each interconnection structure are not connected to any other interconnection structure.

Further, each processing core includes at least one transmission unit, and the bandwidth of the interconnect structure meets the bandwidth requirement of the transmission unit.

Further, the bandwidth of the interconnect structure meets the bandwidth requirement of each processing core in the processing core group connected to the interconnect structure.

Further, the bandwidth of the interconnect structure is greater than the bandwidth requirement of each processing core in the processing core group connected to the interconnect structure.

Further, the plurality of interconnection structures use at least two different clock frequencies.

Further, the plurality of interconnection structures use at least two different bit widths.

Further, at least two processing core groups have different bandwidth requirements.

According to a second aspect of the present invention, there is provided a card comprising one or more chips as provided in the first aspect.

According to a third aspect of the present invention, there is also provided an electronic apparatus including one or more cards provided in the second aspect.

According to a fourth aspect of the present invention, there is provided an inter-core data transmission method used in the chip provided in the first aspect. In the method, the processing cores belonging to the same processing core group exchange data through the interconnection structure connected to that group. Two processing cores belonging to two different processing core groups exchange data through at least two interconnection structures.

Further, the step in which two processing cores belonging to two different processing core groups exchange data through at least two interconnection structures includes the following. The two processing cores exchange data through the two interconnection structures via one processing core that belongs to both processing core groups.

According to a fifth aspect of the present invention, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the inter-core data transfer method of the fourth aspect.

According to a sixth aspect of the present invention, there is provided an electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for transmitting data between cores of the fourth aspect when executing the program.

According to a seventh aspect of the present invention, there is provided a computer program product comprising computer instructions which, when executed by a computing device, may perform the steps of the method of the fourth aspect for inter-core data transfer.

(III) Advantageous effects

The technical scheme of the invention has the following beneficial technical effects:

(1) The chip provided by the embodiments of the invention has a plurality of interconnection structures. Processing cores in the same processing core group exchange data through the interconnection structure connected to that group, so data transmissions in different processing core groups can proceed in parallel. On the one hand, this greatly increases the total data bandwidth, improves inter-core transmission efficiency, reduces power consumption, and improves chip performance. On the other hand, because transmissions in different groups proceed in parallel, they do not interfere with one another, which avoids data congestion and improves chip performance.

(2) When cores belonging to the same group exchange data, the chip only needs to handle the data formats of the cores in that group. It does not need to handle the data formats of cores in other groups, which reduces the complexity of the circuits in the chip.

(3) Because each processing core group uses its own interconnection structure for data transmission, different interconnection structures can use different clock frequencies and different bit widths. The bit width of each interconnection structure can be set according to the data transmission requirements of its group.

Drawings

FIG. 1 is a schematic diagram of a chip;

FIG. 2 is a block diagram of a chip according to an embodiment of the invention;

FIG. 3 is a block diagram of a chip according to an embodiment of the invention;

FIG. 4 is a flowchart illustrating an inter-core data transmission method according to an embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.

In multi-core (or many-core) chips, the cores may all be homogeneous, i.e., every core has the same structure. They may also be heterogeneous, i.e., at least two different kinds of cores exist. In a multi-core (many-core) architecture, several cores often participate in executing one task at the same time, and data must then be transmitted between them. The architecture a multi-core chip adopts and the way it transmits data are therefore important, because they determine whether homogeneous or heterogeneous cores can be organically combined into a chip with excellent performance.

Fig. 1 is a schematic structural diagram of a chip.

In the chip structure shown in FIG. 1, the chip has only one network on chip (NoC), and all cores are interconnected through this single NoC to exchange data.

As shown in FIG. 1, the chip contains cores of several structures, K₁_C₁, K₂_C₁, …, K_M_C₁, where M is the number of core types. Each type has multiple cores. For example, the first type of core, K₁, includes K₁_C₁, K₁_C₂, …, K₁_C_N1, where N_M denotes the number of cores of the M-th type. That is, K_M_C_NM denotes the N_M-th core of the M-th structure.

Because the chip contains many types of cores (M types) and all of them exchange data over the same NoC, each core needs to know the data formats (bit widths) of all other cores it may exchange data with. Sharing one NoC also places high demands on the format of transmitted and received packets. Either the sending core encodes the data into the receiving core's format, or the receiving core decodes the data according to the sending core's type and re-encodes it into its own format. As a result, the packet routing algorithm is complex and each core carries a heavy burden.
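To make the format burden concrete, the following sketch counts how many foreign data formats each core must handle. It does this once for a single shared NoC and once for per-group NoCs. It is an illustrative model only, not taken from the patent. It assumes each core type uses exactly one data format, and the function names and the small four-core configuration are hypothetical.

```python
def formats_to_know_shared(core_types):
    """Single shared NoC (FIG. 1 style): any core may exchange data with any
    other, so each core must handle every other core type's format."""
    all_types = set(core_types.values())
    return {core: len(all_types - {t}) for core, t in core_types.items()}


def formats_to_know_grouped(core_types, groups):
    """One NoC per group: a core only handles the formats of the core types
    that appear in the groups it belongs to."""
    result = {}
    for core, t in core_types.items():
        peer_types = set()
        for members in groups.values():
            if core in members:
                peer_types |= {core_types[c] for c in members}
        result[core] = len(peer_types - {t})
    return result


# Hypothetical four-core chip with three core types and two groups.
core_types = {"K1_C1": "K1", "K1_C2": "K1", "K2_C1": "K2", "K3_C1": "K3"}
groups = {"NoC1": {"K1_C1", "K2_C1"}, "NoC2": {"K1_C2", "K3_C1"}}
print(formats_to_know_shared(core_types))
print(formats_to_know_grouped(core_types, groups))
```

In this hypothetical configuration, every core must handle two foreign formats on the shared NoC but only one when the cores are grouped.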

In addition, all cores share one NoC, and the amount of data exchanged between different cores varies. While one pair of cores is transmitting data, other cores cannot use the NoC and must wait until that pair finishes. This introduces transmission delay and reduces chip performance. For example, suppose the data transmission bandwidth between a first core and a second core is large. While the first and second cores are transmitting, a third core generates data for a fourth core. The amount of this data is small, but its real-time requirement is high. Because all cores share one NoC, the third core must wait until the transfer between the first and second cores completes. The time-critical data is therefore not delivered in time, and chip performance suffers.
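The waiting effect in this example can be reproduced with a toy, non-preemptive timing model in which an interconnect carries one transfer at a time in request order. This is a simplified sketch using assumed numbers (time units and durations), not a model of real NoC arbitration. The `Transfer` class and the `finish_times` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    name: str
    start: float     # time at which the transfer is requested
    duration: float  # time the transfer occupies its interconnect
    noc: str         # interconnect it would use in the grouped design


def finish_times(transfers, shared):
    """Non-preemptive model: each interconnect serves one transfer at a time,
    in request order. shared=True routes everything over a single NoC."""
    busy_until = {}
    done = {}
    for t in sorted(transfers, key=lambda x: x.start):
        link = "shared NoC" if shared else t.noc
        begin = max(t.start, busy_until.get(link, 0.0))
        busy_until[link] = begin + t.duration
        done[t.name] = busy_until[link]
    return done


transfers = [
    Transfer("core1->core2 (bulk)", start=0, duration=100, noc="NoC1"),
    Transfer("core3->core4 (urgent)", start=10, duration=1, noc="NoC2"),
]
print(finish_times(transfers, shared=True))
print(finish_times(transfers, shared=False))
```

Consider a bulk transfer requested at time 0 that occupies the link for 100 units, and a 1-unit urgent transfer requested at time 10. On a shared NoC the urgent transfer finishes at 101. When it travels on its own group's NoC, it finishes at 11.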

The technical solution of the present invention is proposed to solve these problems.

The chip provided by one embodiment of the present application will be described in detail below. In the description of the present invention, it should be noted that the terms "first", "second", "third", and "fourth" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

Fig. 2 is a block diagram of a chip according to an embodiment of the invention.

As shown in FIG. 2, the chip includes a plurality of interconnection structures, each connected to one processing core group. All the processing cores connected to one interconnection structure form one processing core group. The processing cores in a processing core group exchange data through the interconnection structure connected to that group. The plurality of interconnection structures are not directly connected to one another.

In the embodiment shown in FIG. 2, the chip includes o interconnection structures. An interconnection structure may be a core interconnect fabric, a network on chip (NoC), a bus, or a switch. The present invention takes the NoC as an example of the interconnection structure, but is not limited thereto. The o interconnection structures are NoC₁, NoC₂, …, NoC_o, and each NoC is connected to one processing core group. The processing cores within a group exchange data via the NoC connected to that group.

The chip contains M types of cores, and there are N_M cores of the M-th type. For example, K₂_C_N2 denotes the N₂-th core of type 2, and K_M_C_NM denotes the N_M-th core of type M.

In the embodiment shown in FIG. 2, the first processing core group includes cores K₁_C₁, K₁_C₂, …, K₁_C_N1, K₂_C₁, and K₂_C₂. The first network on chip NoC₁ connects the first processing core group, i.e., each processing core of the first group is connected to NoC₁. The cores of the first group exchange data through NoC₁. For example, when core K₁_C₁ has data to transmit to core K₁_C₂, it sends the data to NoC₁, and NoC₁ forwards the data to core K₁_C₂.

The second processing core group includes cores K₁_C₁, K₁_C_N1, K₂_C₂, …, K₂_C_N2, K_M_C₁, and K_M_C₂. Each processing core of the second group is connected to the second network on chip NoC₂, and the cores of the second group exchange data through NoC₂.

The o-th processing core group includes cores K₁_C_N1, K₂_C₂, K_M_C₂, …, K_M_C_NM. Each core of the o-th group is connected to the o-th network on chip NoC_o, and the cores of the o-th group exchange data through the o-th interconnection structure NoC_o.
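The FIG. 2 grouping can be modelled as a mapping from each NoC to the set of cores it connects. This is a minimal sketch. ASCII identifiers such as `K1_CN1` stand for K₁_C_N1 and `KM_C2` for K_M_C₂, and only the cores the description names explicitly are kept (cores elided with "…" are omitted). The helper names `nocs_of` and `intra_group_noc` are hypothetical. Because the NoCs are not linked to each other, the model permits a direct transfer only over a NoC that both cores attach to.

```python
# Explicitly named members of the FIG. 2 groups; elided cores are omitted.
FIG2_GROUPS = {
    "NoC1": {"K1_C1", "K1_C2", "K1_CN1", "K2_C1", "K2_C2"},
    "NoC2": {"K1_C1", "K1_CN1", "K2_C2", "K2_CN2", "KM_C1", "KM_C2"},
    "NoC_o": {"K1_CN1", "K2_C2", "KM_C2", "KM_CNM"},
}


def nocs_of(core, groups):
    """Interconnects a core is attached to (one per group it belongs to)."""
    return {noc for noc, members in groups.items() if core in members}


def intra_group_noc(src, dst, groups):
    """Return a NoC shared by both cores (the lexicographically smallest one
    if several), or None when they share no group and so cannot exchange
    data directly."""
    common = nocs_of(src, groups) & nocs_of(dst, groups)
    return min(common) if common else None


print(intra_group_noc("K1_C1", "K1_C2", FIG2_GROUPS))
print(intra_group_noc("K2_C1", "KM_C1", FIG2_GROUPS))
```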

The chip provided by the embodiment of the invention has a plurality of interconnection structures, each connected to one processing core group. Cores in the same group exchange data through that group's interconnection structure, and transmissions in different groups can proceed in parallel. On the one hand, this greatly increases the total data bandwidth and improves chip performance. On the other hand, because transmissions in different groups proceed in parallel, they do not interfere with one another, which avoids data congestion and improves chip performance.

In one embodiment, data transfer is enabled between the processing cores belonging to different ones of the processing core groups.

In a preferred embodiment, the plurality of interconnect structures includes a first interconnect structure and a second interconnect structure; the first interconnection structure is connected with a first processing core; the first processing core is connected with the second interconnection structure.

Specifically, in this embodiment the same processing core may belong to two different processing core groups at once. For example, in the embodiment shown in FIG. 2, processing core K₁_C₁ belongs to both the first and the second processing core group. That is, K₁_C₁ is connected to both the first network on chip NoC₁ and the second network on chip NoC₂.

In one embodiment, the amount of data transferred between any two processing cores belonging to the same processing core group is greater than the amount of data transferred between processing cores belonging to any two different groups, respectively.

Specifically, in this embodiment, two cores that exchange a relatively large amount of data are allocated to the same group.
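Given a traffic table, the property above can be checked mechanically: every intra-group pair must exchange more data than any pair split across groups. This is a minimal sketch. It assumes traffic is recorded per unordered core pair and that unlisted pairs exchange nothing. `satisfies_locality` and the core names are hypothetical. Because a core may belong to several groups, two cores count as an intra-group pair if they share at least one group.

```python
from itertools import combinations


def same_group(a, b, groups):
    """True if some group contains both cores."""
    return any(a in m and b in m for m in groups.values())


def satisfies_locality(groups, traffic):
    """Every intra-group pair must exchange more data than any pair of cores
    that share no group. `traffic` maps frozenset({a, b}) to an amount;
    pairs that are not listed count as 0."""
    cores = sorted(set().union(*groups.values()))
    intra, inter = [], []
    for a, b in combinations(cores, 2):
        amount = traffic.get(frozenset((a, b)), 0)
        (intra if same_group(a, b, groups) else inter).append(amount)
    return not intra or not inter or min(intra) > max(inter)


groups = {"NoC1": {"A", "B"}, "NoC2": {"C", "D"}}
traffic = {frozenset("AB"): 100, frozenset("CD"): 80, frozenset("AC"): 5}
print(satisfies_locality(groups, traffic))
```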

In one embodiment, the first interconnection structure is further connected to a second processing core, and the second interconnection structure is connected to a third processing core. The second processing core exchanges data with the third processing core via the first processing core.

Specifically, in this embodiment, the amount of data to be transmitted between two cores belonging to two different groups is generally small. When two such cores need to exchange only a small amount of data, the exchange can be relayed through a bridging core, i.e., a core that belongs to both groups. For example, suppose K₂_C₁, connected to NoC₁, needs to send data to K_M_C₁, connected to NoC₂. The two are not in the same processing core group, so K₂_C₁ can relay through a processing core connected to both NoC₁ and NoC₂, such as K₁_C₁ or K₂_C₂. That is, K₂_C₁ sends the data through NoC₁ to K₁_C₁, and K₁_C₁ sends it through NoC₂ to K_M_C₁, completing the transfer.

For example, in the embodiment shown in FIG. 2, processing core K₁_C₂ of the first processing core group may need to send data to processing core K_M_C₂ of the second processing core group. The transfer can be relayed by processing core K₁_C₁, K₁_C_N1, or K₂_C₂.

Specifically, processing core K₁_C₂ sends the data to processing core K₁_C₁ through the first network on chip NoC₁. Processing core K₁_C₁ then sends the data to processing core K_M_C₂ through the second network on chip NoC₂.
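Finding candidate relay cores for such a transfer amounts to intersecting the two groups. The sketch below uses the explicitly named members of the first two FIG. 2 groups (cores elided with "…" are left out), with ASCII stand-ins such as `KM_C2` for K_M_C₂. `bridge_cores` and `relay_plan` are hypothetical helpers, and picking the alphabetically first bridge is an arbitrary choice made for illustration.

```python
# Explicitly named members of the first two FIG. 2 groups (ASCII stand-ins).
GROUP1 = {"K1_C1", "K1_C2", "K1_CN1", "K2_C1", "K2_C2"}            # on NoC1
GROUP2 = {"K1_C1", "K1_CN1", "K2_C2", "K2_CN2", "KM_C1", "KM_C2"}  # on NoC2


def bridge_cores(group_a, group_b, src, dst):
    """Cores attached to both interconnects, excluding the two endpoints;
    any of them can relay a transfer from group_a to group_b."""
    return sorted((group_a & group_b) - {src, dst})


def relay_plan(src, dst, group_a, group_b, noc_a="NoC1", noc_b="NoC2"):
    """Two-hop plan of (sender, noc, receiver) tuples via the first bridge."""
    bridges = bridge_cores(group_a, group_b, src, dst)
    if not bridges:
        raise ValueError("no core belongs to both groups")
    bridge = bridges[0]
    return [(src, noc_a, bridge), (bridge, noc_b, dst)]


print(bridge_cores(GROUP1, GROUP2, "K1_C2", "KM_C2"))
print(relay_plan("K1_C2", "KM_C2", GROUP1, GROUP2))
```

For the K₁_C₂ to K_M_C₂ example, the candidates are K₁_C₁, K₁_C_N1, and K₂_C₂, matching the description, and the plan relays through K₁_C₁.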

In one embodiment, the plurality of interconnection structures includes a third interconnection structure, and none of the processing cores connected to the third interconnection structure is connected to any other interconnection structure.

For example, in the embodiment shown in FIG. 2, the chip further includes a third network on chip connected to a third processing core group. The third group includes cores K₁_C₃ and K₃_C₁, and neither of these two cores is connected to any other interconnection structure.

In one embodiment of the invention, the processing cores connected to each interconnection structure in the chip are not connected to any other interconnection structure. In this embodiment, no data is transmitted between processing cores belonging to different processing core groups.
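The group table shows whether a configuration falls under this embodiment, or whether it contains an isolated group like the third NoC described above. This is a hedged sketch. The helper names and the small excerpt of FIG. 2 groups are illustrative assumptions.

```python
def isolated_groups(groups):
    """Interconnects none of whose cores is attached to any other
    interconnect (like the third NoC described above)."""
    isolated = []
    for noc, members in groups.items():
        others = set().union(*(m for n, m in groups.items() if n != noc))
        if not (members & others):
            isolated.append(noc)
    return sorted(isolated)


def is_partition(groups):
    """True when every group is isolated, i.e. no core is shared, so no data
    can pass between different groups."""
    return len(isolated_groups(groups)) == len(groups)


# Illustrative excerpt: two overlapping FIG. 2 groups plus the isolated third NoC.
excerpt = {
    "NoC1": {"K1_C1", "K1_C2", "K2_C1"},
    "NoC2": {"K1_C1", "KM_C1"},
    "NoC3": {"K1_C3", "K3_C1"},
}
print(isolated_groups(excerpt))
print(is_partition(excerpt))
```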

In one embodiment, each of the processing cores includes at least one transmission unit, and the bandwidth of the interconnect structure meets the bandwidth requirement of the transmission unit. For example, the bandwidth of the interconnect structure equals the bandwidth requirement of the transmission unit, so that the processing cores can send data into the interconnect structure.

Wherein the bandwidth requirement of the transmission unit may be a bandwidth required by the transmission unit to transmit a certain amount of data within a certain time.

In one embodiment, the bandwidth of the interconnect fabric satisfies the bandwidth requirements of each of the processing cores in the group of processing cores connected to the interconnect fabric.

The bandwidth requirement of a processing core may be a bandwidth required by the processing core to transmit a certain amount of data in a certain time.

Preferably, the bandwidth of the interconnect structure is greater than the bandwidth requirement of each processing core in the processing core group connected to it. In this way, each interconnect structure can be closely matched to all the processing cores in its group. This saves as much area as possible on the one hand, and gives the chip better performance on the other.
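One simple way to check these bandwidth conditions is to model a NoC's peak bandwidth as clock frequency times bit width. The sketch assumes one transfer of `bit_width` bits per clock cycle and requirements expressed in bytes per second. Both are illustrative assumptions, and the function names are hypothetical. The `strict` flag distinguishes the "meets" condition from the preferred "greater than" condition.

```python
def noc_bandwidth_bytes_per_s(clock_hz, bit_width):
    """Peak bandwidth, assuming one bit_width-bit transfer per clock cycle."""
    return clock_hz * bit_width / 8


def meets_group_requirements(clock_hz, bit_width, core_requirements, strict=False):
    """True if the NoC covers every core's requirement (bytes/s). With
    strict=True, the NoC bandwidth must be strictly greater."""
    bw = noc_bandwidth_bytes_per_s(clock_hz, bit_width)
    if strict:
        return all(bw > r for r in core_requirements.values())
    return all(bw >= r for r in core_requirements.values())


# Hypothetical group: a 100 MHz, 32-bit NoC gives 400e6 bytes/s.
req = {"core_a": 400e6, "core_b": 100e6}
print(meets_group_requirements(100e6, 32, req))
print(meets_group_requirements(100e6, 32, req, strict=True))
```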

In one embodiment, at least two processing core groups in the chip have different bandwidth requirements. For example, the first group requires 1 MBps and the second group requires 10 MBps. A NoC of a correspondingly sized structure can then be provided for each processing core group according to its requirement, which saves as much NoC area as possible.

Fig. 3 is a schematic diagram of a chip structure according to an embodiment of the invention.

As shown in FIG. 3, the chip includes 6 cores of 3 different structures: a first type of core K₁, a second type K₂, and a third type K₃, with 2 cores of each type.

The two cores of the first type K₁ are K₁_C₁ and K₁_C₂. The two cores of the second type K₂ are K₂_C₁ and K₂_C₂. The two cores of the third type K₃ are K₃_C₁ and K₃_C₂.

The 6 cores in the chip are divided into three groups.

The first processing core group consists of cores K₁_C₁, K₁_C₂, K₂_C₁, and K₂_C₂, and each core in it is connected to the first interconnection structure NoC₁.

The second processing core group consists of cores K₁_C₁, K₂_C₂, K₃_C₁, and K₃_C₂, and each core in it is connected to the second interconnection structure NoC₂.

The third processing core group consists of cores K₁_C₂, K₂_C₂, and K₃_C₂, and each core in it is connected to the third interconnection structure NoC₃.
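The FIG. 3 grouping can be written down as a table from NoC to cores and then inverted to show which interconnects each core attaches to. Cores attached to more than one NoC are the ones that can relay between groups. This is a minimal sketch. It uses ASCII identifiers (for example `K2_C2` for K₂_C₂) and a hypothetical `attachments` helper.

```python
# FIG. 3 groups, one entry per interconnect (ASCII stand-ins for core names).
FIG3_GROUPS = {
    "NoC1": {"K1_C1", "K1_C2", "K2_C1", "K2_C2"},
    "NoC2": {"K1_C1", "K2_C2", "K3_C1", "K3_C2"},
    "NoC3": {"K1_C2", "K2_C2", "K3_C2"},
}


def attachments(groups):
    """Invert the NoC -> cores table into core -> set of NoCs."""
    table = {}
    for noc, members in groups.items():
        for core in members:
            table.setdefault(core, set()).add(noc)
    return table


for core, nocs in sorted(attachments(FIG3_GROUPS).items()):
    print(core, sorted(nocs))
```

Inverting the table shows, for instance, that K₂_C₂ attaches to all three NoCs, while K₂_C₁ and K₃_C₁ each attach to only one.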

In a preferred embodiment, each core in the chip includes at least one computational unit.

The computing unit may be an execution unit (EU) or an arithmetic unit (PU).

The chip comprises a plurality of interconnection structures, each connected to one processing core group, and the processing cores of a group exchange data through the interconnection structure connected to that group. Each interconnection structure can therefore use packets and a protocol designed specifically for its group of cores, based on the bandwidth those cores require and the data bit width they use.

In one embodiment, the plurality of interconnection structures in the chip use at least two different clock frequencies. The chip has a plurality of interconnection structures, each connected to one group of processing cores, and data transmissions in different groups proceed in parallel. The interconnection structures can therefore be given different clock frequencies.

Optionally, the plurality of interconnect structures in the chip may also be set to the same clock frequency.

In one embodiment, the plurality of interconnection structures use at least two different bit widths. The chip has a plurality of interconnection structures, each connected to one group of processing cores, and data transmissions in different groups proceed in parallel. The interconnection structures can therefore be given different bit widths.

Optionally, a plurality of interconnect structures in the chip may also be set to the same bit width value.

If data transmission between the processing cores of a certain processing core group requires high bandwidth, the bandwidth of its NoC can be determined from that group's bandwidth requirement (for example, by choosing a suitable clock frequency and bit width). The NoC bandwidth then satisfies the bandwidth requirement of every processing core in the group. For example, the NoC can be given a wider bit width. In addition, data transmission in each processing core group is independent. Other groups that do not need high bandwidth can therefore use NoC bit widths matched to their own requirements, which saves chip area and power.
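Per-group sizing can be sketched as picking, for each group, the narrowest NoC whose peak bandwidth covers the most demanding core in that group. The sketch rests on several assumptions: one `width`-bit transfer per cycle, a single illustrative clock of 1 MHz, a hypothetical menu of candidate widths, and MBps meaning 10^6 bytes per second. Apart from the 1 MBps and 10 MBps examples given earlier, none of these values comes from the patent.

```python
def min_bit_width(required_bytes_per_s, clock_hz, widths=(8, 16, 32, 64, 128)):
    """Smallest candidate width whose peak bandwidth (clock_hz * width / 8
    bytes/s, one transfer per cycle assumed) meets the requirement.
    Returns None if even the widest candidate is insufficient."""
    for w in sorted(widths):
        if clock_hz * w / 8 >= required_bytes_per_s:
            return w
    return None


def size_nocs(group_requirements, clock_hz):
    """Size each group's NoC independently from its most demanding core."""
    return {noc: min_bit_width(max(reqs), clock_hz)
            for noc, reqs in group_requirements.items()}


# Hypothetical per-core requirements in bytes/s for two groups.
requirements = {"NoC1": [1e6, 0.5e6], "NoC2": [10e6, 2e6]}
print(size_nocs(requirements, clock_hz=1e6))
```

Under these assumptions, the 1 MBps group gets an 8-bit NoC and the 10 MBps group gets a 128-bit NoC. Raising the second NoC's clock instead would allow a narrower width, which is the trade-off between clock frequency and bit width.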

In addition, in the chip of the embodiment of the invention, each processing core group uses one NoC for data transmission. Different NoCs can therefore use different clock frequencies and bit widths, each set according to the data transmission requirements of its group.

In summary, the chip provided by the embodiment of the invention has a plurality of interconnection structures. Cores in the same processing core group exchange data through the interconnection structure connected to that group, so data transmissions in different processing core groups can proceed in parallel. On the one hand, this greatly increases the total data bandwidth, improves inter-core transmission efficiency, reduces power consumption, and improves chip performance. On the other hand, because transmissions in different groups proceed in parallel, they do not interfere with one another, which avoids data congestion and improves chip performance.

An embodiment of the present invention further provides a card, which includes one or more of the chips provided in the above embodiments.

An embodiment of the present invention further provides an electronic device, including one or more of the cards provided in the above embodiments.

Fig. 4 is a flowchart illustrating an inter-core data transmission method according to an embodiment of the present invention.

As shown in fig. 4, the method includes steps S101 to S102:

Step S101: the processing cores belonging to the same processing core group exchange data through the interconnection structure connected to that group.

Step S102: two processing cores belonging to two different processing core groups exchange data through at least two interconnection structures.

In an embodiment, step S102, in which two processing cores belonging to two different processing core groups exchange data through at least two interconnection structures, includes:

the two processing cores exchange data through the two interconnection structures via one processing core that belongs to both processing core groups.

For example, in the embodiment shown in FIG. 3, K₂_C₁ in NoC₁ may need to send data to K₃_C₁ in NoC₂. The two are not in the same group, so K₂_C₁ can relay through a core that is in both NoC₁ and NoC₂, such as K₁_C₁ or K₂_C₂. That is, K₂_C₁ sends the data through NoC₁ to K₁_C₁, and K₁_C₁ sends it through NoC₂ to K₃_C₁, completing the transfer.
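Steps S101 and S102, in the single-bridge form just described, can be sketched as a routing function over the FIG. 3 groups. This is an illustrative model, not the patent's implementation. The function name, the (sender, NoC, receiver) hop format, and the rule of choosing the alphabetically first shared NoC or bridge core are all assumptions.

```python
FIG3_GROUPS = {
    "NoC1": {"K1_C1", "K1_C2", "K2_C1", "K2_C2"},
    "NoC2": {"K1_C1", "K2_C2", "K3_C1", "K3_C2"},
    "NoC3": {"K1_C2", "K2_C2", "K3_C2"},
}


def route(src, dst, groups):
    """S101: if src and dst share a NoC, send directly over it.
    S102: otherwise relay through one core attached both to a NoC of src
    and to a NoC of dst. Returns (sender, noc, receiver) hops, or None if
    a single relay core is not enough."""
    src_nocs = sorted(n for n, m in groups.items() if src in m)
    dst_nocs = sorted(n for n, m in groups.items() if dst in m)
    shared = [n for n in src_nocs if n in dst_nocs]
    if shared:                                    # step S101
        return [(src, shared[0], dst)]
    for na in src_nocs:                           # step S102
        for nb in dst_nocs:
            bridges = sorted(groups[na] & groups[nb])
            if bridges:
                b = bridges[0]
                return [(src, na, b), (b, nb, dst)]
    return None


print(route("K2_C1", "K3_C1", FIG3_GROUPS))
print(route("K1_C1", "K3_C2", FIG3_GROUPS))
```

For K₂_C₁ to K₃_C₁, the function returns the relay through K₁_C₁ given in the example. K₁_C₁ to K₃_C₂ is sent directly over NoC₂, because both cores attach to it.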

It should be noted that, the two processing cores respectively belonging to two different processing core groups may also implement data transmission through a plurality of interconnection structures.

For example, in the embodiment shown in FIG. 3, processing core K₂_C₁ can send data to processing core K₃_C₁ through NoC₁, NoC₂, and NoC₃. Specifically, K₂_C₁ sends the data through NoC₁ to processing core K₁_C₂. K₁_C₂ sends it through NoC₃ to processing core K₃_C₂, and K₃_C₂ sends it through NoC₂ to processing core K₃_C₁.
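A multi-NoC route like this one can be checked hop by hop. A breadth-first search over cores, with two cores counted as adjacent when they share a NoC, finds a route with the fewest hops. This is a hedged sketch over the FIG. 3 groups. `valid_path` and `fewest_hops` are hypothetical helpers, and the search is a standard BFS chosen for illustration, not a routing algorithm specified by the patent.

```python
from collections import deque

FIG3_GROUPS = {
    "NoC1": {"K1_C1", "K1_C2", "K2_C1", "K2_C2"},
    "NoC2": {"K1_C1", "K2_C2", "K3_C1", "K3_C2"},
    "NoC3": {"K1_C2", "K2_C2", "K3_C2"},
}


def valid_path(hops, groups):
    """Each hop (sender, noc, receiver) must use a NoC both cores attach to,
    and each hop must start where the previous one ended."""
    for i, (s, noc, r) in enumerate(hops):
        if s not in groups[noc] or r not in groups[noc]:
            return False
        if i > 0 and hops[i - 1][2] != s:
            return False
    return True


def fewest_hops(src, dst, groups):
    """BFS over cores; returns a minimal-hop list of (sender, noc, receiver)
    tuples, or None if dst is unreachable."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        core = queue.popleft()
        if core == dst:
            break
        for noc in sorted(groups):
            if core not in groups[noc]:
                continue
            for nxt in sorted(groups[noc]):
                if nxt not in prev:
                    prev[nxt] = (core, noc)
                    queue.append(nxt)
    if dst not in prev:
        return None
    hops, node = [], dst
    while prev[node] is not None:
        core, noc = prev[node]
        hops.append((core, noc, node))
        node = core
    return hops[::-1]


three_noc_route = [("K2_C1", "NoC1", "K1_C2"), ("K1_C2", "NoC3", "K3_C2"),
                   ("K3_C2", "NoC2", "K3_C1")]
print(valid_path(three_noc_route, FIG3_GROUPS))
print(fewest_hops("K2_C1", "K3_C1", FIG3_GROUPS))
```

The three-NoC route passes `valid_path`, while `fewest_hops` finds the two-hop relay via K₁_C₁. This suggests the longer route is a permitted alternative rather than a necessity.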

In the inter-core data transmission method provided by the embodiment of the invention, the processing cores belonging to the same processing core group exchange data through the interconnection structure connected to that group, so data transmissions in different groups can proceed in parallel. On the one hand, this greatly increases the total data bandwidth, improves inter-core transmission efficiency, reduces power consumption, and improves chip performance. On the other hand, because transmissions in different groups proceed in parallel, they do not interfere with one another, which avoids data congestion and improves chip performance.

According to an embodiment of the present invention, a computer storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, it implements the steps of the inter-core data transmission method provided by the above embodiments.

According to an embodiment of the present invention, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the steps of the inter-core data transmission method provided in the foregoing embodiment.

According to an embodiment of the present invention, a computer program product is provided, which includes computer instructions, and when the computer instructions are executed by a computing device, the computing device may execute the steps of the inter-core data transmission method provided in the above embodiment.

The flowchart and block diagrams in the figures of the present disclosure illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

It is to be understood that the above-described embodiments of the present invention are merely used to illustrate or explain the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, the appended claims are intended to cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.

The invention has been described above with reference to embodiments thereof. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the invention, and these alternatives and modifications are intended to be within the scope of the invention.

It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to exhaustively list all embodiments here, and obvious variations or modifications derived therefrom remain within the scope of protection of the invention.

Claims

1. A chip, comprising a plurality of interconnection structures and a plurality of processing cores, wherein each interconnection structure is connected with at least two processing cores, and all the processing cores connected with one interconnection structure form a processing core group;

each processing core belonging to the same processing core group performs data transmission through the interconnection structure connected with the processing core group;

and the plurality of interconnection structures are not directly connected with each other.

2. The chip of claim 1, wherein data transmission is enabled between processing cores belonging to different processing core groups.

3. The chip of claim 2,

the amount of data transferred between any two processing cores belonging to one of the processing core groups is greater than the amount of data transferred between processing cores respectively belonging to any two different processing core groups.

4. The chip according to any of claims 1 to 3,

the plurality of interconnection structures comprise a first interconnection structure and a second interconnection structure;

the first interconnection structure is connected with a first processing core;

the first processing core is connected with the second interconnection structure.

5. The chip of claim 4,

the first interconnection structure is also connected with a second processing core; the second interconnection structure is connected with a third processing core;

and the second processing core implements data transmission with the third processing core through the first processing core.

6. The chip of claim 4 or 5,

the plurality of interconnection structures further comprise a third interconnection structure;

none of the processing cores connected with the third interconnection structure is connected with any other interconnection structure.

7. The chip of any of claims 1 to 6, wherein the bandwidth of each interconnection structure meets the bandwidth requirements of the processing cores in the processing core group to which that interconnection structure is connected.

8. The chip of any of claims 1 to 7, wherein the plurality of interconnection structures use at least two different clock frequencies; and/or

the plurality of interconnection structures have at least two different bit widths.

9. An inter-core data transmission method used in the chip according to any one of claims 1 to 8, comprising:

each processing core belonging to the same processing core group realizes data transmission through an interconnection structure connected with the processing core group;

and two processing cores belonging respectively to two different processing core groups implement data transmission through at least two of the interconnection structures.

10. The method according to claim 9, wherein the step in which the two processing cores belonging respectively to two different processing core groups implement data transmission through at least two of the interconnection structures comprises: