CN112905523B - Chip and inter-core data transmission method - Google Patents

Info

Publication number
CN112905523B
CN112905523B CN201911230193.9A CN201911230193A CN112905523B CN 112905523 B CN112905523 B CN 112905523B CN 201911230193 A CN201911230193 A CN 201911230193A CN 112905523 B CN112905523 B CN 112905523B
Authority
CN
China
Prior art keywords
processing core
processing
chip
cores
core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911230193.9A
Other languages
Chinese (zh)
Other versions
CN112905523A (en)
Inventor
请求不公布姓名
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Simm Computing Technology Co ltd
Original Assignee
Beijing Simm Computing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Simm Computing Technology Co ltd
Priority to CN201911230193.9A (CN112905523B)
Priority to PCT/CN2020/118709 (WO2021109698A1)
Publication of CN112905523A
Application granted
Publication of CN112905523B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7825 Globally asynchronous, locally synchronous, e.g. network on chip
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4004 Coupling between buses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4004 Coupling between buses
    • G06F13/4022 Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a chip and an inter-core data transmission method. The chip comprises a plurality of interconnection structures, each connected to one processing core group that comprises at least two processing cores. Each processing core in a processing core group transmits data through the interconnection structure connected to that group, and the interconnection structures are not directly connected to one another. In the chip provided by the embodiment of the application, the cores of the same processing core group exchange data through the interconnection structure of that group, and the data transmission of different processing core groups can proceed in parallel. On the one hand, this greatly increases the total data bandwidth, improves the efficiency of data transmission between cores, reduces power consumption, and improves the performance of the chip. On the other hand, because the data transmission of different groups proceeds in parallel, the groups do not interfere with each other, which avoids data congestion and further improves the performance of the chip.

Description

Chip and inter-core data transmission method
Technical Field
The application relates to the technical field of chips, in particular to a chip and an inter-core data transmission method.
Background
With the development of science and technology, human society is rapidly entering the intelligent era. The defining characteristics of this era are that people obtain more and more kinds of data, the amount of data obtained keeps growing, and the required speed of data processing keeps rising.
The chip is the cornerstone of data processing and fundamentally determines people's ability to process data. In terms of application field, chips follow two main routes. One is the general-purpose route, such as the central processing unit (CPU), which offers great flexibility but delivers relatively low effective computing power on domain-specific algorithms. The other is the special-purpose route, such as the tensor processing unit (TPU), which delivers higher effective computing power in certain specific fields but handles flexible, more general fields poorly or not at all.
Because the data of the intelligent era is of many kinds and enormous in volume, a chip is required to be both extremely flexible, so that it can process the rapidly evolving algorithms of different fields, and extremely powerful, so that it can quickly process very large and rapidly growing volumes of data.
Disclosure of Invention
(I) Object of the application
The application aims to provide a chip and an inter-core data transmission method. The chip provided by the embodiment of the application comprises a plurality of interconnection structures, and the data transmission between the processing cores in the same processing core group is realized through the interconnection structures connected with the processing cores, so that the data transmission between the cores in different groups can be processed in parallel, the total data bandwidth is greatly improved, and the performance of the chip is improved.
(II) technical scheme
To solve the above problems, a first aspect of the present application provides a chip comprising a plurality of interconnection structures and a plurality of processing core groups, wherein each interconnection structure is connected with one processing core group, and all processing cores connected with one interconnection structure form one processing core group; the processing cores belonging to the same processing core group transmit data through the interconnection structure connected with that group; and the interconnection structures are not directly connected to one another.
The chip provided by the embodiment of the application comprises a plurality of interconnection structures, and the data transmission between the processing cores in the same processing core group is realized through the interconnection structures connected with the processing cores, so that the data transmission between the cores in different processing groups can be processed in parallel, the total data bandwidth is greatly improved, and the performance of the chip is improved.
Further, data transmission is possible between processing cores belonging to different processing core groups.
Further, the amount of data transferred between any two processing cores belonging to one processing core group is larger than the amount of data transferred between processing cores respectively belonging to any two different processing core groups.
Further, the plurality of interconnection structures include a first interconnection structure and a second interconnection structure; the first interconnection structure is connected with a first processing core; the first processing core is connected with the second interconnection structure.
Further, the first interconnection structure is also connected with a second processing core; the second interconnection structure is connected with a third processing core; the second processing core exchanges data with the third processing core through the first processing core.
Further, the interconnection structure comprises a third interconnection structure; all the processing cores connected by the third interconnection structure are not connected with other interconnection structures.
Further, the processing core to which each of the interconnect structures is connected is not connected to the other interconnect structures.
Further, each processing core includes at least one transmission unit, and the bandwidth of the interconnection structure meets the bandwidth requirement of the transmission unit.
Further, the bandwidth of the interconnect structure meets the bandwidth requirements of each of the processing cores within the set of processing cores connected to the interconnect structure.
Further, the bandwidth of the interconnect fabric is greater than the bandwidth requirements of each of the processing cores within the set of processing cores coupled to the interconnect fabric.
Further, a plurality of the interconnection structures are provided with at least two clock frequencies.
Further, the plurality of interconnection structures are provided with at least two bit width values.
Further, there are at least two processing core groups that differ in bandwidth requirements.
According to a second aspect of the present application there is provided a card comprising one or more of the chips provided in the first aspect.
According to a third aspect of the present application there is also provided an electronic device comprising one or more cards as provided in the second aspect.
According to a fourth aspect of the present application, there is provided an inter-core data transmission method for use in the chip provided in the first aspect, the method comprising: processing cores belonging to the same processing core group transmit data through the interconnection structure connected with that group; and two processing cores respectively belonging to two different processing core groups transmit data through at least two interconnection structures.
Further, the two processing cores respectively belonging to two different processing core groups transmitting data through at least two interconnection structures comprises: the two processing cores transmit data through two interconnection structures by way of one processing core that belongs to both of the two different processing core groups.
According to a fifth aspect of the present application, there is provided a computer storage medium having stored thereon a computer program which when executed by a processor performs the steps of the inter-core data transfer method of the fourth aspect.
According to a sixth aspect of the present application there is provided an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the inter-core data transfer method of the fourth aspect when the program is executed.
According to a seventh aspect of the present application there is provided a computer program product comprising computer instructions which, when executed by a computing device, may perform the steps of the inter-core data transfer method of the fourth aspect.
(III) beneficial effects
The technical scheme of the application has the following beneficial technical effects:
(1) The chip provided by the embodiment of the application is provided with a plurality of interconnection structures, and the processing cores of the same processing core group realize data transmission through the interconnection structures connected with the processing cores, so that the data transmission of different processing core groups can be processed in parallel, on one hand, the total data bandwidth is greatly improved, the data transmission efficiency between the cores is improved, the power consumption is reduced, and the performance of the chip is improved; on the other hand, as the data transmission of different groups can be processed in parallel, the data transmission of different groups can not interfere with each other, thereby avoiding the occurrence of data congestion phenomenon and improving the performance of the chip.
(2) In the chip provided by the embodiment of the application, cores belonging to the same group need to consider only the data formats of the cores in that group when transmitting data, and need not consider the data formats of cores in other groups of the chip, which reduces the complexity of the circuits in the chip.
(3) Because each processing core group transmits data through its own interconnection structure, different interconnection structures can be given different clock frequencies and different bit widths according to the data transmission requirements of each group. Compared with all cores in a chip sharing one interconnection structure, this reduces power consumption, improves the performance of the whole chip, and makes chip design more flexible. Choosing the bit width and frequency of each interconnection structure according to the actual bandwidth requirement of its processing core group also saves chip area.
Drawings
FIG. 1 is a schematic diagram of a chip;
FIG. 2 is a block diagram of a chip according to an embodiment of the application;
FIG. 3 is a block diagram of a chip according to an embodiment of the application;
FIG. 4 is a flowchart of an inter-core data transmission method according to an embodiment of the present application.
Detailed Description
The objects, technical solutions and advantages of the present application will become more apparent by the following detailed description of the present application with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the application. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present application.
In a multi-core (many-core) chip, the cores may all be homogeneous, that is, every core has the same structure, or they may be heterogeneous, that is, at least two of the cores differ in structure. In a multi-core (many-core) architecture, several cores often take part in executing one task at the same time, and data must then be transmitted between them. It is therefore important to decide which architecture a multi-core chip adopts and how it transmits data, so that homogeneous or heterogeneous cores are organized into a chip with excellent performance.
Fig. 1 is a schematic diagram of a chip structure.
In the chip structure shown in FIG. 1, the chip has only one network on chip (NoC), and all cores communicate with each other and exchange data through this NoC.
As shown in FIG. 1, the chip has cores of several structures, including K1_C1, K2_C1, and KM_C1, where M is the number of core types. Each type has a plurality of cores; for example, the first type K1 comprises K1_C1, K1_C2, ..., K1_CN1, where N_M denotes the number of cores of the M-th type (so the first type has N_1 cores). That is, KM_CNM denotes the N_M-th core of the M-th structure.
Because the chip contains many types of cores (M types) and cores of different types exchange data over the same NoC, each core needs to know the data format (bit width) of every other core with which it exchanges data. Sharing one NoC also places strict requirements on the format of each data packet sent or received: either the sending core encodes the data into the format of the receiving core, or the receiving core decodes the data according to the type of the sending core and re-encodes it into its own format. As a result, the data packet routing algorithm is complex and every core carries a heavy burden.
In addition, because all cores share one NoC and the amounts of data exchanged between cores differ, while one pair of cores is transmitting data the other cores cannot use the NoC and must wait for that pair to finish. This delays data transmission and degrades the performance of the chip. For example, suppose the data transmission between the first core and the second core requires a very large bandwidth. During this transmission, the third core produces data that must be sent to the fourth core; the amount is small, but the real-time requirement is very high. Because all cores share one NoC, the third core must wait until the transmission between the first and second cores completes before it can send, so the time-critical data is not delivered in time and the performance of the chip suffers.
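The waiting effect described above can be illustrated with a small back-of-the-envelope model. This is only a sketch: it assumes that transfers on one NoC are strictly serialized (as the text describes for the shared NoC), that separate NoCs run fully in parallel, and that the third and fourth cores would sit in a different group from the first and second. The bandwidth and transfer sizes are made-up numbers, not values from the application.

```python
# Hypothetical numbers, chosen only to illustrate the waiting effect.
NOC_BANDWIDTH = 100.0  # bytes per time unit, assumed identical for every NoC


def completion_times(transfers_per_noc):
    """Return {transfer name: finish time}.

    Assumption: each NoC serves its transfers one after another in list
    order, and different NoCs work in parallel, each starting at time 0.
    """
    finish = {}
    for transfers in transfers_per_noc:
        clock = 0.0
        for name, size in transfers:
            clock += size / NOC_BANDWIDTH
            finish[name] = clock
    return finish


bulk = ("core1->core2", 10_000)  # large transfer
urgent = ("core3->core4", 100)   # small, latency-sensitive transfer

shared = completion_times([[bulk, urgent]])   # one NoC shared by all cores
split = completion_times([[bulk], [urgent]])  # one NoC per group

print(shared["core3->core4"], split["core3->core4"])
```

Under these assumptions the small transfer finishes only after the bulk transfer when the NoC is shared, and immediately when each pair has its own NoC.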
In order to solve the problems, the technical scheme of the application is provided.
The chip provided by one embodiment of the present application will be described in detail below. In the description of the present application, it should be noted that the terms "first," "second," "third," "fourth," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, the technical features of the different embodiments of the present application described below may be combined with each other as long as they do not collide with each other.
Fig. 2 is a block diagram of a chip according to an embodiment of the present application.
As shown in FIG. 2, the chip includes a plurality of interconnection structures and a plurality of processing core groups, wherein each interconnection structure is connected with one processing core group, and all processing cores connected with one interconnection structure form one processing core group; each processing core in a processing core group transmits data through the interconnection structure connected with that group; and the interconnection structures are not directly connected to one another.
In the embodiment shown in FIG. 2, the chip includes o interconnection structures, each of which may be an inter-core interconnection fabric, a network on chip (NoC), a bus, or a switch. The application takes the network on chip (NoC) as an example of the interconnection structure, but is not limited to it. The o interconnection structures are NoC1, NoC2, ..., NoCo, and each NoC is connected to one processing core group. Each processing core in a processing core group transmits data through the NoC connected to that group.
The chip comprises M types of cores, and the number of cores of the M-th type is N_M. For example, K2_CN2 denotes the N_2-th core of the second type, and KM_CNM denotes the N_M-th core of the M-th type.
In the embodiment shown in FIG. 2, the first processing core group includes cores K1_C1, K1_C2, ..., K1_CN1, K2_C1, and K2_C2. The first network on chip NoC1 is connected to the first processing core group, that is, each processing core of the first processing core group is connected to NoC1. The cores of the first processing core group transmit data to one another through NoC1. For example, when core K1_C1 has data to transfer to core K1_C2, it sends the data to NoC1, and NoC1 delivers the data to core K1_C2.
The second processing core group includes cores K1_C1, K1_CN1, K2_C2, ..., K2_CN2, KM_C1, and KM_C2. Each processing core of the second processing core group is connected to the second network on chip NoC2, and the cores of the group transmit data to one another through NoC2.
The o-th processing core group includes cores K1_CN1, K2_C2, KM_C2, ..., KM_CNM. Each core of the o-th processing core group is connected to the o-th network on chip NoCo, and the cores of the group transmit data to one another through NoCo.
The chip provided by the embodiment of the application has a plurality of interconnection structures, each connected with one processing core group. The cores of the same processing core group transmit data through the interconnection structure connected with that group, and the data transmission of different groups can proceed in parallel. On the one hand, cores belonging to the same group need to consider only the data formats of the cores in that group when transmitting data, and need not consider the data formats of cores in other groups of the chip, which reduces the complexity of the circuits in the chip, improves the efficiency of data transmission between cores, reduces power consumption, and improves the performance of the chip. On the other hand, because the data transmission of different groups proceeds in parallel, the groups do not interfere with each other, which avoids data congestion and further improves the performance of the chip.
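The structure described above can be modeled as a mapping from each interconnection structure to the set of cores attached to it. The following is a minimal sketch under stated assumptions: the core names `A` to `E` and the helper names `noc_for` and `can_run_in_parallel` are hypothetical, and the model ignores whether a single core could take part in two transfers at once, which the text does not specify.

```python
# Hypothetical chip: each NoC maps to the processing core group attached to it.
# There are no NoC-to-NoC links, so the NoCs are not directly connected.
chip = {
    "NoC1": {"A", "B", "C"},
    "NoC2": {"C", "D", "E"},
}


def noc_for(chip, src, dst):
    """Return the name of a NoC connected to both cores, or None."""
    for noc in sorted(chip):
        if src in chip[noc] and dst in chip[noc]:
            return noc
    return None


def can_run_in_parallel(chip, transfer_a, transfer_b):
    """Two intra-group transfers can overlap if they use different NoCs."""
    noc_a = noc_for(chip, *transfer_a)
    noc_b = noc_for(chip, *transfer_b)
    return noc_a is not None and noc_b is not None and noc_a != noc_b


print(can_run_in_parallel(chip, ("A", "B"), ("D", "E")))  # different NoCs
print(can_run_in_parallel(chip, ("A", "B"), ("B", "C")))  # same NoC
```

In this model, a pair such as `A` and `D` has no common NoC, so `noc_for` returns `None` and the transfer would need a relay through a core that sits in both groups.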
In one embodiment, data transfer is enabled between the processing cores belonging to different ones of the processing core groups.
In a preferred embodiment, the plurality of interconnect structures includes a first interconnect structure and a second interconnect structure; the first interconnection structure is connected with a first processing core; the first processing core is connected with the second interconnection structure.
Specifically, in this embodiment, the same processing core may be placed in two different processing core groups at the same time. For example, in the embodiment shown in FIG. 2, processing core K1_C1 is placed in both the first and the second processing core groups, that is, K1_C1 is connected to both the first network on chip NoC1 and the second network on chip NoC2.
In one embodiment, the amount of data transferred between any two processing cores belonging to the same processing core group is greater than the amount of data transferred between processing cores respectively belonging to any two different groups.
Specifically, in the present embodiment, if there is a relatively large amount of data transfer between some two cores, they are allocated in the same group.
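The grouping rule above can be checked mechanically for a candidate grouping. The sketch below is an illustration only: the function name `satisfies_traffic_rule`, the core names, and the traffic volumes are hypothetical, and only the core pairs listed in the traffic table are checked.

```python
def satisfies_traffic_rule(groups, traffic):
    """Check that every listed same-group pair moves more data than every
    listed cross-group pair.

    groups:  list of sets of core names (a core may appear in several sets).
    traffic: dict mapping frozenset({core_a, core_b}) to a data volume.
    """
    intra, inter = [], []
    for pair, volume in traffic.items():
        a, b = tuple(pair)
        same_group = any(a in g and b in g for g in groups)
        (intra if same_group else inter).append(volume)
    if not intra or not inter:
        return True  # nothing to compare
    return min(intra) > max(inter)


# Hypothetical traffic volumes between four cores.
traffic = {
    frozenset({"A", "B"}): 500,
    frozenset({"C", "D"}): 400,
    frozenset({"A", "C"}): 10,
}

good = satisfies_traffic_rule([{"A", "B"}, {"C", "D"}], traffic)
bad = satisfies_traffic_rule([{"A", "C"}, {"B", "D"}], traffic)
print(good, bad)
```

Grouping the heavy-traffic pairs together satisfies the rule, while splitting them across groups does not.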
In one embodiment, the first interconnection structure is further connected with a second processing core; the second interconnection structure is connected with a third processing core; the second processing core exchanges data with the third processing core through the first processing core.
Specifically, in this embodiment, when data must be transmitted between two cores that belong to two different groups, the amount of data is generally relatively small. If two cores exchange only a little data, the transmission can be relayed through a bridging core, that is, a core located in both groups. For example, suppose K2_C1, connected to NoC1, needs to send data to KM_C1, connected to NoC2. Because they are not in the same processing core group, K2_C1 can transmit through a processing core that is connected to both NoC1 and NoC2, such as K1_C1 or K2_C2. That is, K2_C1 sends the data to K1_C1 through NoC1, and K1_C1 forwards the data to KM_C1 through NoC2, which completes the transmission.
For example, in the embodiment shown in FIG. 2, when processing core K1_C2 of the first processing core group needs to send data to processing core KM_C2 of the second processing core group, the transmission can be relayed by processing core K1_C1, K1_CN1, or K2_C2.
Specifically, processing core K1_C2 sends the data to processing core K1_C1 through the first network on chip NoC1, and processing core K1_C1 forwards the data to processing core KM_C2 through the second network on chip NoC2.
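The relay through a bridging core can be sketched as a small route lookup. This is a minimal illustration, not the routing method of the application: the function name `route` is hypothetical, plain-text names such as `K1_C2` stand for the subscripted core labels of FIG. 2, only the explicitly named members of the first two groups are listed, and the sketch handles at most one bridging core, picking the alphabetically first candidate.

```python
# First two processing core groups of FIG. 2, limited to the named members.
groups = {
    "NoC1": {"K1_C1", "K1_C2", "K1_CN1", "K2_C1", "K2_C2"},
    "NoC2": {"K1_C1", "K1_CN1", "K2_C2", "K2_CN2", "KM_C1", "KM_C2"},
}


def route(src, dst, groups):
    """Return a list of hops (sender, noc, receiver), or None if no route
    with at most one bridging core exists."""
    # Same group: a single hop over the shared NoC.
    for noc in sorted(groups):
        if src in groups[noc] and dst in groups[noc]:
            return [(src, noc, dst)]
    # Different groups: relay through a core attached to both NoCs.
    for noc_a in sorted(groups):
        if src not in groups[noc_a]:
            continue
        for noc_b in sorted(groups):
            if noc_b == noc_a or dst not in groups[noc_b]:
                continue
            bridges = sorted((groups[noc_a] & groups[noc_b]) - {src, dst})
            if bridges:
                return [(src, noc_a, bridges[0]), (bridges[0], noc_b, dst)]
    return None


for hop in route("K1_C2", "KM_C2", groups):
    print(hop)
```

With these group definitions, the route from `K1_C2` to `KM_C2` goes over NoC1 to `K1_C1` and then over NoC2, matching the example in the text; `K1_CN1` and `K2_C2` would be equally valid bridges.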
In one embodiment, the plurality of interconnect structures includes a third interconnect structure; all of the processing cores connected by the third interconnect structure are not connected to the other interconnect structures.
For example, in the embodiment shown in FIG. 2, the chip further includes a third network on chip to which a third processing core group is connected. The third processing core group includes cores K1_C3 and K3_C1, and these two cores are not connected to any other interconnection structure.
In one embodiment of the application, the processing cores connected by each interconnect structure in the chip are not connected to other of the interconnect structures. In this embodiment, the processing cores connected by each interconnect structure are not connected to other interconnect structures, i.e. there is no data transfer between two processing cores respectively belonging to two processing core groups.
In one embodiment, each of the processing cores includes at least one transmission unit, and the bandwidth of the interconnect structure meets the bandwidth requirements of the transmission units. For example, the bandwidth of the interconnect structure is equal to the bandwidth requirements of the transmission units so that the processing cores can send data into the interconnect structure.
The bandwidth requirement of the transmission unit may be, among other things, the bandwidth required by the transmission unit to transmit a certain amount of data over a certain time.
In one embodiment, the bandwidth of the interconnect structure meets the bandwidth requirements of each of the processing cores within the set of processing cores connected to the interconnect structure.
The bandwidth requirement of the processing core may be, among other things, the bandwidth required by the processing core to transmit a certain amount of data over a certain period of time.
Preferably, the bandwidth of the interconnect structure is greater than the bandwidth requirements of each of said processing cores within said group of processing cores connected to the interconnect structure. Therefore, each interconnection structure can be optimally matched with the structures of all the processing cores in the processing core group connected with the interconnection structure, so that the area can be saved to the greatest extent on one hand, and the chip has better performance on the other hand.
In one embodiment, at least two processing core groups in the chip have different bandwidth requirements. For example, the first group requires a bandwidth of 1 MB/s and the second group requires 10 MB/s. A NoC with a matching structure can then be provided for each processing core group according to its requirement, which saves NoC area to the greatest extent.
Fig. 3 is a schematic diagram of a chip structure according to an embodiment of the application.
As shown in FIG. 3, the chip includes 6 cores of 3 different structures, namely the first type K1, the second type K2, and the third type K3, with 2 cores of each type.
The two cores of the first type K1 are K1_C1 and K1_C2; the two cores of the second type K2 are K2_C1 and K2_C2; the two cores of the third type K3 are K3_C1 and K3_C2.
The 6 cores in the chip are divided into three groups.
The first processing core group consists of cores K1_C1, K1_C2, K2_C1, and K2_C2; each core in the first processing core group is connected to the first interconnection structure NoC1.
The second processing core group consists of K1_C1, K2_C2, K3_C1, and K3_C2; each core in the second processing core group is connected to the second interconnection structure NoC2.
The third processing core group consists of K1_C2, K2_C2, and K3_C2; each core in the third processing core group is connected to the third interconnection structure NoC3.
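The three groups of FIG. 3 can be written down as data to see which cores sit in more than one group and could therefore act as relays between groups. This is an illustrative sketch; the names `fig3_groups`, `membership`, and `bridging_cores` are hypothetical, and plain-text names such as `K1_C1` stand for the subscripted core labels.

```python
from collections import defaultdict

# Groups of FIG. 3, keyed by the interconnection structure of each group.
fig3_groups = {
    "NoC1": ["K1_C1", "K1_C2", "K2_C1", "K2_C2"],
    "NoC2": ["K1_C1", "K2_C2", "K3_C1", "K3_C2"],
    "NoC3": ["K1_C2", "K2_C2", "K3_C2"],
}


def membership(groups):
    """Invert the mapping: core name -> list of NoCs it is connected to."""
    result = defaultdict(list)
    for noc in sorted(groups):
        for core in groups[noc]:
            result[core].append(noc)
    return dict(result)


nocs_of = membership(fig3_groups)
# Cores attached to more than one NoC can relay data between groups.
bridging_cores = sorted(core for core, nocs in nocs_of.items() if len(nocs) > 1)

for core in sorted(nocs_of):
    print(core, nocs_of[core])
```

In this arrangement `K2_C2` is attached to all three NoCs, while `K2_C1` and `K3_C1` each belong to a single group.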
In a preferred embodiment, each core in the chip comprises at least one computational unit.
The computing Unit may be an Execution Unit (EU), or a Processing Unit (PU).
The chip comprises a plurality of interconnection structures, and each interconnection structure is connected with one processing core group. The processing cores of the same group realize data transmission through the interconnection structure connected with that group. For each interconnection structure, a dedicated data packet format and a corresponding protocol can be designed for its group of cores according to the bandwidth required by those cores and the data bit width they use.
In one embodiment, at least two clock frequencies are provided in a plurality of the interconnect structures in the chip. Because the chip is provided with a plurality of interconnection structures, each interconnection structure is connected with one group of processing cores, and data transmission among the processing cores of each group can be processed in parallel, the interconnection structures in the chip can be provided with a plurality of different clock frequencies.
Alternatively, multiple interconnect structures in a chip may be set to the same clock frequency.
In one embodiment, at least two bit width values are provided in a plurality of the interconnect structures. Because the chip is provided with a plurality of interconnection structures, each interconnection structure is connected with one group of processing cores, and data transmission among the processing cores of each group can be processed in parallel, a plurality of bit width values can be arranged in the interconnection structures.
Alternatively, multiple interconnect structures in a chip may be set to the same bit width value.
It should be noted that, if the data transmission between the processing cores of a certain processing core group needs a higher bandwidth, the bandwidth of the NoC may be determined according to the bandwidth requirement of that processing core group (for example, by determining a suitable clock frequency and bit width), so that the bandwidth of the NoC meets the bandwidth requirement of each processing core in the group. For example, a larger bit width may be set for the NoC. In addition, because the data transmission within each processing core group is independent of that of the other groups, if other processing core groups do not need a higher bandwidth, the bit width of the corresponding NoC can be set according to the requirements of those groups, so that chip area and power consumption are saved.
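As a non-limiting illustration of choosing a bit width per NoC, the sketch below assumes an idealized link that transfers one word of `bit_width` bits per clock cycle, so that peak bandwidth is bit width times clock frequency divided by 8 (in bytes per second). This model, the 1 MHz clock, the candidate widths, and the function names are assumptions made for the example and are not taken from the application; 1 MBps is taken as 10^6 bytes per second.

```python
def peak_bandwidth_bytes_per_s(bit_width, clock_hz):
    """Idealized peak bandwidth: one bit_width-wide word per clock cycle."""
    return bit_width * clock_hz / 8


def smallest_sufficient_width(required_bytes_per_s, clock_hz,
                              widths=(8, 16, 32, 64, 128)):
    """Return the narrowest candidate bit width that meets the requirement, or None."""
    for width in sorted(widths):
        if peak_bandwidth_bytes_per_s(width, clock_hz) >= required_bytes_per_s:
            return width
    return None


if __name__ == "__main__":
    clock_hz = 1_000_000  # hypothetical 1 MHz NoC clock
    # Two groups with different requirements, as in the 1 MBps / 10 MBps example.
    for name, need in (("group 1", 1_000_000), ("group 2", 10_000_000)):
        print(name, smallest_sufficient_width(need, clock_hz))
```

Under these assumptions the group with the lower requirement can use a much narrower NoC than the group with the higher requirement, which is the source of the area saving described above.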
In addition, the chip of the embodiment of the application adopts one NoC to realize data transmission among one processing core group, different NoCs can set different clock frequencies and different bit widths, and corresponding bit widths can be set according to the data transmission requirement of each group.
The technical scheme of the application has the following beneficial technical effects:
(1) The chip provided by the embodiment of the application is provided with a plurality of interconnection structures, and the data transmission between the same processing core group is realized through the interconnection structures connected with the processing core group, so that the data transmission between the cores of different processing core groups can be processed in parallel, on one hand, the total data bandwidth is greatly improved, the data transmission efficiency between the cores is improved, the power consumption is reduced, and the performance of the chip is improved; on the other hand, as the data transmission of different groups can be processed in parallel, the data transmission of different groups can not interfere with each other, thereby avoiding the occurrence of data congestion phenomenon and improving the performance of the chip.
(2) In the chip provided by the embodiment of the application, when cores belonging to the same group transmit data, only the data format of the cores in that group needs to be considered, and the data formats of cores in other groups of the chip need not be considered, so that the complexity of circuits in the chip is reduced.
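As a non-limiting illustration of effect (1), the toy model below compares the time needed to finish a batch of transfers when each group has its own NoC with the time needed when all transfers share a single NoC. The model is an assumption made for this example only: transfers on the same NoC are serialized, different NoCs run fully in parallel, and all names and numbers are hypothetical.

```python
def completion_time(transfers, bandwidth):
    """Time to finish all transfers.

    transfers: list of (noc_name, n_bytes)
    bandwidth: dict mapping noc_name -> bytes per second
    Transfers on one NoC are serialized; different NoCs proceed in parallel.
    """
    busy = {}
    for noc, n_bytes in transfers:
        busy[noc] = busy.get(noc, 0.0) + n_bytes / bandwidth[noc]
    return max(busy.values(), default=0.0)


if __name__ == "__main__":
    # One NoC per group, each sized to its own group's requirement.
    separate = completion_time(
        [("NoC1", 1_000_000), ("NoC2", 10_000_000)],
        {"NoC1": 1_000_000, "NoC2": 10_000_000},
    )
    # The same traffic forced through one shared NoC.
    shared = completion_time(
        [("NoC", 1_000_000), ("NoC", 10_000_000)],
        {"NoC": 10_000_000},
    )
    print(separate, shared)
```

In this model the separately connected groups finish no later than the shared NoC, even though one of the separate NoCs is much narrower; real behavior depends on arbitration and traffic patterns that the model ignores.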
An embodiment of the present application further provides a board card, which includes one or more of the chips provided in the above embodiment.
An embodiment of the present application further provides an electronic device, which includes one or more of the board cards provided in the above embodiment.
Fig. 4 is a flowchart of an inter-core data transmission method according to an embodiment of the present application.
As shown in fig. 4, the method includes steps S101 to S102:
Step S101, each processing core belonging to the same processing core group realizes data transmission through an interconnection structure connected with the processing core group.
Step S102, two processing cores respectively belonging to two different processing core groups realize data transmission through at least two interconnection structures.
In one embodiment, the step of implementing data transmission by at least two of the interconnection structures includes:
the two processing cores respectively belonging to two different processing core groups realize data transmission through two interconnection structures by virtue of one processing core simultaneously belonging to the two different processing core groups.
For example, in the embodiment shown in FIG. 3, K2_C1 in NoC1 is to send data to K3_C1 in NoC2. Since the two cores are not in the same group, K2_C1 transmits the data through a core that is connected to both NoC1 and NoC2, for example K1_C1 or K2_C2. That is, K2_C1 sends the data through NoC1 to K1_C1, and K1_C1 sends the data through NoC2 to K3_C1, thereby realizing the data transmission.
It should be noted that two processing cores respectively belonging to two different processing core groups can also realize data transmission through a plurality of interconnection structures.
For example, in the embodiment shown in FIG. 3, processing core K2_C1 can send data to processing core K3_C1 through NoC1, NoC2, and NoC3. Specifically, processing core K2_C1 sends the data through NoC1 to processing core K1_C2, processing core K1_C2 sends the data through NoC3 to processing core K3_C2, and processing core K3_C2 sends the data through NoC2 to processing core K3_C1.
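As a non-limiting illustration of steps S101 and S102, the sketch below searches for a relay route between two cores of the FIG. 3 grouping and checks whether a given route is valid. The application does not specify a routing algorithm; the breadth-first search used here, which prefers the fewest NoC traversals, and the names `route` and `is_valid_route` are assumptions made for the example.

```python
from collections import deque

# Group membership as listed for FIG. 3 (hypothetical plain-text core names).
GROUPS = {
    "NoC1": {"K1_C1", "K1_C2", "K2_C1", "K2_C2"},
    "NoC2": {"K1_C1", "K2_C2", "K3_C1", "K3_C2"},
    "NoC3": {"K1_C2", "K2_C2", "K3_C2"},
}


def route(src, dst, groups=GROUPS):
    """Breadth-first search for a route with the fewest NoC traversals.

    Returns a list of (noc, next_core) hops, [] if src == dst, or None if unreachable.
    """
    if src == dst:
        return []
    prev = {src: None}  # core -> (previous core, NoC used to reach it)
    queue = deque([src])
    while queue:
        core = queue.popleft()
        for noc in sorted(groups):
            if core not in groups[noc]:
                continue
            for nxt in sorted(groups[noc]):
                if nxt in prev:
                    continue
                prev[nxt] = (core, noc)
                if nxt == dst:
                    hops = []
                    while prev[nxt] is not None:
                        before, used = prev[nxt]
                        hops.append((used, nxt))
                        nxt = before
                    return hops[::-1]
                queue.append(nxt)
    return None


def is_valid_route(src, hops, groups=GROUPS):
    """Check that every hop joins two cores connected to the named NoC."""
    core = src
    for noc, nxt in hops:
        if core not in groups[noc] or nxt not in groups[noc]:
            return False
        core = nxt
    return True


if __name__ == "__main__":
    print(route("K2_C1", "K3_C1"))
    three_noc = [("NoC1", "K1_C2"), ("NoC3", "K3_C2"), ("NoC2", "K3_C1")]
    print(is_valid_route("K2_C1", three_noc))
```

With the sorted iteration order used here, the search returns the two-NoC route through K1_C1, and the checker accepts the three-NoC route through K1_C2 and K3_C2 as an equally valid, longer alternative.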
According to the inter-core data transmission method provided by the embodiment of the application, each processing core belonging to the same processing core group realizes data transmission through the interconnection structure connected with the processing core group, so that the data transmission among different groups of cores can be processed in parallel, on one hand, the total data bandwidth is greatly improved, the data transmission efficiency among the cores is improved, the power consumption is reduced, and the performance of the chip is improved; on the other hand, as the data transmission of different groups can be processed in parallel, the data transmission of different groups can not interfere with each other, thereby avoiding the occurrence of data congestion phenomenon and improving the performance of the chip.
According to an embodiment of the present application, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the inter-core data transmission method provided in the above embodiment.
According to an embodiment of the present application, there is provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the inter-core data transmission method provided in the above embodiment when executing the program.
According to an embodiment of the present application, there is provided a computer program product including computer instructions which, when executed by a computing device, can perform the steps of the inter-core data transmission method provided in the above embodiment.
The flowcharts and block diagrams in the figures of this disclosure illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software or by means of hardware. In some cases, the names of the units do not constitute a limitation of the units themselves.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It is to be understood that the above-described embodiments of the present application are merely illustrative of or explanation of the principles of the present application and are in no way limiting of the application. Accordingly, any modification, equivalent replacement, improvement, etc. made without departing from the spirit and scope of the present application should be included in the scope of the present application. Furthermore, the appended claims are intended to cover all such changes and modifications that fall within the scope and boundary of the appended claims, or equivalents of such scope and boundary.
The application has been described above with reference to the embodiments thereof. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present application. The scope of the application is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the application, and such alternatives and modifications are intended to fall within the scope of the application.
It is apparent that the above examples are given by way of illustration only and do not limit the embodiments. Other variations or modifications based on the above teachings will be apparent to those of ordinary skill in the art. It is neither necessary nor possible to enumerate all embodiments here. Obvious variations or modifications derived therefrom still fall within the scope of the application.

Claims (7)

1. A chip, comprising: a plurality of interconnection structures and a plurality of processing cores, wherein each interconnection structure is connected with at least two processing cores, and all processing cores connected with one interconnection structure form a processing core group;
each processing core belonging to the same processing core group carries out data transmission through the interconnection structure connected with the processing core group to which the processing core belongs;
wherein a plurality of interconnection structures are not directly connected;
the plurality of interconnection structures comprise a first interconnection structure and a second interconnection structure;
the first interconnection structure is connected with a first processing core;
the first processing core is connected with the second interconnection structure;
the first interconnection structure is also connected with a second processing core; the second interconnection structure is connected with a third processing core;
the second processing core realizes data transmission with the third processing core through the first processing core.
2. The chip of claim 1, wherein data transfer is enabled between individual processing cores belonging to different ones of said processing core groups.
3. The chip of claim 2, wherein
the amount of data transferred between any two processing cores belonging to one of the processing core groups is greater than the amount of data transferred between processing cores respectively belonging to any two different processing core groups.
4. The chip of claim 3, wherein
the plurality of interconnection structures comprise a third interconnection structure;
none of the processing cores connected to the third interconnection structure is connected to any other interconnection structure.
5. The chip of claim 1, wherein a bandwidth of the interconnect structure meets a bandwidth requirement of each of the processing cores within the set of processing cores connected to the interconnect structure.
6. The chip of claim 1, wherein a plurality of the interconnect structures are provided with at least two clock frequencies; and/or
the plurality of interconnection structures are provided with at least two bit width values.
7. An inter-core data transmission method for use in a chip as claimed in any one of claims 1 to 6, comprising:
each processing core belonging to the same processing core group realizes data transmission through an interconnection structure connected with the processing core group;
two processing cores respectively belonging to two different processing core groups realize data transmission through at least two interconnection structures; wherein the two processing cores respectively belonging to two different processing core groups realize data transmission through two interconnection structures by means of one processing core that belongs to both of the two different processing core groups.
CN201911230193.9A 2019-12-04 2019-12-04 Chip and inter-core data transmission method Active CN112905523B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911230193.9A CN112905523B (en) 2019-12-04 2019-12-04 Chip and inter-core data transmission method
PCT/CN2020/118709 WO2021109698A1 (en) 2019-12-04 2020-09-29 Chip and inter-core data transmission method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911230193.9A CN112905523B (en) 2019-12-04 2019-12-04 Chip and inter-core data transmission method

Publications (2)

Publication Number Publication Date
CN112905523A CN112905523A (en) 2021-06-04
CN112905523B CN112905523B (en) 2023-11-17

Family

ID=76110785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911230193.9A Active CN112905523B (en) 2019-12-04 2019-12-04 Chip and inter-core data transmission method

Country Status (2)

Country Link
CN (1) CN112905523B (en)
WO (1) WO2021109698A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113778938B (en) * 2021-08-31 2024-03-12 上海阵量智能科技有限公司 Method, device and chip for determining network-on-chip topology structure

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101546302A (en) * 2009-05-07 2009-09-30 复旦大学 Interconnection structure of multicore processor and hierarchical interconnection design method based on interconnection structure
CN106528052A (en) * 2016-12-26 2017-03-22 北京海嘉科技有限公司 Microprocessor architecture based on distributed function units

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9063730B2 (en) * 2010-12-20 2015-06-23 Intel Corporation Performing variation-aware profiling and dynamic core allocation for a many-core processor
CN103336756B (en) * 2013-07-19 2016-01-27 中国人民解放军信息工程大学 A kind of generating apparatus of data computational node
CN205540720U (en) * 2016-04-06 2016-08-31 龙芯中科技术有限公司 Treater interconnection structure and mainboard

Also Published As

Publication number Publication date
CN112905523A (en) 2021-06-04
WO2021109698A1 (en) 2021-06-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant