CN114095289B - Data multicast circuit, method, electronic device, and computer-readable storage medium - Google Patents

Info

Publication number
CN114095289B
Authority
CN
China
Prior art keywords
multicast
data
processing core
circuit
core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010787623.3A
Other languages
Chinese (zh)
Other versions
CN114095289A (en)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Ximu Semiconductor Technology Co ltd
Original Assignee
Beijing Simm Computing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Simm Computing Technology Co ltd filed Critical Beijing Simm Computing Technology Co ltd
Priority to CN202010787623.3A priority Critical patent/CN114095289B/en
Publication of CN114095289A publication Critical patent/CN114095289A/en
Application granted granted Critical
Publication of CN114095289B publication Critical patent/CN114095289B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/02 Details
    • H04L12/16 Arrangements for providing special services to substations
    • H04L12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/20 Support for services
    • H04L49/201 Multicast operation; Broadcast operation

Abstract

Embodiments of the present disclosure disclose a data multicast circuit, a method, and a chip. The data multicast circuit corresponds to a first processing core and comprises: a multicast enable signal generating circuit for generating a multicast enable signal according to original multicast information and the core identification of the first processing core, wherein the original multicast information identifies all processing cores participating in the data multicast; and a transceiver control circuit for generating a multicast data packet according to the multicast enable signal, wherein the multicast data packet comprises multicast data and first multicast information, and the first multicast information identifies those of the participating processing cores that have not yet taken part in the data multicast. Through the multicast enable signal generating circuit and the transceiver control circuit, the data multicast circuit lets the processing cores relay multicast data according to the multicast information, which addresses the prior-art problems of complex hardware circuitry and inflexible multicast control when data is multicast among a plurality of processing cores.

Description

Data multicast circuit, method, electronic device, and computer-readable storage medium
Technical Field
The present disclosure relates to the field of processors, and more particularly, to a data multicast circuit, method, electronic device, and computer readable storage medium.
Background
With the development of the information age and the progress of science and technology, artificial intelligence has entered a stage of explosive growth; the amount of information has increased dramatically, and so has the amount of data to be processed. Processing massive data efficiently has become a common goal of engineers and researchers.
Meanwhile, artificial intelligence and chip technology have advanced together: more and more transistors can be integrated on a single chip, the computing capacity of a chip keeps growing, and technology companies have successively integrated multi-core processors on a single chip to improve hardware performance. Sharing data among multiple cores often becomes a key bottleneck in improving the computing capacity of the chip.
In multi-core chip architectures in the artificial intelligence field, different neural network models often require different processing cores to share weights or other data, and delivering the same data to a plurality of processing cores becomes key to improving the computing power of the whole chip.
In the prior art, data sharing between processing cores is generally implemented using a shared storage or data broadcasting manner.
FIG. 1a shows a prior-art shared-storage implementation. As shown in fig. 1a, a plurality of processing cores are connected to the same shared memory, and each processing core reads data from and writes data to the shared memory. The data transmission process is as follows: the cores (core0, core1, and so on) issue read requests to the shared memory simultaneously; the shared memory arbitrates the requests of the cores and returns the corresponding data; each core completes its computation after obtaining the data. However, the shared-memory scheme has the following drawbacks: 1. because the shared memory serves the requests of different processing cores serially, its data bandwidth becomes a bottleneck; 2. each processing core reads and writes the shared memory independently, so task synchronization is poor and efficiency is low; 3. the physical wiring from many processing cores to the memory is congested and complex, which hinders a high-performance physical implementation of the chip.
Fig. 1b shows a prior-art data-broadcasting implementation. As shown in fig. 1b, core0 may broadcast data to the other cores through a NOC (Network on Chip). The data transmission process is as follows: core0 sends the data to the NOC; the NOC receives the data and broadcasts it to the other cores; the other cores receive the data sent by the NOC. However, the data-broadcasting scheme has the following drawbacks: 1. a NOC topology that supports data broadcasting is difficult to design; in particular, the more processing cores there are on the chip, the harder the physical routing becomes, which degrades chip performance; 2. the usage scenarios are limited: most existing NOCs are third-party IP with limited multicast support, and when several groups of cores multicast simultaneously, data dependencies greatly increase the probability of network deadlock.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In order to solve the above technical problems in the prior art, the embodiments of the present disclosure propose the following technical solutions:
in a first aspect, an embodiment of the present disclosure provides a data multicast circuit, corresponding to a first processing core, including:
a multicast enabling signal generating circuit for generating a multicast enabling signal according to the original multicast information and the core identification of the first processing core; wherein the original multicast information is used to identify all processing cores participating in the data multicast;
the receiving and transmitting control circuit is used for generating a multicast data packet according to the multicast enabling signal; wherein the multicast data packet comprises multicast data and first multicast information; wherein the first multicast information is used to identify individual ones of the processing cores that have not participated in the data multicast.
Further, the transceiver control circuit is further configured to:
determining whether the first processing core is the main processing core according to the original multicast information;
setting the multicast enable signal valid in response to the first processing core being the main processing core;
setting the multicast enable signal invalid in response to the first processing core not being the main processing core.
Further, the data multicast circuit further includes:
a starting point processing core calculating circuit for calculating a next target processing core of the multicast data according to the original multicast information;
in response to the multicast enable signal being active, the transceiver control circuitry receives the multicast data and sends the multicast data packet to the next target processing core.
Further, the transceiver control circuit is further configured to:
generating the first multicast information according to the original multicast information and the core identification of the first processing core;
and generating the multicast data packet according to the first multicast information and the multicast data.
Further, in response to the multicast enable signal being inactive, the transceiver control circuitry is further to:
receiving a multicast data packet;
determining whether the multicast data needs to be forwarded according to first multicast information in the multicast data packet;
determining a next target processing core according to the first multicast information in response to the need to forward the multicast data;
generating new first multicast information according to the first multicast information and the core identification of the first processing core;
generating a new multicast data packet according to the multicast data and the new first multicast information;
forwarding the new multicast data packet to the next target processing core.
Further, the data multicast circuit further includes:
address decoding circuitry to receive a core identification of a next target processing core to determine an address of the next target processing core.
Further, in response to the multicast enable signal being inactive, the transceiver control circuitry is further to:
and in response to the first processing core being the last processing core participating in the data multicast, sending a transmission end signal to all processing cores participating in the data multicast.
Further, the data multicast circuit further includes:
an endpoint processing core calculation circuit for determining the last processing core participating in the data multicast according to the original multicast information;
and the multicast ending signal generating circuit is used for generating a multicast ending signal according to the output result of the end point processing core calculating circuit and the transmission ending signal.
Further, the multicast enable signal generating circuit includes:
processing core representation vector generation circuitry for generating a representation vector of a core identification of the first processing core;
and the signal generating circuit is used for generating the multicast enabling signal according to the representation vector and the original multicast information.
In a second aspect, an embodiment of the present disclosure provides a data multicast method, including:
generating a multicast enabling signal according to the original multicast information and the core identification of the first processing core; wherein the original multicast information is used to identify all processing cores participating in the data multicast;
generating a multicast data packet according to the multicast enabling signal; wherein the multicast data packet comprises multicast data and first multicast information; wherein the first multicast information is used to identify individual ones of the processing cores that have not participated in the data multicast.
In a third aspect, embodiments of the present disclosure provide a chip comprising the data multicast circuit of any one of the first aspects.
In a fourth aspect, an embodiment of the present disclosure provides an electronic device, including: a memory for storing computer-readable instructions; and one or more processors configured to execute the computer-readable instructions such that, when the instructions are executed, the one or more processors implement the data multicast method according to the second aspect.
In a fifth aspect, an embodiment of the present disclosure provides a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the data multicast method according to the second aspect.
In a sixth aspect, embodiments of the present disclosure provide a computer program product comprising computer instructions which, when executed by a computing device, cause the computing device to perform the data multicast method according to the second aspect.
In a seventh aspect, embodiments of the present disclosure provide a computing device, including one or more chips according to the third aspect.
Embodiments of the present disclosure disclose a data multicast circuit, a method, and a chip. The data multicast circuit corresponds to a first processing core and comprises: a multicast enable signal generating circuit for generating a multicast enable signal according to original multicast information and the core identification of the first processing core, wherein the original multicast information identifies all processing cores participating in the data multicast; and a transceiver control circuit for generating a multicast data packet according to the multicast enable signal, wherein the multicast data packet comprises multicast data and first multicast information, and the first multicast information identifies those of the participating processing cores that have not yet taken part in the data multicast. Through the multicast enable signal generating circuit and the transceiver control circuit, the data multicast circuit lets the processing cores relay multicast data according to the multicast information, which addresses the prior-art problems of complex hardware circuitry and inflexible multicast control when data is multicast among a plurality of processing cores.
The foregoing is only an overview of the technical solutions of the present disclosure. So that the technical means of the present disclosure can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features and advantages of the present disclosure become more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIGS. 1a and 1b are schematic diagrams of the prior art;
fig. 2 is a schematic view of an application scenario in an embodiment of the disclosure;
fig. 3 is a schematic diagram of a data multicast circuit in an embodiment of the present disclosure;
fig. 4 is a schematic diagram of a multicast end signal generating circuit according to an embodiment of the present disclosure;
fig. 5 is a flowchart of a data multicast method provided in an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "one" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
Fig. 2 is a schematic diagram of an application scenario of an embodiment of the present disclosure. As shown in FIG. 2, the chip includes n processing cores, core-0 through core-n, which are interconnected by a NOC implemented with a plurality of clusters (Cluster). When a processing core needs to perform data multicast, it sends the data to be multicast to the NOC, which delivers the data to the next processing core; that processing core continues the multicast, and the data is forwarded in turn until it reaches the last processing core participating in the data multicast.
Fig. 3 is a schematic diagram of a data multicast circuit according to an embodiment of the present disclosure. The data multicast circuit 300 provided in this embodiment corresponds to the first processing core, that is, each processing core corresponds to one data multicast circuit 300, and the data multicast circuit may be disposed inside the processing core, as shown in fig. 3, or may be disposed outside the processing core (not shown); the data multicast circuit 300 provided in this embodiment includes:
a multicast enable signal generating circuit BC_EN (broadcast enable) 301, configured to generate a multicast enable signal bc_en according to the original multicast information coremap[n:0] and the core identifier coreid of the first processing core; wherein the original multicast information is used to identify all processing cores participating in the data multicast;
a transceiver control circuit rx_ctrl 302, configured to generate a multicast data packet according to the multicast enable signal bc_en; wherein the multicast data packet comprises multicast data and first multicast information; wherein the first multicast information is used to identify individual ones of the processing cores that have not participated in the data multicast.
In this embodiment, the original multicast information coremap[n:0] is set through a register: an upper-layer program writes the multicast information register of each processing core to control whether that processing core participates in the data multicast. For example, suppose there are 8 processing cores, core0 to core7. If the upper-layer program determines that core0 needs to send data to the other processing cores core1 to core7, it sets the multicast information register of each participating processing core, core0 to core7, to 0xff, i.e. coremap[7:0] = 1111_1111. The original multicast information thus has 8 bits, one per processing core: coremap[0] corresponds to core0, and coremap[0] = 1 means that core0 participates in the data multicast, while coremap[0] = 0 means that it does not. The other bits correspond to the other processing cores in the same way and are not described again. After receiving the original multicast information, the multicast enable signal generating circuit of a processing core generates the multicast enable signal according to the original multicast information and the core identification of the first processing core corresponding to the data multicast circuit.
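The per-bit register encoding described above can be illustrated in software. The following is only a minimal Python sketch of the bit layout, not the register hardware itself; the function names `participates` and `participating_cores` are hypothetical and do not appear in the disclosure.

```python
from typing import List


def participates(coremap: int, coreid: int) -> bool:
    """True if bit `coreid` of the original multicast information is 1."""
    return ((coremap >> coreid) & 1) == 1


def participating_cores(coremap: int, num_cores: int) -> List[int]:
    """List the identifiers of all cores whose coremap bit is set."""
    return [coreid for coreid in range(num_cores) if participates(coremap, coreid)]


# Example from the text: 8 cores, register value 0xff (1111_1111).
print(participating_cores(0xFF, 8))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

A sparser register value such as 0000_0101 would, under the same encoding, select only core0 and core2.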
Optionally, the multicast enable signal generating circuit further includes:
processing core representation vector generation circuitry 3011 to generate a representation vector of a core identification of the first processing core;
and a signal generating circuit 3012, configured to generate the multicast enable signal according to the representation vector and original multicast information.
Illustratively, the processing core representation vector generation circuit is implemented by a shift circuit that shifts the value 1 to the left, the shift amount being determined by the core identification of the first processing core. For example, if the first processing core is core0, its core identification is 0; the core identification is the input of the shift circuit, which shifts the value 1 left by 0 bits to obtain the representation vector of core0: 0000_0001. The output width of the shift circuit equals the number of processing cores; in the above example with 8 processing cores, the representation vector output by the shift circuit has 8 bits, each being 0 or 1.
Illustratively, the signal generating circuit 3012 can be implemented with various logic circuits. As shown in fig. 3, the signal generating circuit 3012 includes n NOT gates, where n is the number of processing cores; each NOT gate corresponds to one data bit of the shift circuit, i.e. each output bit of the shift circuit is inverted by one NOT gate. Taking the above representation vector of core0 as an example, inverting each bit yields 1111_1110. As shown in fig. 3, the signal generating circuit 3012 further includes n AND gates, each with two input terminals: one receives the value of one bit of the original multicast information, and the other is connected to one of the NOT gates and receives the output of that NOT gate. In other words, each AND gate performs a logical AND of a NOT-gate output with the value at the corresponding position of the original multicast information. Taking core0 as an example, its representation vector is inverted to 1111_1110 and the original multicast information is 1111_1111, so the bitwise AND of the two is 1111_1110. As shown in fig. 3, the signal generating circuit 3012 further includes an OR gate with n input terminals in one-to-one correspondence with the AND gates; the OR gate performs a logical OR of the values on its n input terminals to obtain the output signal, i.e. the multicast enable signal bc_en. Taking core0 as an example, the input to the OR gate is 1111_1110, and ORing the bits of 1111_1110 together yields 1; that is, for core0 the multicast enable signal bc_en = 1.
For other processing cores, the computation process is similar to the process described above to generate the corresponding bc_en signal, which is not described in detail herein.
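The shift, NOT, AND and OR stages described above can be modelled bit for bit in software. The following Python sketch is an illustration under the assumption that integers stand in for the gate-level wires; the function names and the `num_cores` parameter are hypothetical.

```python
def representation_vector(coreid: int, num_cores: int) -> int:
    """Shift circuit: the value 1 shifted left by `coreid`, limited to num_cores bits."""
    width_mask = (1 << num_cores) - 1
    return (1 << coreid) & width_mask


def bc_en(coremap: int, coreid: int, num_cores: int) -> int:
    """Model of the signal generating circuit: NOT gates, AND gates, then one OR gate."""
    width_mask = (1 << num_cores) - 1
    inverted = ~representation_vector(coreid, num_cores) & width_mask  # n NOT gates
    anded = coremap & inverted                                         # n AND gates
    return 1 if anded != 0 else 0                                      # n-input OR gate


# Worked example from the text: core0, coremap = 1111_1111.
print(format(representation_vector(0, 8), "08b"))  # 00000001
print(bc_en(0xFF, 0, 8))                           # 1
```

Under this model the output is 1 whenever the original multicast information names at least one core other than the core itself, and 0 when the core is the only one named.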
Optionally, the transceiver control circuit is further configured to:
determining whether the first processing core is the main processing core according to the original multicast information;
setting the multicast enable signal valid in response to the first processing core being the main processing core;
setting the multicast enable signal invalid in response to the first processing core not being the main processing core.
The main processing core is the processing core that starts the data multicast; in the above example, core0 is the first core to take part in the data multicast, so core0 is the main processing core. As shown in fig. 3, the original multicast information coremap and the core identifier coreid of the first processing core are input into the transceiver control circuit, which determines from them whether the first processing core is the main processing core. For example, the main processing core may be identified by the least significant bit with value 1 in the original multicast information: in the above example the original multicast information is 1111_1111, whose least significant bit with value 1 is coremap[0] = 1, indicating that core0 is the main processing core among the cores participating in the multicast. Thus, when the core identification of the first processing core is 0, the first processing core, i.e. core0, is determined to be the main processing core; when the core identification of the first processing core is 1, core1 is not the main processing core.
In response to the first processing core being the main processing core, the transceiver control circuit sets the multicast enable signal valid, i.e. the main processing core is to send the multicast data to the next target processing core; in response to the first processing core not being the main processing core, the transceiver control circuit sets the multicast enable signal invalid, i.e. the processing core does not originate multicast data at this time but waits to receive multicast data from another processing core.
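The example rule given above, in which the main processing core is the one at the least significant bit with value 1 of coremap, can be sketched in Python as follows. This is only an illustration of that example rule; the function names are hypothetical, and raising an error for an all-zero coremap is an assumption, since the text does not describe that case.

```python
def main_core(coremap: int) -> int:
    """Index of the least significant 1 bit of the original multicast information."""
    if coremap == 0:
        # Assumption: an empty coremap is treated as an error in this sketch.
        raise ValueError("no processing core participates in the multicast")
    lowest_set_bit = coremap & -coremap      # isolates the lowest 1 bit
    return lowest_set_bit.bit_length() - 1


def enable_is_valid(coremap: int, coreid: int) -> bool:
    """The multicast enable signal is valid only for the main processing core."""
    return main_core(coremap) == coreid


print(main_core(0b1111_1111))           # 0
print(enable_is_valid(0b1111_1111, 0))  # True
print(enable_is_valid(0b1111_1111, 1))  # False
```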
Optionally, the data multicast circuit further includes:
a starting point processing core calculation circuit 303, configured to calculate a next target processing core of the multicast data according to the original multicast information;
in response to the multicast enable signal being active, the transceiver control circuitry receives the multicast data and sends the multicast data packet to the next target processing core.
In this embodiment, the starting point processing core calculation circuit calculates the first target processing core of the data multicast, i.e. the next target processing core after the main processing core. Optionally, the next target processing core is given by the position of the second bit with value 1 in the original multicast information, counting from the low-order end to the high-order end. For example, if the original multicast information is 1111_1111, the second bit with value 1 counting from the low-order end is coremap[1], so core1 is the next target processing core, i.e. the multicast data is sent from core0 to core1. Because the transceiver control circuit has determined that the multicast enable signal is valid, it receives the multicast data and sends the multicast data packet to the next target processing core.
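The optional rule above, which takes the second 1 bit counting from the low-order end as the first target, can be illustrated with a short Python sketch. The function name is hypothetical, and returning `None` when fewer than two cores participate is an assumption made for the sketch, not behaviour stated in the text.

```python
from typing import Optional


def first_target_core(coremap: int) -> Optional[int]:
    """Index of the second-lowest 1 bit of coremap, or None if there is none."""
    without_lowest = coremap & (coremap - 1)  # clear the lowest 1 bit (the main core)
    if without_lowest == 0:
        return None                           # assumption: nobody to send to
    lowest_remaining = without_lowest & -without_lowest
    return lowest_remaining.bit_length() - 1


print(first_target_core(0b1111_1111))  # 1
print(first_target_core(0b1010_0001))  # 5
```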
Optionally, the transceiver control circuit 302 is further configured to:
generating the first multicast information according to the original multicast information and the core identification of the first processing core;
and generating the multicast data packet according to the first multicast information and the multicast data.
In the above steps, in response to the multicast enable signal being valid, that is, when the first processing core is the main processing core, the transceiver control circuit generates the first multicast information, which identifies those of the participating processing cores that have not yet taken part in the data multicast. For example, suppose the original multicast information is 1111_1111 and the first processing core is core0, the main processing core of the data multicast; its core identification is 0 and its representation vector is 0000_0001. Subtracting the representation vector from the original multicast information gives the first multicast information 1111_1110, i.e. the processing cores that have not yet taken part in the multicast are core1 to core7. After the first multicast information is generated, it is combined with the multicast data into a multicast data packet.
Optionally, the multicast data packet packet[n+m+1:0] may be represented as coremap1[n:0]_Data[m:0], where the low m+1 bits of the multicast data packet are the multicast data and the high n+1 bits are the first multicast information. After generating the multicast data packet, the transceiver control circuit transmits the multicast data packet to the next target processing core over the Network on Chip (NOC).
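The construction of the first multicast information and the packing of the packet can be sketched in Python as follows. This is an illustration only: the function names and the `data_bits` parameter (the width m+1 of the data field) are hypothetical, and clearing the core's own bit is used in place of the subtraction described in the text, which gives the same result whenever that bit is set in the original multicast information.

```python
from typing import Tuple


def first_multicast_info(coremap: int, coreid: int) -> int:
    """Remove the sending core from the set of cores still to be reached."""
    return coremap & ~(1 << coreid)


def pack_packet(coremap1: int, data: int, data_bits: int) -> int:
    """High bits carry coremap1, the low `data_bits` bits carry the multicast data."""
    return (coremap1 << data_bits) | (data & ((1 << data_bits) - 1))


def unpack_packet(packet: int, data_bits: int) -> Tuple[int, int]:
    """Split a packet back into (coremap1, data)."""
    return packet >> data_bits, packet & ((1 << data_bits) - 1)


coremap1 = first_multicast_info(0b1111_1111, 0)
print(format(coremap1, "08b"))                 # 11111110
packet = pack_packet(coremap1, 0xABCD, 16)
print(hex(packet))                             # 0xfeabcd
```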
Optionally, in response to the multicast enable signal being inactive, the transceiver control circuit is further configured to:
receiving a multicast data packet;
determining whether the multicast data needs to be forwarded according to first multicast information in the multicast data packet;
determining a next target processing core according to the first multicast information in response to the need to forward the multicast data;
generating new first multicast information according to the first multicast information and the core identification of the first processing core;
generating a new multicast data packet according to the multicast data and the new first multicast information;
forwarding the new multicast data packet to the next target processing core.
In this alternative embodiment, the multicast enable signal is inactive, i.e., the first processing core is not the master core in the data multicast process. In this case the transceiver control circuit receives multicast data packets directly from other processing cores. After receiving a multicast data packet, it determines whether the multicast data needs to be forwarded according to the first multicast information in the packet. Specifically, the value at the position corresponding to the first processing core in the first multicast information is set to 0, and the result is used to decide whether forwarding is needed. For example, if the core identifier of the first processing core is 1 (that is, the first processing core is core1) and the first multicast information in the received multicast data packet is 1111_1110, setting the bit at the position of the core identifier to 0 yields 1111_1100. A logical OR over all bits of 1111_1100 then gives 1, which means that other processing cores remain on the multicast data link after the first processing core is removed, so the multicast data needs to be forwarded. When the multicast data needs to be forwarded, the next target processing core is determined according to the first multicast information.
The next target processing core is the processing core corresponding to the first position with value 1 after the position of the first processing core in the first multicast information. In the example above, the first multicast information is 1111_1110 and the core identifier of the first processing core is 1, so the first such position is coremap1[2] = 1, and the next target processing core is core2.
The new first multicast information is obtained by setting the value at the position corresponding to the first processing core in the first multicast information to 0. If the received first multicast information is 1111_1110 and the core identifier of the first processing core is 1, coremap1[1] is set to 0, giving the new first multicast information coremap1 = 1111_1100. The new first multicast information identifies the processing cores that have not yet participated in the data multicast.
The multicast data and the new first multicast information are then combined into a multicast data packet in the same manner, and the packet is forwarded to the next target processing core.
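The forwarding decision of a slave core described above can be modelled in software. The following is a minimal sketch only, not the disclosed circuit: it assumes an 8-bit coremap1 held in a Python integer with bit i standing for core i, and the function name `forward_step` is hypothetical.

```python
def forward_step(coremap1: int, core_id: int, width: int = 8):
    """Model one slave-core step: returns (needs_forward, next_core, new_coremap1).

    Assumption: bit i of coremap1 corresponds to core i, and bits below
    core_id have already been cleared by earlier cores on the link.
    """
    mask = (1 << width) - 1
    # Clear the bit of the receiving core to obtain the new first multicast information.
    new_coremap1 = coremap1 & ~(1 << core_id) & mask
    # OR-reduction over all bits: any remaining 1 means forwarding is needed.
    needs_forward = new_coremap1 != 0
    next_core = None
    if needs_forward:
        # The next target is the first position with value 1 after core_id.
        for pos in range(core_id + 1, width):
            if (new_coremap1 >> pos) & 1:
                next_core = pos
                break
    return needs_forward, next_core, new_coremap1


# Example from the text: core1 receives coremap1 = 1111_1110.
needs_forward, next_core, new_coremap1 = forward_step(0b1111_1110, 1)
print(needs_forward, next_core, format(new_coremap1, "08b"))
```

With the values from the example, the sketch reports that forwarding is needed, that the next target is core2, and that the new first multicast information is 1111_1100.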
Optionally, the data multicast circuit further includes an address decoding circuit 304, configured to receive the core identifier of the next target processing core and determine the address of that core. As described above, once the transceiver control circuit has determined the next target processing core, its core identifier is known. Each core identifier is unique in the network-on-chip, so the core identifier can itself serve as the address of the processing core in the network-on-chip; alternatively, the address of a processing core in the network-on-chip corresponds to its core identifier and can be looked up by that identifier. The address decoding circuit determines the network-on-chip address of the next target processing core from its core identifier; once the network-on-chip has the multicast data packet and the address of its target processing core, it delivers the packet to the next target processing core.
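The lookup variant of address decoding can be illustrated with a small table. This is a sketch under assumptions: the 4x2 mesh coordinates used as network-on-chip addresses are purely hypothetical (the disclosure does not specify an address format), as are the names `NOC_ADDRESS_TABLE` and `decode_address`.

```python
# Hypothetical mapping from core identifier to a network-on-chip address.
# Here an address is an (x, y) coordinate in an assumed 4x2 mesh.
NOC_ADDRESS_TABLE = {core_id: (core_id % 4, core_id // 4) for core_id in range(8)}


def decode_address(core_id: int):
    """Return the assumed NoC address for a core identifier."""
    try:
        return NOC_ADDRESS_TABLE[core_id]
    except KeyError:
        raise ValueError(f"unknown core identifier: {core_id}") from None


print(decode_address(2))  # address of the next target core in the earlier example
```

If the core identifier is used directly as the address, the table reduces to the identity mapping.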
Optionally, the data multicast circuit further includes a local data buffer circuit and a local data address storage circuit. When the first processing core receives multicast data, the multicast data is cached in the local data buffer circuit regardless of whether the first processing core forwards it, and the local storage address of the multicast data is stored in the local data address storage circuit. When the local memory of the first processing core is idle, the multicast data is written to the local storage address specified in the local data address storage circuit.
Optionally, the data multicast circuit further includes a transmit data buffer circuit and a transmit data address storage circuit. When the first processing core sends or forwards a multicast data packet, it first places the multicast data in the transmit data buffer circuit; the transmit data address storage circuit stores the address of the next target processing core determined by the address decoding circuit. When the network-on-chip is idle, the multicast data packet in the transmit data buffer circuit is sent to the address indicated by the transmit data address storage circuit.
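The interplay of the transmit data buffer and the transmit data address storage can be sketched as two queues that are drained only when the network is idle. This is an illustrative model, not the disclosed hardware: FIFO ordering, sending one packet per idle check, and the names `SendBuffer`, `enqueue` and `drain` are all assumptions.

```python
from collections import deque


class SendBuffer:
    """Software model of a transmit data buffer plus address storage (assumed FIFO)."""

    def __init__(self):
        self._packets = deque()    # models the transmit data buffer circuit
        self._addresses = deque()  # models the transmit data address storage circuit

    def enqueue(self, packet, address):
        """Buffer a packet together with the decoded address of its target core."""
        self._packets.append(packet)
        self._addresses.append(address)

    def drain(self, noc_idle: bool, send) -> bool:
        """Send the oldest buffered packet if the NoC is idle; return True if sent."""
        if not noc_idle or not self._packets:
            return False
        send(self._addresses.popleft(), self._packets.popleft())
        return True


sent = []
buf = SendBuffer()
buf.enqueue("multicast packet", 2)
buf.drain(False, lambda addr, pkt: sent.append((addr, pkt)))  # NoC busy: nothing sent
buf.drain(True, lambda addr, pkt: sent.append((addr, pkt)))   # NoC idle: packet sent
print(sent)
```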
Optionally, in response to the multicast enable signal being inactive, the transceiver control circuit is further configured to: in response to the first processing core being the last processing core participating in the data multicast, send a transmission end signal to all processing cores participating in the data multicast. In this embodiment, an inactive multicast enable signal indicates that the first processing core is a slave processing core. If the transceiver control circuit determines that there is no next target processing core, the first processing core is the last processing core participating in the data multicast. For example, when the result of the logical OR operation is 0, or, equivalently, when the values at all positions in the first multicast information other than the position of the first processing core are 0, the first processing core is the last processing core, and it sends the transmission end signal trans_done to all processing cores participating in the data multicast.
Further, the data multicast circuit further includes:
an endpoint processing core calculation circuit 305, configured to determine the last processing core participating in the data multicast according to the original multicast information; and
a multicast end signal generating circuit 306, configured to generate a multicast end signal according to the output of the endpoint processing core calculation circuit and the transmission end signal.
The endpoint processing core calculation circuit 305 determines the core identifier of the last processing core participating in the data multicast according to the original multicast information; specifically, the last processing core may be determined by finding the most significant 1 in the original multicast information. For example, if the original multicast information is 1111_1111, the most significant 1 is coremap[7] = 1, so core7 is the last processing core in the data multicast; if the original multicast information is 0011_1110, the most significant 1 is coremap[5] = 1, so core5 is the last processing core in the data multicast.
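Finding the most significant 1 can be expressed compactly in software. The sketch below assumes the original multicast information is held in a Python integer with bit i standing for core i; the function name `last_core` is hypothetical.

```python
def last_core(coremap: int) -> int:
    """Return the index of the most significant 1 in the original multicast information."""
    if coremap <= 0:
        raise ValueError("no processing core participates in the data multicast")
    # int.bit_length() is one more than the index of the highest set bit.
    return coremap.bit_length() - 1


print(last_core(0b1111_1111))  # core7 in the first example
print(last_core(0b0011_1110))  # core5 in the second example
```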
The multicast end signal generating circuit 306 generates the multicast end signal of the first processing core according to the calculation result of the endpoint processing core calculation circuit 305 and the transmission end signal. Optionally, the multicast end signal generating circuit 306 is implemented by a data selector (MUX); Fig. 4 is a schematic diagram of one implementation of the multicast end signal generating circuit. As shown in Fig. 4, the multicast end signal generating circuit is implemented as a data selector with a plurality of first input terminals for receiving the transmission end signal of each processing core; if the chip includes 8 processing cores, the data selector has 8 first input terminals. The data selector also has second input terminals, 8 in this example, which receive the output of the endpoint processing core calculation circuit 305. That output is a representation vector of the core identifier of the last processing core. The data selector performs a logical AND of the representation vector with the signals on the first input terminals and outputs the result through 8 output ports. The multicast end signal trans_done_i can therefore be generated only when the transmission end signal sent by the last processing core is received.
For example, with coremap[7:0] = 1111_1111, the output of the endpoint processing core calculation circuit 305 is 1000_0000. The data selector then outputs trans_done_7 from output port 7 only when the trans_done of core7 is 1; all other output ports are 0 because the output of the endpoint processing core calculation circuit 305 is 1000_0000. The master core and the other processing cores can thus generate the multicast end signal and end the data multicast process only after the last processing core has sent the transmission end signal. Specifically, the transceiver control circuit is further configured to receive the multicast end signal to end the data multicast.
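The masking behaviour of the data selector can be modelled with a bitwise AND. This is a sketch under assumptions: the per-core transmission end signals are packed into one integer `trans_done` with bit i carrying the trans_done of core i, and the function names are hypothetical.

```python
def endpoint_one_hot(coremap: int) -> int:
    """Representation vector of the last core: only the most significant 1 is kept."""
    if coremap <= 0:
        raise ValueError("no processing core participates in the data multicast")
    return 1 << (coremap.bit_length() - 1)


def multicast_end_outputs(coremap: int, trans_done: int) -> int:
    """AND the representation vector with the per-core trans_done inputs.

    Bit i of the result models output port i (trans_done_i).
    """
    return endpoint_one_hot(coremap) & trans_done


# Only a trans_done from core7 passes through when coremap = 1111_1111.
print(format(multicast_end_outputs(0b1111_1111, 0b1000_0000), "08b"))
print(format(multicast_end_outputs(0b1111_1111, 0b0100_0000), "08b"))
```

A non-zero result corresponds to the multicast end signal being generated; a transmission end signal from any core other than the last one is masked out.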
Fig. 5 is a flowchart of a data multicast method provided in an embodiment of the present disclosure. As shown in fig. 5, the method is used in a first processing core, and includes the following steps:
S501, generating a multicast enable signal according to the original multicast information and the core identifier of the first processing core, wherein the original multicast information is used to identify all processing cores participating in the data multicast;
S502, generating a multicast data packet according to the multicast enable signal, wherein the multicast data packet comprises multicast data and first multicast information, and the first multicast information is used to identify the processing cores that have not yet participated in the data multicast.
Further, the data multicast method further includes:
judging whether the first processing core is a main processing core or not according to the original multicast information;
validating the multicast enable signal in response to the first processing core being a primary processing core;
the multicast enable signal is disabled in response to the first processing core not being a master processing core.
Further, the data multicast method further includes:
calculating a next target processing core of the multicast data according to the original multicast information;
and receiving the multicast data and transmitting the multicast data packet to the next target processing core in response to the multicast enable signal being valid.
Further, the data multicast method further includes:
generating the first multicast information according to the original multicast information and the core identification of the first processing core;
and generating the multicast data packet according to the first multicast information and the multicast data.
Further, in response to the multicast enable signal being inactive, the data multicast method further comprises:
receiving a multicast data packet;
determining whether the multicast data needs to be forwarded according to first multicast information in the multicast data packet;
Determining a next target processing core according to the first multicast information in response to the need to forward the multicast data;
generating new first multicast information according to the first multicast information and the core identification of the first processing core;
generating a new multicast data packet according to the multicast data and the new first multicast information;
forwarding the new multicast data packet to the next target processing core.
Further, the data multicast method further includes:
a core identification of a next target processing core is received to determine an address of the next target processing core.
Further, the data multicast method further includes:
and in response to the first processing core being the last processing core participating in the data multicast, sending a transmission end signal to all processing cores participating in the data multicast.
Further, the data multicast method further includes:
determining the last processing core participating in the data multicasting according to the original multicasting information;
and generating a multicast ending signal according to the output result of the end point processing core computing circuit and the transmission ending signal.
Further, the data multicast method further includes:
The multicast end signal is received to end the data multicast.
Further, the generating a multicast enabling signal according to the original multicast information and the core identifier of the first processing core includes:
generating a representation vector of a core identification of the first processing core;
and generating the multicast enabling signal according to the representation vector and the original multicast information.
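One way these two steps could be combined is sketched below. This is an illustration under assumptions rather than the disclosed circuit: it takes the representation vector to be a one-hot encoding of the core identifier and takes the master core to be the core at the lowest bit with value 1 in the original multicast information, as in the application scenario described in this document. The function names are hypothetical.

```python
def representation_vector(core_id: int) -> int:
    """Assumed one-hot representation of a core identifier (bit core_id set)."""
    return 1 << core_id


def multicast_enable(coremap: int, core_id: int) -> bool:
    """Return True if the core is the assumed master, i.e. the lowest set bit of coremap."""
    lowest_set = coremap & -coremap  # isolates the lowest bit with value 1 (0 if none)
    return representation_vector(core_id) == lowest_set


# With coremap = 1111_1111, only core0 has an active multicast enable signal.
print([multicast_enable(0b1111_1111, core_id) for core_id in range(8)])
```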
The steps in the data multicast method are the steps executed by the data multicast circuit; for details of how each step is carried out, refer to the description of the data multicast circuit, which is not repeated here.
Although the steps in the foregoing method embodiments are described in the order given above, it should be clear to those skilled in the art that the steps in the embodiments of the present disclosure need not be performed in that order; they may also be performed in reverse order, in parallel, interleaved, and so on. Those skilled in the art may also add further steps on the basis of the foregoing steps. Such obvious modifications or equivalent substitutions also fall within the scope of protection of the present disclosure and are not described again here.
The disclosed embodiments also provide a chip comprising any of the above embodiments of the data multicast circuit.
The following illustrates the operation of the data multicast circuit in the embodiments of the present disclosure in a practical application scenario.
The chip includes 8 processing cores, core0 to core7, each including a data multicast circuit as described in the above embodiments. Each processing core further includes a multicast information register and a core identification register, which are set by an upper-layer program before data multicast; the multicast information register holds the original multicast information coremap[n:0]. Each bit in coremap[n:0] corresponds to one processing core and identifies whether that processing core participates in the data multicast. In this data multicast process, the original multicast information is configured as coremap[7:0] = 1111_1111, so every processing core in the chip participates in the data multicast.
In each processing core, the transceiver control circuit receives the multicast enable signal bc_en generated by the multicast enable signal generating circuit, together with coremap[7:0] and the core identifier coreid. The transceiver control circuit determines which processing core is the master core from coremap and coreid. In this example, coremap[7:0] = 1111_1111; the processing core at the lowest bit with value 1 is the master core, and here that bit is coremap[0] = 1, so core0 is the master core and core1 to core7 are slave cores. The transceiver control circuit therefore asserts the multicast enable signal generated in core0 and deasserts the multicast enable signals generated in core1 to core7; the master core0 sends the multicast data, and the other cores receive and forward multicast data packets.
The starting point processing core calculation circuit in the master core0 calculates that the next target processing core is core1. The transceiver control circuit generates the first multicast information coremap1[7:0] = 1111_1110, receives the multicast data, combines the multicast data with the first multicast information to generate a multicast data packet, and sends the packet to the next target processing core, core1. Meanwhile, core0 stores the multicast data in its local memory.
After receiving the multicast data packet sent by core0, core1 obtains the first multicast information coremap1[7:0] = 1111_1110 from the packet and thereby determines that the next target processing core is core2. The transceiver control circuit of core1 updates the first multicast information to the new first multicast information coremap1[7:0] = 1111_1100, combines the multicast data with the new first multicast information to generate a new multicast data packet, and forwards the new packet to the next target processing core, core2. Meanwhile, core1 stores the multicast data in its local memory. core2 to core6 each perform the same operations as core1 in turn, forwarding the multicast data and saving it to their own local memories.
After core7 receives the multicast data packet, it determines that there is no next target processing core and sends a transmission end signal to all processing cores participating in the data multicast. After the transmission end signal of core7 is received, the multicast end signal generating circuit performs a logical AND of the calculation result of the endpoint processing core calculation circuit and the received transmission end signal to obtain the multicast end signal, and sends the multicast end signal to the transceiver control circuit to end the data multicast process.
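The relay described in this scenario can be traced end to end with a short simulation. This is a behavioural sketch only: it assumes bit i of the multicast information corresponds to core i, that the master is the core at the lowest bit with value 1, and that each core clears its own bit before forwarding. The function name `simulate_relay` and the tuple layout of a hop are hypothetical.

```python
def simulate_relay(coremap: int):
    """Trace the multicast relay for a given original multicast information.

    Returns (hops, last_core), where each hop is
    (sending core, receiving core, coremap1 carried in the packet).
    """
    if coremap <= 0:
        raise ValueError("no processing core participates in the data multicast")

    def lowest_set_index(value: int) -> int:
        return (value & -value).bit_length() - 1

    current = lowest_set_index(coremap)    # assumed master core
    coremap1 = coremap & ~(1 << current)   # master clears its own bit
    hops = []
    while coremap1 != 0:
        nxt = lowest_set_index(coremap1)   # first 1 after the current core
        hops.append((current, nxt, coremap1))
        current = nxt
        coremap1 &= ~(1 << current)        # receiving core clears its own bit
    # The core left in `current` has no next target and raises trans_done.
    return hops, current


hops, last = simulate_relay(0b1111_1111)
for src, dst, info in hops:
    print(f"core{src} -> core{dst}, coremap1 = {info:08b}")
print(f"core{last} sends trans_done")
```

For coremap[7:0] = 1111_1111 the trace starts with core0 sending 1111_1110 to core1 and core1 forwarding 1111_1100 to core2, and ends with core7 raising the transmission end signal, matching the scenario above.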
In the embodiments of the present disclosure, the data multicast circuit relays multicast data among a plurality of processing cores, which reduces the design complexity of the network-on-chip of the multi-core processor chip and reduces the corresponding physical wiring. Because of the multicast information, the technical solution of the present disclosure supports arbitrary combinations of multicast cores, and multiple network groups can be formed by combining multicast information, which gives applications maximum flexibility. The data multicast circuit forwards data in parallel while receiving it, so the transmission efficiency improves severalfold as the data volume and the number of cores increase. The transmission end signal reduces transmission delay, further improves transmission efficiency, and reduces dependence on the network design.
An embodiment of the present disclosure provides an electronic device, including: a memory for storing computer-readable instructions; and one or more processors configured to execute the computer-readable instructions, such that the processors, when executing the instructions, implement the data multicast method according to any one of the embodiments.
The disclosed embodiments also provide a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the data multicast method according to any of the preceding embodiments.
The disclosed embodiments also provide a computer program product comprising computer instructions which, when executed by a computing device, cause the computing device to perform the data multicast method according to any of the preceding embodiments.
The disclosed embodiments also provide a computing device, comprising a chip as described in any of the embodiments.
The flowcharts and block diagrams in the figures of this disclosure illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the names of the units do not constitute a limitation of the units themselves.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

Claims (10)

1. A data multicast circuit corresponding to a first processing core, comprising:
a multicast enabling signal generating circuit for generating a multicast enabling signal according to the original multicast information and the core identification of the first processing core; wherein the original multicast information is used to identify all processing cores participating in the data multicast;
the receiving and transmitting control circuit is used for generating a multicast data packet according to the multicast enabling signal; wherein the multicast data packet comprises multicast data and first multicast information; wherein the first multicast information is used to identify individual ones of the processing cores that have not participated in the data multicast.
2. The data multicast circuit as recited in claim 1, wherein said transceiving control circuit is further configured to:
judging whether the first processing core is a main processing core or not according to the original multicast information;
validating the multicast enable signal in response to the first processing core being a primary processing core;
the multicast enable signal is disabled in response to the first processing core not being a master processing core.
3. The data multicast circuit according to claim 2, wherein the data multicast circuit further comprises:
A starting point processing core calculating circuit for calculating a next target processing core of the multicast data according to the original multicast information;
in response to the multicast enable signal being active, the transceiver control circuitry receives the multicast data and sends the multicast data packet to the next target processing core.
4. The data multicast circuit as recited in claim 3 wherein said transceiving control circuit is further configured to:
generating the first multicast information according to the original multicast information and the core identification of the first processing core;
and generating the multicast data packet according to the first multicast information and the multicast data.
5. The data multicast circuit as recited in claim 2, wherein in response to the multicast enable signal being inactive, the transceiving control circuit is further to:
receiving a multicast data packet;
determining whether the multicast data needs to be forwarded according to first multicast information in the multicast data packet;
determining a next target processing core according to the first multicast information in response to the need to forward the multicast data;
generating new first multicast information according to the first multicast information and the core identification of the first processing core;
Generating a new multicast data packet according to the multicast data and the new first multicast information;
forwarding the new multicast data packet to the next target processing core.
6. The data multicast circuit according to any of claims 1-5, wherein the data multicast circuit further comprises:
address decoding circuitry to receive a core identification of a next target processing core to determine an address of the next target processing core.
7. The data multicast circuit as recited in claim 2, wherein in response to the multicast enable signal being inactive, the transceiving control circuit is further to:
and in response to the first processing core being the last processing core participating in the data multicast, sending a transmission end signal to all processing cores participating in the data multicast.
8. The data multicast circuit as recited in claim 7, further comprising:
an endpoint processing core calculation circuit for determining the last processing core participating in the data multicast according to the original multicast information;
and the multicast ending signal generating circuit is used for generating a multicast ending signal according to the output result of the end point processing core calculating circuit and the transmission ending signal.
9. The data multicast circuit as recited in claim 1, wherein the multicast enable signal generating circuit comprises:
processing core representation vector generation circuitry for generating a representation vector of a core identification of the first processing core;
and the signal generating circuit is used for generating the multicast enabling signal according to the representation vector and the original multicast information.
10. A method of multicasting data for use in a first processing core, comprising:
generating a multicast enabling signal according to the original multicast information and the core identification of the first processing core;
wherein the multicast information is used to identify all processing cores participating in the data multicast;
generating a multicast data packet according to the multicast enabling signal; wherein, the multicast data packet comprises multicast data and first multicast information; wherein the first multicast information is used to identify individual ones of the processing cores that have not participated in the data multicast.
CN202010787623.3A 2020-08-07 2020-08-07 Data multicast circuit, method, electronic device, and computer-readable storage medium Active CN114095289B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010787623.3A CN114095289B (en) 2020-08-07 2020-08-07 Data multicast circuit, method, electronic device, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010787623.3A CN114095289B (en) 2020-08-07 2020-08-07 Data multicast circuit, method, electronic device, and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN114095289A CN114095289A (en) 2022-02-25
CN114095289B true CN114095289B (en) 2023-05-12

Family

ID=80295286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010787623.3A Active CN114095289B (en) 2020-08-07 2020-08-07 Data multicast circuit, method, electronic device, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN114095289B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107005492A (en) * 2014-12-17 2017-08-01 英特尔公司 The system of multicast and reduction communication in on-chip network
CN106254254A (en) * 2016-09-19 2016-12-21 复旦大学 A kind of network-on-chip communication means based on Mesh topological structure
CN109408257A (en) * 2018-11-09 2019-03-01 北京灵汐科技有限公司 Data transmission method, device and electronic equipment for network-on-chip NOC
CN111382114A (en) * 2018-12-28 2020-07-07 北京灵汐科技有限公司 Data transmission method and device for network on chip and electronic equipment
CN111382115A (en) * 2018-12-28 2020-07-07 北京灵汐科技有限公司 Path creating method and device for network on chip and electronic equipment
US10608640B1 (en) * 2019-05-10 2020-03-31 Achronix Semiconductor Corporation On-chip network in programmable integrated circuit

Also Published As

Publication number Publication date
CN114095289A (en) 2022-02-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 201, No. 6 Fengtong Heng Street, Huangpu District, Guangzhou City, Guangdong Province, 510530

Patentee after: Guangzhou Ximu Semiconductor Technology Co.,Ltd.

Country or region after: China

Address before: 100080 202-24, building 6, yard 1, gaolizhang Road, Haidian District, Beijing

Patentee before: Beijing SIMM Computing Technology Co.,Ltd.

Country or region before: China