WO2024016661A1 - Proof-of-work chip and electronic device - Google Patents

Proof-of-work chip and electronic device

Info

Publication number
WO2024016661A1
WO2024016661A1 · PCT/CN2023/077718 · CN2023077718W
Authority
WO
WIPO (PCT)
Prior art keywords
unit
data
arbitration
units
routing
Prior art date
Application number
PCT/CN2023/077718
Other languages
French (fr)
Chinese (zh)
Inventor
刘明 (Liu Ming)
蔡凯 (Cai Kai)
田佩佳 (Tian Peijia)
张雨生 (Zhang Yusheng)
闫超 (Yan Chao)
Original Assignee
声龙(新加坡)私人有限公司 (Shenglong (Singapore) Pte. Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 声龙(新加坡)私人有限公司 (Shenglong (Singapore) Pte. Ltd.)
Publication of WO2024016661A1 publication Critical patent/WO2024016661A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00 Packet switching elements
    • H04L 49/25 Routing or path finding in a switch fabric
    • H04L 49/253 Routing or path finding in a switch fabric using establishment or release of connections between ports
    • H04L 49/254 Centralised controller, i.e. arbitration or scheduling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F 15/163 Interprocessor communication
    • G06F 15/17 Interprocessor communication using an input/output type connection, e.g. channel, I/O port
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F 15/163 Interprocessor communication
    • G06F 15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F 15/17306 Intercommunication techniques
    • G06F 15/17312 Routing techniques specific to parallel machines, e.g. wormhole, store and forward, shortest path, congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/76 Architectures of general purpose stored program computers
    • G06F 15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F 15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F 15/781 On-chip cache; Off-chip memory
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00 Packet switching elements
    • H04L 49/10 Packet switching elements characterised by the switching fabric construction
    • H04L 49/109 Integrated on microchip, e.g. switch-on-chip

Definitions

  • Embodiments of the present disclosure relate to, but are not limited to, the field of computer application technology, and particularly to a proof-of-work chip and an electronic device.
  • Proof of Work is a hash-based computation that can be solved using a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a Field-Programmable Gate Array (FPGA).
  • The solving process requires random-address access to a large data set, and the entire data set is generally stored in memory or video memory.
  • Computing power is directly proportional to data bandwidth, so high on-chip bandwidth is required.
  • Traditional CPU, GPU, or FPGA architectures cannot solve this problem well.
  • Embodiments of the present disclosure provide a proof-of-work chip, including at least one group of computing units, at least one group of storage units, at least two first data nodes, and at least two second data nodes. Each of the first data nodes and each of the second data nodes includes multiple groups of first inlets and multiple groups of first outlets.
  • The group of computing units is connected to a group of first inlets of a first data node in a one-to-one correspondence.
  • A group of first outlets of the first data node is connected to a group of storage units in a one-to-one correspondence.
  • The group of storage units is connected to a group of first inlets of a second data node in a one-to-one correspondence.
  • A group of first outlets of the second data node is connected to the group of computing units in a one-to-one correspondence; wherein:
  • the computing unit is configured to send a message for requesting data to the storage unit through the first data node when performing proof-of-work calculations;
  • the storage unit is configured to store a data set used in the proof-of-work calculation, and in response to a message from the computing unit, sends a message containing the data to the computing unit through the second data node;
  • the first data node is configured to send a message requesting data sent by the computing unit to the storage unit;
  • the second data node is configured to send the message containing data sent by the storage unit to the computing unit.
  • An embodiment of the present disclosure also provides an electronic device including the above-mentioned proof-of-work chip.
  • Figure 1 is a schematic structural diagram of a proof-of-work chip provided by an embodiment of the present disclosure
  • Figure 2 is a schematic structural diagram of another data node provided by an embodiment of the present disclosure.
  • Figure 3 is a schematic diagram of a compression unit and decompression unit according to an embodiment of the present disclosure
  • Figure 4 is a schematic structural diagram of a data exchange subunit provided by an embodiment of the present disclosure.
  • Figure 5A is a schematic diagram including four first data nodes provided by an embodiment of the present disclosure.
  • Figure 5B is a schematic diagram including four second data nodes provided by an embodiment of the present disclosure.
  • Figure 6 is a schematic connection diagram of a data switching unit when including six first data nodes (or second data nodes) provided by an embodiment of the present disclosure
  • Figure 7 is a schematic diagram of the internal structure of the data subunit in the data exchange unit shown in Figure 6;
  • Figure 8 is a schematic connection diagram of a data switching unit when nine first data nodes (or second data nodes) are included according to an embodiment of the present disclosure
  • Figure 9 is a schematic diagram of the internal structure of the data subunit in the data exchange unit shown in Figure 8;
  • FIG. 10 is a schematic diagram of an electronic device including a proof-of-work chip provided by an embodiment of the present disclosure.
  • FIG. 1 is a schematic structural diagram of a proof-of-work chip provided by an embodiment of the present disclosure.
  • the proof-of-work chip includes at least one set of computing units, at least one set of storage units, at least two first data nodes and at least two second data nodes.
  • Each data node includes multiple groups of first node inlets (a first node inlet is a first inlet of the data node, referred to herein as a first inlet) and multiple groups of first node outlets (a first node outlet is a first outlet of the data node, referred to herein as a first outlet). The group of computing units is connected to a group of first inlets of a first data node in a one-to-one correspondence, a group of first outlets of the first data node is connected to a group of storage units in a one-to-one correspondence, and the group of storage units is connected to a group of first inlets of a second data node in a one-to-one correspondence.
  • A group of first outlets of the second data node is connected to a group of computing units in a one-to-one correspondence. That is, a group of computing units is connected to a group of storage units through at least two first data nodes, and the group of storage units is connected to the group of computing units through at least two second data nodes; where:
  • the computing unit is configured to send a message for requesting data to the storage unit through the first data node when performing proof-of-work calculations;
  • the storage unit is configured to store a data set used in the proof-of-work calculation, and in response to a message from the computing unit, sends a message containing the data to the computing unit through the second data node;
  • the first data node is configured to send a message requesting data sent by the computing unit to the storage unit;
  • the second data node is configured to send the message containing data sent by the storage unit to the computing unit.
  • Each of the above data nodes includes multiple groups of first inlets and multiple groups of first outlets.
  • When connecting to the computing units and storage units, each of the multiple groups of first inlets/outlets may be connected to the corresponding unit (computing unit or storage unit). Alternatively, the multiple groups of first inlets and first outlets may be understood as multiple individual first inlets and first outlets, with the multiple first inlets or outlets connected to a corresponding group of units (computing units or storage units), as shown in Figure 1.
  • a three-dimensional network structure including a request transmission layer and a data transmission layer can be formed.
  • Message transmission from the computing units to the storage units is implemented through the first-data-node network; that is, at the request transmission layer, a computing unit sends a request to a storage unit through the first data nodes. Message transmission from the storage units to the computing units is implemented through the second-data-node network; that is, at the data transmission layer, a storage unit returns the requested data to a computing unit through the second data nodes.
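The two-layer flow described above can be sketched in Python as a reading aid (all class, variable, and field names here are illustrative assumptions, not from the patent): a computing unit issues a request over the first-data-node network and receives the data back over the separate second-data-node network.

```python
# Toy model of the request layer / data layer separation (illustrative only).

class StorageUnit:
    def __init__(self, dataset):
        self.dataset = dataset          # data set used by the proof-of-work calculation

    def handle(self, addr):
        return self.dataset[addr]       # respond with the requested word

class ComputingUnit:
    def __init__(self, request_net, data_net):
        self.request_net = request_net  # first-data-node network (request layer)
        self.data_net = data_net        # second-data-node network (data layer)

    def fetch(self, storage, addr):
        self.request_net.append(("req", addr))    # message for requesting data
        word = storage.handle(addr)
        self.data_net.append(("data", word))      # message containing the data
        return word

request_net, data_net = [], []
storage = StorageUnit({0x10: 0xDEADBEEF})
cu = ComputingUnit(request_net, data_net)
assert cu.fetch(storage, 0x10) == 0xDEADBEEF
assert request_net == [("req", 0x10)] and data_net == [("data", 0xDEADBEEF)]
```

The point of the sketch is that requests and responses never share a network, which is how the chip keeps the two directions of traffic from contending for bandwidth.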
  • The proof-of-work chip may also include at least one group of computing units, at least one group of storage units, one first data node, and one second data node.
  • The first data node and the second data node each include a group of (multiple) first inlets and a group of (multiple) first outlets.
  • the group of computing units are connected to a group of first entrances of the first data node in a one-to-one correspondence.
  • the first data node A group of first outlets is connected to a group of storage units in a one-to-one correspondence.
  • the group of storage units is connected to a group of first inlets of the second data node in a one-to-one correspondence.
  • A group of first outlets of the second data node is connected to a group of computing units in a one-to-one correspondence.
  • That is, a group of computing units is connected to a group of storage units through the first data node, and the group of storage units is connected to the group of computing units through the second data node.
  • Figure 1 takes two first data nodes and two second data nodes as an example.
  • the two first data nodes are connected to each other, and the two second data nodes are connected to each other.
  • A group of computing units is connected to two groups of storage units through the two interconnected first data nodes, and a group of storage units is connected to two groups of computing units through the two interconnected second data nodes.
  • When the chip includes multiple first data nodes and multiple second data nodes, for example y first data nodes and y second data nodes, a group of computing units is connected to y groups of storage units through the y interconnected first data nodes, and a group of storage units is connected to y groups of computing units through the y interconnected second data nodes.
  • the y first data nodes are interconnected to form a grid structure, and the y second data nodes are interconnected to form a grid structure.
  • y is a positive integer greater than or equal to 2.
  • The first data nodes may be interconnected using a grid, and the second data nodes may likewise be interconnected using a grid.
  • Each data node includes multiple routing units, multiple arbitration units, a data exchange unit, an interconnection unit, multiple first inlets (i.e., the aforementioned first node inlets), multiple first outlets (i.e., the aforementioned first node outlets), one or more second node inlets (a second node inlet is a second inlet of the data node, referred to herein as a second inlet), and one or more second node outlets (a second node outlet is a second outlet of the data node, referred to herein as a second outlet).
  • The input end of each routing unit is connected to one of the first inlets.
  • The first output end of each routing unit is connected to the first input end of one of the arbitration units.
  • The second output end of each routing unit is connected to the first input end of the data exchange unit; the first output end of the data exchange unit is connected to the second outlet; the second input end of the data exchange unit is connected to the second inlet; the second output end of the data exchange unit is connected to the second input end of each arbitration unit; the output end of each arbitration unit is connected to the input end of the interconnection unit in a one-to-one correspondence; the output end of the interconnection unit is connected to the first outlets in a one-to-one correspondence; and the second inlet and the second outlet are configured to be connected to other data nodes, where:
  • the routing unit is configured to receive messages from the first portal and send the messages to the arbitration unit and/or the data exchange unit;
  • The data exchange unit is configured to receive messages from the second inlet and send them to the arbitration unit, and to receive messages sent from the routing unit and output them through the second outlet;
  • the arbitration unit is configured to receive messages sent by the routing unit and/or the data exchange unit, and send the messages to the first outlet through the interconnection unit.
  • the interconnection unit can realize sending the message sent by the arbitration unit to any first outlet.
  • the messages described herein may include requests or data or other similar information.
  • When the first inlets of a data node are connected to computing units, that is, the data node is a first data node, the messages transmitted by the data node include messages for requesting data, and the second inlets and second outlets of the data node are connected to other first data nodes.
  • When the first inlets of a data node are connected to storage units, that is, the data node is a second data node, the messages transmitted by the data node include messages containing data, and the second inlets and second outlets of the data node are connected to other second data nodes.
  • the plurality mentioned herein includes 2 or more than 2.
  • Each data node includes: multiple routing units, multiple arbitration units, a data exchange unit, an interconnection unit, multiple first inlets, multiple first outlets, at least one second inlet, and at least one second outlet, wherein:
  • The routing unit includes an input port, a first output port, and a second output port. Each input port is connected to a first inlet, the first output port is connected to an input of an arbitration unit, and the second output port is connected to an input of the data exchange unit. The routing unit is configured to receive a message input through the first inlet and forward it to the arbitration unit or the data exchange unit. In this example, each first inlet is connected to an independent routing unit; each routing unit is connected to a corresponding independent arbitration unit as well as to the data exchange unit; and the routing unit can route a message to the arbitration unit or the data exchange unit according to the destination contained in the message;
  • The data exchange unit includes multiple first input ports, multiple first output ports, multiple second input ports, and multiple second output ports. Each first input port is connected to the second output port of a routing unit, each first output port is connected to a second outlet, each second input port is connected to a second inlet, and each second output port is connected to an input of an arbitration unit. The data exchange unit is configured to forward messages to other data nodes, or to receive messages sent by other data nodes;
  • The arbitration unit includes a first input port connected to the first output port of the routing unit, a second input port connected to the second output port of the data exchange unit, and an output port connected to the interconnection unit. The arbitration unit is configured to receive messages sent by the routing unit and/or the data exchange unit and send them to the interconnection unit through that output port;
  • The interconnection unit includes multiple input ports and multiple output ports; each input port is connected to the output port of an arbitration unit, and each output port is connected to a first outlet. The interconnection unit is configured to send the messages output by the arbitration units to any first outlet.
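As a reading aid (not part of the patent), the routing decision inside a single data node can be sketched as follows. A message arriving at a first inlet is either delivered locally (routing unit, then arbitration unit, then interconnection unit, then first outlet) or handed to the data exchange unit toward a neighbouring node. The function and field names are assumptions.

```python
# Illustrative model of a routing unit's forwarding decision inside one data node.

def route(msg, local_outlets):
    """Return which internal path a message takes inside one data node."""
    if msg["dest"] in local_outlets:
        return "arbitration -> interconnection -> first outlet"
    return "data exchange -> second outlet (adjacent node)"

local = {0, 1, 2, 3}                        # first outlets served by this node
assert route({"dest": 2}, local).startswith("arbitration")
assert route({"dest": 7}, local).startswith("data exchange")
```

This is the sense in which the routing unit "routes the message to the arbitration unit or the data exchange unit according to the destination contained in the message".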
  • In the proof-of-work chip provided by the embodiments of the present disclosure, message intercommunication between the computing units and the storage units is realized through the data nodes. Since the data nodes adopt a grid topology, connect to other data nodes through the data exchange unit, and deliver output through the interconnection unit, the structure is simple, efficient, and offers high on-chip bandwidth. Therefore, the proof-of-work chip implemented with these data nodes also has high efficiency and high on-chip bandwidth.
  • The numbers of first inlets and first outlets may be the same or different; that is, the number of computing units connected to each first data node and the number of storage units connected to each second data node may be the same or different.
  • The number of first inlets and the number of first outlets can each range from 2 to 16348; that is, a group of computing units can number from 2 to 16348, and a group of storage units can number from 2 to 16348.
  • the number of each type of data node may be 1 or 2 or more, for example, it may be 4, 6, 9 or even more, and this application does not limit this.
  • A mesh interconnection topology can be used between the multiple first data nodes, and a mesh interconnection topology can likewise be used between the multiple second data nodes; there is no connection between the first data nodes and the second data nodes. The data nodes are arranged in a regular grid, and each data node is connected only to its adjacent data nodes in the same row or column.
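The row/column adjacency just described can be computed with a small helper (an illustration, not the patent's circuit):

```python
# Neighbours of a node in a regular mesh: same row or column, one step away.

def mesh_neighbors(row, col, rows, cols):
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]

# 2x2 mesh (the four-node case of Figures 5A/5B): every node has 2 neighbours
assert mesh_neighbors(0, 0, 2, 2) == [(1, 0), (0, 1)]
# 3x3 mesh (the nine-node case of Figure 8): the centre node has 4 neighbours
assert len(mesh_neighbors(1, 1, 3, 3)) == 4
```

Since each node talks only to its row/column neighbours, wiring grows linearly with node count rather than quadratically, which is what keeps the grid topology simple at scale.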
  • the following describes the internal units of the data node.
  • the internal units of the first data node and the second data node have the same composition.
  • the arbitration unit may be an arbitration structure with backpressure and caching.
  • the arbitration unit may cache a certain number of messages, and when the message can be received by the corresponding interconnection unit, send it to the corresponding interconnection unit.
  • When the arbitration unit's cache is full, the arbitration unit applies backpressure to the preceding-stage unit (routing unit or data exchange unit) to prevent messages sent by the preceding stage from being dropped before they can be received.
  • the routing unit may also be a routing structure with backpressure and caching.
  • the arbitration unit may also be configured to set different weights for multiple input ports of the arbitration unit.
  • the weight value of each input port represents the number of messages that the input port can continuously process.
  • The arbitration unit can set the weight ratio of its ports according to the data volume of each input port; this ratio determines the proportion of messages passed by each port. When the ratio is set to match the proportion of requests or data that actually need to pass, overall system efficiency increases.
  • the weights of any two input ports may be the same, indicating that the two input ports can continuously process the same number of messages.
  • the arbitration unit may also be configured to set different priorities for multiple input ports of the arbitration unit.
  • Each time the arbitration unit processes a message, it selects the input port with the highest priority among those with a pending message.
  • After processing, the priority of the input ports is readjusted. For example, after the messages of the input port that had the highest priority and a pending message have been processed, the priority of that input port is adjusted to the lowest. In other embodiments, it is not excluded that the priorities can be set to be the same.
  • the following description takes the arbitration unit adopting weighted polling arbitration with a weight ratio of 1:3 as an example.
  • The S1 port may be the port connected to the data exchange unit, and the S2 port may be the port connected to the routing unit.
  • the number of weights is related to the number of sending requests.
  • A weight of 3 means that a maximum of 3x (3 · x) messages can be sent continuously.
  • a weight of 1 means that a maximum of x messages can be sent continuously.
  • x is greater than or equal to 1.
  • the principle of priority adjustment is to adjust the priority of the port to the lowest after the port has sent messages or has no messages.
  • An example of the weighted polling (round-robin) arbitration process of the arbitration unit is as follows. Assume that port S1 receives a request and currently has the highest priority. Since the weight of port S1 is 3, port S1 can send up to 3x requests continuously. After port S1 has sent 3x messages continuously, or there is no message on port S1, the arbitration unit adjusts the priority order to S2 > S1. At this point, when there is a request on port S2, S2 is the port with the highest priority, and because the weight of port S2 is 1, port S2 can send up to x messages continuously. After port S2 has sent x messages continuously, or there is no message on port S2, the arbitration unit adjusts the priority order back to S1 > S2.
  • Using the above weighted polling arbitration method can improve the processing efficiency of the arbitration unit, and the effect is obvious when the data pressure is high.
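The weighted polling scheme above (weights 3:1, with the served or emptied port demoted to lowest priority) can be modeled behaviourally. The sketch below takes x = 1 for simplicity; the class, port names, and queue representation are assumptions for illustration, not the patent's hardware.

```python
# Behavioural sketch of weighted round-robin arbitration with priority rotation.

from collections import deque

class WeightedRRArbiter:
    def __init__(self, weights):                 # e.g. {"S1": 3, "S2": 1}
        self.weights = weights
        self.order = list(weights)               # current priority order, highest first

    def arbitrate(self, queues):
        granted = []
        while any(queues.values()):
            # pick the highest-priority port that has a pending message
            port = next(p for p in self.order if queues[p])
            burst = 0
            while queues[port] and burst < self.weights[port]:
                granted.append((port, queues[port].popleft()))
                burst += 1
            # demote the served (or emptied) port to the lowest priority
            self.order.remove(port)
            self.order.append(port)
        return granted

arb = WeightedRRArbiter({"S1": 3, "S2": 1})
q = {"S1": deque("abcde"), "S2": deque("xy")}
grants = [p for p, _ in arb.arbitrate(q)]
# bursts alternate as described: 3 from S1, 1 from S2, the rest of S1, then S2
assert grants == ["S1", "S1", "S1", "S2", "S1", "S1", "S2"]
```

Setting the weights to the expected traffic ratio (here, the data exchange port carries roughly three times the routing port's load) is what keeps neither queue starved while matching throughput to demand.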
  • Alternatively, a fixed-weight round-robin arbitration scheme (for example, with the weight ratio of each port fixed at 1:1) or a fixed-priority arbitration scheme may be adopted.
  • The interconnection unit includes multiple input ports and multiple output ports; data input from any input port can be output through any output port. That is to say, the interconnection unit can send a message to any first outlet according to the message's destination.
  • the number of input ports and output ports can be the same or different. The number can be set according to the needs of the chip, for example, it can be set to 128 or 4096, etc.
  • the interconnection unit can be implemented by, for example, a full crossbar switch (or fully associated crossbar switch).
  • the full crossbar switch is a multi-entry and multi-outlet structure, and data can enter from any entrance and reach any exit.
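The any-entrance-to-any-exit property of the full crossbar can be modeled as a per-cycle mapping from input ports to output ports (a functional illustration only; the names are assumptions, and a real crossbar resolves output conflicts with the arbitration described above):

```python
# Functional model of a full crossbar: any input can reach any output.

def crossbar(inputs, mapping, n_out):
    """inputs: message per input port; mapping: input port -> output port."""
    outputs = [None] * n_out
    for in_port, out_port in mapping.items():
        assert outputs[out_port] is None, "one message per output per cycle"
        outputs[out_port] = inputs[in_port]
    return outputs

msgs = ["m0", "m1", "m2", "m3"]
# any-to-any: input 0 reaches output 3, input 3 reaches output 0, etc.
assert crossbar(msgs, {0: 3, 1: 1, 2: 2, 3: 0}, 4) == ["m3", "m1", "m2", "m0"]
```

The assert inside the function captures the one structural constraint of a crossbar cycle: each output can accept at most one message at a time, which is why arbitration sits in front of it.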
  • the chip may further include a compression unit and a decompression unit, each routing unit is connected to the data exchange unit through the compression unit, and the data exchange unit is connected to each arbitration unit through the decompression unit.
  • Figure 2 is a schematic diagram of another data node structure provided by an embodiment of the present disclosure.
  • The second output port of each routing unit is connected to an input port of the compression unit, and the output ports of the compression unit are connected to the first input ports of the data exchange unit.
  • the second output port of the data exchange unit is connected to the input port of the decompression unit, and the output port of the decompression unit is connected to the second input port of each arbitration unit.
  • The compression unit is used to compress the number of buses, compressing the m buses input by m routing units into n buses and outputting them to the data exchange unit.
  • The compression unit includes m input ports and n output ports, where m and n are both positive integers greater than zero and m > n.
  • the compression unit can compress the number of buses connected to multiple routing units to reduce the number of buses, that is, the number of buses is compressed from m to n, thereby reducing the complexity of the data exchange unit.
  • The number of buses can be compressed because, when the messages entering through the first inlets pass through the routing units, some of them are routed directly to the arbitration units; the bus load reaching the compression unit is therefore reduced, and a smaller number of buses can carry it.
  • The compressed bus (still a multi-group bus) is connected to the data exchange unit. In other embodiments, the compression ratio may be set to other values.
  • The function of the decompression unit is the opposite of that of the compression unit: it restores the number of buses to match the number of arbitration units. It includes n input ports and m output ports, restoring the n buses input by the data exchange unit to m buses, which are input to the m arbitration units respectively.
  • Decompressing the number of buses from n back to m facilitates the bus arbitration operations.
  • Figure 3 is an example of a compression unit and decompression unit.
  • Compression and decompression of 4 groups of buses are taken as an example.
  • The compression unit compresses the 4 groups of buses into 3 groups, and the decompression unit restores the 3 groups back to 4 groups, so that fewer buses can be used to transmit data without affecting chip functionality.
  • S00, S01, S02, and S03 are data sources, connected to buses S10, S11, S12, and S13 respectively; buses S10, S11, S12, and S13 are connected to the compression unit S2, with S10, S11, and S12 connected to the arbitration units S220, S221, and S222 inside S2 respectively.
  • the arbitration units S220, S221, and S222 may be weighted round robin arbiters. In some examples, the arbitration units S220, S221, and S222 may also use ordinary arbiters or round robin arbiters.
  • the routing unit S20 is connected to the cache units S210, S211, and S212 respectively; the cache units S210, S211, and S212 are connected to the arbitration units S220, S221, and S222 respectively; the arbitration units S220, S221, and S222 are connected to the compressed buses S30, S31, and S32.
  • Buses S30, S31, and S32 are connected to the decompression unit S4, and are respectively connected to the routing units S400, S401, and S402 of S4; the routing units S400, S401, and S402 are respectively connected to the restored buses S50, S51, and S52; the routing unit S400, S401 and S402 are both connected to the arbitration unit S41; the arbitration unit S41 can be a polling arbiter or an ordinary arbiter; the arbitration unit S41 is connected to the restored bus S53; the buses S50, S51, S52, and S53 are respectively connected to the data endpoints. S60, S61, S62, S63 are connected.
  • the data compression workflow is as follows:
  • Data sources S00, S01, S02, and S03 send data to buses S10, S11, S12, and S13 respectively; among them: the data of bus S13 is divided into 3 parts through the routing unit S20 and cached in the cache units S210, S211, and S212 respectively; The data of the cache unit S210 and the data of the bus S10 pass through the arbitration unit S220 to generate the data of the bus S30; the data of the cache unit S211 and the data of the bus S11 pass through the arbitration unit S221 to generate the data of the bus S31; the data of the cache unit S212 and the data of the bus S12 The data of bus S32 is generated through arbitration unit S222; now the data compression is completed;
  • the data decompression workflow is as follows:
  • Buses S30, S31, and S32 transmit data to the decompression unit S4. Routing unit S400 receives the data of bus S30, separates out the data of bus S10 and sends it to bus S50, completing the restoration of bus S10's data, and sends the separated data of bus S13 to arbitration unit S41. Routing unit S401 receives the data of bus S31, separates out the data of bus S11 and sends it to bus S51, completing the restoration of bus S11's data, and sends the separated data of bus S13 to arbitration unit S41. Routing unit S402 receives the data of bus S32, separates out the data of bus S12 and sends it to bus S52, completing the restoration of bus S12's data, and sends the separated data of bus S13 to arbitration unit S41. Arbitration unit S41 receives the data from routing units S400, S401, and S402 and sends it to bus S53, completing the restoration of bus S13's data. Buses S50, S51, S52, and S53 then send the data to the data endpoints S60, S61, S62, and S63 respectively.
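The 4-to-3 compression and restoration workflow above can be captured in a toy model. The framing is an assumption for illustration: each merged word is tagged with its source bus so the decompressor can separate S13's share from the S10/S11/S12 traffic; the patent does not specify the tagging mechanism.

```python
# Toy model of Figure 3's 4-to-3 bus compression and decompression.

def compress(s10, s11, s12, s13):
    # routing unit S20 splits bus S13's data over three caches (S210/S211/S212);
    # arbitration units S220/S221/S222 merge each share with S10/S11/S12
    shares = [s13[i::3] for i in range(3)]
    lanes = []
    for own, share, tag in zip((s10, s11, s12), shares, ("S10", "S11", "S12")):
        lanes.append([(tag, w) for w in own] + [("S13", w) for w in share])
    return lanes                                 # compressed buses S30/S31/S32

def decompress(lanes):
    # routing units S400/S401/S402 separate each lane; S13 words re-merge via S41
    restored = {"S10": [], "S11": [], "S12": [], "S13": []}
    for lane in lanes:
        for tag, w in lane:
            restored[tag].append(w)
    return restored

lanes = compress([1, 2], [3], [4], [5, 6, 7])
out = decompress(lanes)
assert out == {"S10": [1, 2], "S11": [3], "S12": [4], "S13": [5, 6, 7]}
```

The round trip is lossless: three physical lanes carry four logical buses, which is the sense in which "fewer buses can be used to transmit data without affecting chip functionality".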
  • The data exchange unit in each data node may include k data exchange subunits, where k is a positive integer greater than or equal to 2. The value of k depends on the number of routing units or on the compression ratio of the compression unit, wherein:
  • Each data exchange subunit includes a group of input/output ports for connecting to the routing unit and the arbitration unit (a first input port and a second output port), and one or more groups of input/output ports for connecting to the second inlets and second outlets (second input ports and first output ports). The first input port is connected to the routing unit, each first output port is connected to a second outlet, each second input port is connected to a second inlet, and the second output port is connected to the arbitration unit.
  • When the chip includes a compression unit and a decompression unit, each data exchange subunit instead includes a group of input/output ports for connecting to the compression unit and the decompression unit (a first input port and a second output port), and one or more groups of input/output ports for connecting to the second inlets and second outlets (second input ports and first output ports). The first input port is connected to the compression unit, each first output port is connected to a second outlet, each second input port is connected to a second inlet, and the second output port is connected to the decompression unit.
  • Each data exchange subunit includes multiple groups of routing subunits and arbitration subunits, which are interconnected in pairs. The number of routing subunits and arbitration subunits depends on the number of nodes adjacent to the data node where the data exchange unit is located; for example, it can be the number of adjacent data nodes + 1. When the current data node has 2 adjacent nodes, the number of routing subunits and the number of arbitration subunits are both 2 + 1 = 3.
  • the first input port is connected to a routing subunit
  • the first output port is connected to an arbitration subunit
  • a second input port is connected to a routing subunit
  • a second output port is connected to an arbitration subunit.
  • The data exchange subunit in the figure is a pairwise-interconnected structure including three groups of routing subunits and arbitration subunits.
  • One group of routing subunits and arbitration subunits is connected to the compression unit (or to the routing unit when there is no compression unit) and to the decompression unit (or to the arbitration unit when there is no decompression unit), respectively.
  • The other two groups of routing subunits and arbitration subunits are connected to the data exchange units of the two adjacent data nodes. Specifically, the routing subunit is connected to the data exchange subunit of the adjacent node,
  • and the arbitration subunit is connected to the routing subunit of the data exchange subunit of the adjacent node.
  • The k data exchange subunits together form a data exchange unit.
  • the arbitration subunit within the data exchange subunit can adopt a weighted round robin arbitration method.
  • each input port can be configured with a weight.
  • The weight ratio represents the ratio of the message volume passing through each input port, taking a proof-of-work chip containing 4 first data nodes as an example.
  • The implementation of the weighted round-robin arbitration method is as described above and is not repeated here. Using weighted round-robin arbitration can improve the efficiency of the data exchange unit.
  • the arbitration subunit within the data exchange subunit may also use round-robin arbitration or fixed priority arbitration.
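The weighted round-robin scheme described above can be sketched in software. This is a minimal illustrative model, not the patent's circuit: port names, the per-port queues, and the "weight = grant slots per round" interpretation are assumptions for illustration.

```python
from collections import deque

class WeightedRoundRobinArbiter:
    """Grant input ports in proportion to their configured weights."""

    def __init__(self, weights):
        # weights: {port_name: weight}; a port gets `weight` slots per round.
        self.queues = {port: deque() for port in weights}
        self.order = [p for p, w in weights.items() for _ in range(w)]
        self.slot = 0

    def push(self, port, msg):
        """A message arrives on an input port."""
        self.queues[port].append(msg)

    def grant(self):
        """Return (port, message) for the next grant, skipping empty ports;
        None if all queues are empty."""
        for _ in range(len(self.order)):
            port = self.order[self.slot]
            self.slot = (self.slot + 1) % len(self.order)
            if self.queues[port]:
                return port, self.queues[port].popleft()
        return None
```

With weights {a: 2, b: 1}, port a is granted twice for every grant of port b, matching the idea that the weight ratio fixes the ratio of message volume through each port.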
  • Figure 5 is a schematic structural diagram of a proof-of-work chip provided by an embodiment of the present disclosure.
  • Figure 5A is a schematic diagram of the connection relationship of four first data nodes.
  • Figure 5B is a schematic diagram of the connection relationship of four second data nodes.
  • Each data node includes a compression unit and a decompression unit.
  • The structure of each data node in the figure is the same.
  • The first data nodes and the second data nodes both adopt a 2 × 2 mesh topology.
  • Each data exchange unit contains one data exchange subunit.
  • Its structure is shown in Figure 4. Assume that computing unit A11 starts to perform a proof-of-work calculation and needs to request data in storage unit B41; this is recorded as request 1.
  • Request 1 is first sent to first data node 1, which is connected to the corresponding computing unit A11.
  • Request 1 is received and cached by the corresponding routing unit in first data node 1.
  • Request 1 is then sent through the compression unit to the data exchange unit of first data node 1.
  • Request 1 is sent to first data node 4 through the data exchange unit, and then delivered to storage unit B41 through the decompression unit, arbitration unit, and interconnection unit of first data node 4.
  • Request 1 accesses storage unit B41 and obtains the requested data, which is recorded as data 1.
  • Data 1 is sent to computing unit A11 through second data node 4 and second data node 1 in sequence.
  • This return process is similar to that of request 1 and is not repeated here. At this point, computing unit A11 has completed its request for the data located in storage unit B41. Any computing unit can follow the above process to obtain the data required for proof of work from any storage unit and perform proof-of-work calculations.
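The hop sequence in the example above can be sketched as follows. This is an illustration of the described order of units, not the patent's exact pipeline; the unit names and node numbers (first data nodes 1 and 4) follow the walkthrough, and the function itself is hypothetical.

```python
def request_path(src_node, dst_node):
    """Hops taken by a request through the first-data-node layer, from the
    routing unit at the source node to the interconnection unit at the
    destination node (which delivers to the storage unit)."""
    hops = [
        f"routing unit (first data node {src_node})",
        f"compression unit (first data node {src_node})",
        f"data exchange unit (first data node {src_node})",
    ]
    if dst_node != src_node:
        # Exchange units of the two nodes are interconnected in the mesh.
        hops.append(f"data exchange unit (first data node {dst_node})")
    hops += [
        f"decompression unit (first data node {dst_node})",
        f"arbitration unit (first data node {dst_node})",
        f"interconnection unit (first data node {dst_node})",
    ]
    return hops
```

For request 1 above, `request_path(1, 4)` yields the seven hops from computing unit A11's routing unit to the interconnection unit in front of storage unit B41; the response from B41 traverses the mirror-image path through the second-data-node layer.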
  • Figure 6 is a schematic diagram of the interconnection of the 6 data exchange units in the 6 data nodes when the proof-of-work chip structure includes 6 first data nodes (or second data nodes).
  • The data exchange units are distributed in a 2 × 3 mesh topology; that is, the 6 data nodes are distributed in a 2 × 3 mesh topology.
  • The number of routing subunits and the number of arbitration subunits in the data exchange subunit of each data exchange unit are each 3 + 1.
  • The internal structure of each data subunit is shown in Figure 7, including 4 groups of routing subunits and arbitration subunits interconnected in pairs; that is, 4 routing subunits and 4 arbitration subunits are interconnected in pairs.
  • Figure 8 is a schematic diagram of the interconnection of the 9 data exchange units in the 9 data nodes when the proof-of-work chip includes 9 first data nodes (or second data nodes).
  • The data exchange units are distributed in a 3 × 3 mesh topology; that is, the 9 data nodes are distributed in a 3 × 3 mesh topology.
  • The number of routing subunits and the number of arbitration subunits in the data exchange subunit of each data exchange unit are each 4 + 1.
  • The internal structure of each data subunit is shown in Figure 9, including 5 groups of routing subunits and arbitration subunits interconnected in pairs.
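The sizing rule running through these examples ("adjacent data nodes + 1") can be checked arithmetically. A minimal sketch, assuming a standard rectangular mesh where a node has at most four neighbours (up/down/left/right); the helper names are ours, not the patent's:

```python
def max_degree(rows: int, cols: int) -> int:
    """Maximum number of neighbours of any node in a rows x cols mesh."""
    row_neighbours = 2 if rows >= 3 else (rows - 1 if rows >= 1 else 0)
    col_neighbours = 2 if cols >= 3 else (cols - 1 if cols >= 1 else 0)
    return row_neighbours + col_neighbours

def subunit_pairs(rows: int, cols: int) -> int:
    """Routing/arbitration subunit pairs per data exchange subunit:
    one pair per adjacent data node, plus one local pair (+1)."""
    return max_degree(rows, cols) + 1
```

This reproduces the three configurations in the text: a 2 × 2 mesh needs 2 + 1 = 3 pairs (Figure 4), a 2 × 3 mesh needs 3 + 1 = 4 pairs (Figure 7), and a 3 × 3 mesh needs 4 + 1 = 5 pairs (Figure 9).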
  • A proof-of-work chip may thus have a total of [2 to 9] * 2 data nodes, where 2 to 9 represents the number of first data nodes or second data nodes.
  • Although the mesh distribution is used as an example for explanation, other topologies are not ruled out; for example, when the number of data nodes is small, such as 3 or 5, a star structure can be used.
  • Each data node includes 30 groups of inlets and outlets; that is, each data node is connected to 30 computing units.
  • The interconnection unit of each data node can then be a 30 × 30 full crossbar switch, so the proof-of-work chip requires only 4 * 2 such 30 × 30 full crossbar switches and 2 groups of 2 × 2 mesh interconnections to realize the exchange of messages between any computing unit and any storage unit. Since the ports are shared among multiple mesh nodes, this avoids the problem of a single full crossbar switch with too many ports, whose scale would be too large to implement. The scheme can be realized with fewer data nodes, with a simple structure and high efficiency. At the same time, higher on-chip bandwidth can be obtained by using the proof-of-work chip implementation provided by the embodiments of the present disclosure.
  • The proof-of-work chip can achieve an on-chip bandwidth of approximately 6144 GB/s with a port width of 1024 bits and a clock frequency of 500 MHz, far exceeding the 1004 GB/s on-chip bandwidth of the current most high-end GPUs.
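The bandwidth figure can be checked arithmetically from the stated numbers. The per-port rate follows directly; note that mapping the quoted 6144 GB/s total onto 96 concurrently active port-widths is our inference from the arithmetic, not a figure stated in the document.

```python
PORT_WIDTH_BITS = 1024
CLOCK_HZ = 500_000_000  # 500 MHz

# Each port transfers one full-width word per clock cycle.
bytes_per_cycle = PORT_WIDTH_BITS // 8            # 128 bytes per cycle
per_port_gbps = bytes_per_cycle * CLOCK_HZ / 1e9  # 64.0 GB/s per port

# How many port-widths of concurrent transfer the quoted total implies.
concurrent_ports = 6144 / per_port_gbps           # 96.0
```

At 64 GB/s per port, the claimed ~6144 GB/s aggregate corresponds to 96 port-widths of simultaneous transfer across the chip's data nodes.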
  • an embodiment of the present disclosure also provides an electronic device including the above-mentioned proof-of-work chip.
  • The electronic device may be, for example, a terminal device that can provide computing services, such as a tablet computer, a notebook computer, a handheld computer, a mobile Internet device, or a wearable device,
  • or a server that can provide computing services such as cloud services, cloud computing, cloud storage, network services, cloud communications, middleware services, domain name services, and security services.
  • The term "connection" should be understood in a broad sense.
  • It can be a fixed connection, a detachable connection, or an integral connection; it can be a mechanical connection or an electrical connection; it can be a direct connection or an indirect connection through an intermediate medium; and it can be an internal connection between two components.
  • Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Mathematical Physics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A proof-of-work chip and an electronic device. The proof-of-work chip comprises at least one group of calculation units, at least one group of storage units, at least two first data nodes and at least two second data nodes, wherein each first data node and each second data node comprise a plurality of first ingresses and a plurality of first egresses, a group of calculation units are correspondingly connected to a group of first ingresses of a first data node on a one-to-one basis, a group of first egresses of each first data node are correspondingly connected to a group of storage units on a one-to-one basis, a group of storage units are correspondingly connected to a group of first ingresses of a second data node on a one-to-one basis, and a group of first egresses of each second data node are correspondingly connected to a group of calculation units on a one-to-one basis; the calculation units are configured to send, during proof-of-work calculation, messages for requesting data to the storage units and by means of the first data nodes; and the storage units are configured to store data sets used in the proof-of-work calculation, and send, in response to the messages from the calculation units, messages including data to the calculation units and by means of the second data nodes.

Description

Proof-of-Work Chip and Electronic Device
This application claims priority to the Chinese patent application filed with the China Patent Office on July 18, 2022, with application number 202210838519.1 and the invention title "Proof-of-Work Chip", the content of which is incorporated into this application by reference.
Technical Field
Embodiments of the present disclosure relate to, but are not limited to, the field of computer application technology, and in particular to a proof-of-work chip and an electronic device.
Background
In blockchain technology, the generation of blocks relies on a proof-of-work (Proof of Work, POW) algorithm. Proof of work is a hash function that can be solved using a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a Field-Programmable Gate Array (FPGA). Solving it requires random-address accesses to a large data set, which is generally stored in memory or video memory. In blockchain proof-of-work applications, computing power is directly proportional to data bandwidth, so very high on-chip bandwidth is required, but traditional CPU, GPU, or FPGA architectures cannot solve this problem well.
Summary of the Invention
The following is an overview of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
Embodiments of the present disclosure provide a proof-of-work chip, including at least one group of computing units, at least one group of storage units, at least two first data nodes, and at least two second data nodes. Each first data node and each second data node includes multiple groups of first inlets and multiple groups of first outlets. The group of computing units is connected one-to-one to a group of first inlets of a first data node; a group of first outlets of the first data node is connected one-to-one to a group of storage units; the group of storage units is connected one-to-one to a group of first inlets of a second data node; and a group of first outlets of the second data node is connected one-to-one to the group of computing units; wherein:
The computing unit is configured to send a message requesting data to the storage unit through the first data node when performing proof-of-work calculation;
The storage unit is configured to store a data set used in the proof-of-work calculation and, in response to a message from the computing unit, to send a message containing data to the computing unit through the second data node;
The first data node is configured to send the message requesting data sent by the computing unit to the storage unit;
The second data node is configured to send the message containing data sent by the storage unit to the computing unit.
An embodiment of the present disclosure also provides an electronic device including the above proof-of-work chip.
Additional features and advantages of the disclosure will be set forth in the description that follows and, in part, will be apparent from the description, or may be learned by practice of the disclosure. Other advantages of the disclosure may be realized and obtained by the arrangements described in the specification, claims, and drawings.
Other aspects will be apparent after reading and understanding the drawings and the detailed description.
Overview of the Drawings
The drawings are used to provide an understanding of the technical solution of the present disclosure and constitute a part of the specification. Together with the embodiments of the present disclosure, they are used to explain the technical solution of the present disclosure and do not constitute a limitation thereof. The shapes and sizes of the components in the drawings do not reflect true proportions and are intended only to illustrate the present disclosure.
Figure 1 is a schematic structural diagram of a proof-of-work chip provided by an embodiment of the present disclosure;
Figure 2 is a schematic structural diagram of another data node provided by an embodiment of the present disclosure;
Figure 3 is a schematic diagram of a compression unit and a decompression unit according to an embodiment of the present disclosure;
Figure 4 is a schematic structural diagram of a data exchange subunit provided by an embodiment of the present disclosure;
Figure 5A is a schematic diagram including four first data nodes provided by an embodiment of the present disclosure;
Figure 5B is a schematic diagram including four second data nodes provided by an embodiment of the present disclosure;
Figure 6 is a schematic connection diagram of the data exchange units when six first data nodes (or second data nodes) are included, provided by an embodiment of the present disclosure;
Figure 7 is a schematic diagram of the internal structure of the data subunit in the data exchange unit shown in Figure 6;
Figure 8 is a schematic connection diagram of the data exchange units when nine first data nodes (or second data nodes) are included, provided by an embodiment of the present disclosure;
Figure 9 is a schematic diagram of the internal structure of the data subunit in the data exchange unit shown in Figure 8;
Figure 10 is a schematic diagram of an electronic device including a proof-of-work chip provided by an embodiment of the present disclosure.
Detailed Description
The present disclosure describes multiple embodiments, but the description is illustrative rather than restrictive, and it will be obvious to a person of ordinary skill in the art that there can be more embodiments and implementations within the scope of the embodiments described in the present disclosure. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Unless expressly limited, any feature or element of any embodiment may be used in combination with, or may be substituted for, any other feature or element of any other embodiment.
The present disclosure includes and contemplates combinations with features and elements known to those of ordinary skill in the art. The embodiments, features, and elements already disclosed herein may also be combined with any conventional features or elements to form unique inventive solutions as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive solutions to form another unique inventive solution as defined by the claims. Accordingly, it should be understood that any feature shown and/or discussed in this disclosure may be implemented individually or in any suitable combination. Accordingly, the embodiments are not to be limited except in accordance with the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.
Additionally, in describing representative embodiments, the specification may have presented methods and/or processes as a specific sequence of steps. However, to the extent that a method or process does not rely on the specific order of steps described herein, it should not be limited to that specific order. As one of ordinary skill in the art will appreciate, other sequences of steps are possible; therefore, the specific order of steps set forth in the specification should not be construed as limiting the claims. Furthermore, claims directed to a method and/or process should not be limited to performing their steps in the order written; skilled artisans will readily appreciate that these orders may be varied and still remain within the spirit and scope of the disclosed embodiments.
Figure 1 is a schematic structural diagram of a proof-of-work chip provided by an embodiment of the present disclosure. The proof-of-work chip includes at least one group of computing units, at least one group of storage units, at least two first data nodes, and at least two second data nodes. Each data node (including the first data nodes and the second data nodes) includes multiple groups of first node inlets (a first node inlet is a first inlet of the data node, referred to herein simply as a first inlet) and multiple groups of first node outlets (a first node outlet is a first outlet of the data node, referred to herein simply as a first outlet). The group of computing units is connected one-to-one to a group of first inlets of a first data node; a group of first outlets of the first data node is connected one-to-one to a group of storage units; the group of storage units is connected one-to-one to a group of first inlets of a second data node; and a group of first outlets of the second data node is connected one-to-one to a group of computing units. That is, a group of computing units is connected to a group of storage units through at least two first data nodes, and the group of storage units is connected to the group of computing units through at least two second data nodes; wherein:
The computing unit is configured to send a message requesting data to the storage unit through the first data node when performing proof-of-work calculation;
The storage unit is configured to store a data set used in the proof-of-work calculation and, in response to a message from the computing unit, to send a message containing data to the computing unit through the second data node;
The first data node is configured to send the message requesting data sent by the computing unit to the storage unit;
The second data node is configured to send the message containing data sent by the storage unit to the computing unit.
Each of the above data nodes includes multiple groups of first inlets and multiple groups of first outlets. When connecting to the computing units and storage units, one of the multiple groups may be connected to the corresponding units (computing units or storage units); alternatively, the multiple groups of first inlets and first outlets can be understood as multiple first inlets and multiple first outlets, which are connected to a corresponding group of units (computing units or storage units), as shown in Figure 1.
It can be understood that this chip structure forms a three-dimensional network structure including a request transmission layer and a data transmission layer. In this embodiment, message transmission in the direction from the computing units to the storage units is implemented through the first-data-node network: at the request transmission layer, a computing unit sends a request to a storage unit through the first data nodes. Message transmission in the direction from the storage units to the computing units is implemented through the second-data-node network: at the data transmission layer, the storage unit returns the requested data to the computing unit through the second data nodes. This structural design makes on-chip message transmission efficient and achieves high on-chip bandwidth.
In an exemplary embodiment, the proof-of-work chip may include at least one group of computing units, at least one group of storage units, one first data node, and one second data node. The first data node and the second data node each include a group of (multiple) first inlets and a group of (multiple) first outlets. The group of computing units is connected one-to-one to a group of first inlets of the first data node; a group of first outlets of the first data node is connected one-to-one to a group of storage units; the group of storage units is connected one-to-one to a group of first inlets of the second data node; and a group of first outlets of the second data node is connected one-to-one to the group of computing units. That is, a group of computing units is connected to a group of storage units through one first data node, and the group of storage units is connected to the group of computing units through one second data node.
Figure 1 takes two first data nodes and two second data nodes as an example. The two first data nodes are connected to each other, and the two second data nodes are connected to each other. In this case, a group of computing units is connected to two groups of storage units through the two interconnected first data nodes, and a group of storage units is connected to two groups of computing units through the two interconnected second data nodes.
When the chip includes multiple first data nodes and multiple second data nodes, for example y first data nodes and y second data nodes, a group of computing units is connected to y groups of storage units through the y interconnected first data nodes, and a group of storage units is connected to y groups of computing units through the y interconnected second data nodes. The y first data nodes are interconnected to form a mesh structure, and the y second data nodes are interconnected to form a mesh structure. y is a positive integer greater than or equal to 2.
In an exemplary embodiment, the first data nodes may be interconnected in a mesh, and the second data nodes may likewise be interconnected in a mesh. Each data node (including the first data nodes and the second data nodes) includes multiple routing units, multiple arbitration units, one data exchange unit, one interconnection unit, multiple first inlets (the aforementioned first node inlets), multiple first outlets (the aforementioned first node outlets), one or more second node inlets (a second node inlet is a second inlet of the data node, referred to herein simply as a second inlet), and one or more second node outlets (a second node outlet is a second outlet of the data node, referred to herein simply as a second outlet). The input end of each routing unit is connected to one first inlet; the first output end of each routing unit is connected one-to-one to the first input end of an arbitration unit; the second output end of each routing unit is connected to the first input end of the data exchange unit; the first output end of the data exchange unit is connected to the second outlet; the second input end of the data exchange unit is connected to the second inlet; the second output end of the data exchange unit is connected to the second input end of each arbitration unit; the output ends of the arbitration units are connected one-to-one to the input ends of the interconnection unit; the output ends of the interconnection unit are connected one-to-one to the first outlets; and the second inlets and second outlets are configured to connect to other data nodes, wherein:
The routing unit is configured to receive a message from the first inlet and send the message to the arbitration unit and/or the data exchange unit;
The data exchange unit is configured to receive a message from the second inlet and send the message to the arbitration unit, and to receive a message sent by the routing unit and output the message through the second outlet;
The arbitration unit is configured to receive a message sent by the routing unit and/or the data exchange unit and send the message to the first outlet through the interconnection unit.
The interconnection unit can send a message sent by any arbitration unit to any first outlet.
A message as described herein may include a request, data, or other similar information. For any data node, when the first inlets of the data node are connected to computing units, that is, the data node is a first data node, the messages transmitted by the data node include messages requesting data, and the second inlets and second outlets of the data node are connected to other first data nodes. When the first inlets of the data node are connected to storage units, that is, the data node is a second data node, the messages transmitted by the data node include messages containing data, and the second inlets and second outlets of the data node are connected to other second data nodes. "Multiple" as used herein means two or more.
The structure of the above data node is described below. Each data node includes a plurality of routing units, a plurality of arbitration units, one data exchange unit, one interconnection unit, a plurality of first inlets, a plurality of first outlets, at least one second inlet, and at least one second outlet, wherein:
The routing unit includes an input port, a first output port, and a second output port. Each input port is connected to one first inlet, the first output port is connected to an input of an arbitration unit, and the second output port is connected to an input of the data exchange unit. The routing unit is configured to receive a message input through the first inlet and forward it to the arbitration unit or the data exchange unit. In this example, each first inlet is connected to an independent routing unit, each routing unit is correspondingly connected to an independent arbitration unit, and every routing unit is also connected to the data exchange unit; the routing unit can route a message to the arbitration unit or the data exchange unit according to the destination contained in the message;
The data exchange unit includes a plurality of first input ports, a plurality of first output ports, a plurality of second input ports, and a plurality of second output ports. Each first input port is connected to the second output port of a routing unit, each first output port is connected to one second outlet, each second input port is connected to one second inlet, and each second output port is connected to the input of one arbitration unit. The data exchange unit is configured to forward messages to other data nodes or to receive messages sent by other data nodes;
The arbitration unit includes a first input port connected to the first output port of the routing unit, a second input port connected to a second output port of the data exchange unit, and an output port connected to the interconnection unit. The arbitration unit is configured to receive messages sent by the routing unit and/or the data exchange unit and send them to the interconnection unit through the output port connected to the interconnection unit;
The interconnection unit includes a plurality of input ports and a plurality of output ports. Each input port is connected to the output port of one arbitration unit, and each output port is connected to one first outlet. The interconnection unit is configured to send a message output by any arbitration unit to any first outlet.
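The destination-based forwarding performed by the routing unit described above can be modeled with a minimal Python sketch. The message layout (a "dest_node" field) and the node identifiers are assumptions introduced for illustration only; the disclosure does not specify a message format.

```python
# Hypothetical message format: each message carries the identifier of
# the data node serving its destination (an assumption for illustration).

def route_message(message: dict, local_node_id: int) -> str:
    """Model of the routing unit's decision: a message whose destination
    lies on the local data node goes to the arbitration unit; any other
    message is handed to the data exchange unit for inter-node transfer."""
    if message["dest_node"] == local_node_id:
        return "arbitration_unit"
    return "data_exchange_unit"

# A request addressed to the local node stays local; a request addressed
# to another node is forwarded over the mesh.
assert route_message({"dest_node": 1, "payload": "req"}, local_node_id=1) == "arbitration_unit"
assert route_message({"dest_node": 4, "payload": "req"}, local_node_id=1) == "data_exchange_unit"
```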
With the proof-of-work chip provided by the embodiments of the present disclosure, message exchange between the computing units and the storage units is achieved through the data nodes. Because the data nodes adopt a mesh topology, are connected to other data nodes through their data exchange units, and deliver their output through their interconnection units, the structure is simple, the efficiency is high, and the on-chip bandwidth is high; a proof-of-work chip implemented with such data nodes therefore also achieves high efficiency and high on-chip bandwidth.
The number of first inlets and the number of first outlets may be the same or different; that is, the number of computing units connected to each first data node and the number of storage units connected to each second data node may be the same or different. The number of first inlets and first outlets may range from 2 to 16348; that is, a group of computing units may contain 2 to 16348 units, and a group of storage units may contain 2 to 16348 units. The number of each type of data node may be 1, 2, or more, for example 4, 6, 9, or even more, which is not limited by the present application. The plurality of first data nodes may be interconnected in a mesh topology, and the plurality of second data nodes may likewise be interconnected in a mesh topology, with no connection between the first data nodes and the second data nodes. The data nodes are arranged in a regular grid, and each data node is connected only to its adjacent data nodes in the same row or column.
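The row/column adjacency rule of the regular grid can be made concrete with a short sketch. The (row, column) coordinates are an illustrative assumption; the disclosure only requires that each node connect to its same-row or same-column neighbours.

```python
def mesh_neighbors(row: int, col: int, rows: int, cols: int):
    """Return the data nodes adjacent to (row, col) in a rows x cols mesh:
    only the direct neighbours in the same row or the same column."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]

# In a 3x3 mesh a corner node has 2 neighbours, an edge node 3, and the
# centre node 4, which is why later examples size the data exchange
# sub-units as "number of adjacent nodes + 1".
assert len(mesh_neighbors(0, 0, 3, 3)) == 2
assert len(mesh_neighbors(0, 1, 3, 3)) == 3
assert len(mesh_neighbors(1, 1, 3, 3)) == 4
```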
The internal units of the data node are described below; the first data node and the second data node have the same internal composition.
In an exemplary embodiment, the arbitration unit may be an arbitration structure with backpressure and buffering. The arbitration unit may buffer a certain number of messages and send each message to the corresponding interconnection unit when that message can be received by it. When the buffer is full, backpressure is applied to the preceding unit (the routing unit or the data exchange unit) to prevent messages sent by the preceding unit from being lost because they cannot be received; when the buffer is no longer full, the backpressure is released. Likewise, the routing unit may be a routing structure with backpressure and buffering.
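A minimal software model of such a backpressured, buffered stage is sketched below. The buffer depth and interface names are assumptions; real hardware would express the same handshake with ready/valid signals rather than method calls.

```python
from collections import deque

class BackpressureBuffer:
    """Sketch of a buffered, backpressured stage: the stage asserts
    backpressure while its buffer is full, so the upstream unit holds
    its message instead of losing it."""

    def __init__(self, depth: int):
        self.depth = depth
        self.fifo = deque()

    @property
    def backpressure(self) -> bool:
        # Asserted while the buffer is full; released once a slot frees up.
        return len(self.fifo) >= self.depth

    def push(self, msg) -> bool:
        # Upstream may only push when backpressure is not asserted.
        if self.backpressure:
            return False        # upstream must retry; the message is not dropped
        self.fifo.append(msg)
        return True

    def pop(self):
        # Downstream (e.g. the interconnection unit) drains the buffer.
        return self.fifo.popleft() if self.fifo else None

buf = BackpressureBuffer(depth=2)
assert buf.push("m1") and buf.push("m2")
assert buf.backpressure            # buffer full: upstream stalls
assert not buf.push("m3")          # message refused, not lost
buf.pop()
assert not buf.backpressure        # slot freed: backpressure released
```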
In an exemplary embodiment, the arbitration unit may further be configured to assign different weights to its input ports, where the weight of each input port indicates the number of messages that the input port can process consecutively. The arbitration unit may set the weight ratio of the ports according to the data volume of each input port; this ratio determines the proportion of messages passed by each port, and when it matches the actual proportion of requests or data that need to pass, the efficiency of the whole system is improved. In other embodiments, any two input ports may have the same weight, indicating that the two input ports can process the same number of messages consecutively.
In an exemplary embodiment, the arbitration unit may further be configured to assign different priorities to its input ports. When processing messages, the arbitration unit selects the highest-priority input port that has pending messages; after the messages of that input port have been processed, its priority is readjusted. For example, the adjustment may be: after the messages of the highest-priority input port with pending messages have been processed, the priority of that input port is lowered to the minimum. In other embodiments, it is not excluded that the priorities may be set to be the same.
The following description takes weighted round-robin arbitration with a weight ratio of 1:3 as an example, where the arbitration unit includes two input ports S1 and S2. Assume that the default priority of the two input ports is S1 > S2, and that the weight of S1 is 3 and the weight of S2 is 1, where port S1 may be the port connected to the data exchange unit and port S2 may be the port connected to the routing unit. In this example, the weight relates to the number of messages that may be sent: a weight of 3 means that at most 3x messages may be sent consecutively, and a weight of 1 means that at most x messages may be sent consecutively, where x is an integer greater than or equal to 1; when the weight is 0, the port can be regarded as closed and no messages are allowed through. In this example, the priority adjustment principle is that after a port has finished sending its messages, or has no messages, its priority is lowered to the minimum.
An example of the weighted round-robin arbitration process of the arbitration unit is as follows. Assume that port S1 receives requests and currently has the highest priority. Since the weight of port S1 is 3, port S1 may send at most 3x requests consecutively. After port S1 has sent 3x messages consecutively, or has no more messages, the arbitration unit adjusts the priority order to S2 > S1. At this point, when port S2 has requests, S2 is the highest-priority port with pending requests, and since the weight of port S2 is 1, port S2 may send at most x messages consecutively. After port S2 has sent x messages consecutively, or has no more messages, the arbitration unit adjusts the priority order back to S1 > S2. The above weighted round-robin arbitration can improve the processing efficiency of the arbitration unit, with a notable effect under heavy data pressure. In other embodiments, a fixed-weight round-robin arbitration scheme (for example, a fixed 1:1 weight ratio between the ports) or a fixed-priority arbitration scheme may be adopted.
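The arbitration behaviour described in the example above, weighted bursts followed by demotion of the served port to lowest priority, can be modelled as follows. The queue-based interface and method names are illustrative assumptions, not part of the disclosure.

```python
from collections import deque

class WeightedRoundRobinArbiter:
    """Sketch of weighted round-robin arbitration: each port may send up
    to weight * x messages in a row; once a port exhausts its burst or
    runs empty, it is demoted to the lowest priority."""

    def __init__(self, weights: dict, x: int = 1):
        self.weights = dict(weights)            # e.g. {"S1": 3, "S2": 1}
        self.x = x
        self.order = list(weights)              # priority order: first = highest
        self.queues = {p: deque() for p in weights}

    def submit(self, port, msg):
        self.queues[port].append(msg)

    def grant_burst(self):
        """Serve the highest-priority port with pending messages, then
        rotate that port to the back (lowest priority)."""
        for port in self.order:
            q = self.queues[port]
            if q:
                limit = self.weights[port] * self.x
                burst = [q.popleft() for _ in range(min(len(q), limit))]
                self.order.remove(port)
                self.order.append(port)         # demote after its turn
                return port, burst
        return None, []

arb = WeightedRoundRobinArbiter({"S1": 3, "S2": 1}, x=1)
for i in range(5):
    arb.submit("S1", f"a{i}")
arb.submit("S2", "b0")
assert arb.grant_burst() == ("S1", ["a0", "a1", "a2"])   # S1: up to 3x messages
assert arb.grant_burst() == ("S2", ["b0"])               # then S2 > S1
assert arb.grant_burst() == ("S1", ["a3", "a4"])         # back to S1 > S2
```

With weights {S1: 3, S2: 1} and x = 1, S1 is granted up to three consecutive messages before S2 takes its turn, matching the S1 > S2, then S2 > S1, sequence described above.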
The interconnection unit includes a plurality of input ports and a plurality of output ports, and data input through any input port can be output through any output port; that is, the interconnection unit can send a message to any first outlet according to the destination of the message. The numbers of input ports and output ports may be the same or different and may be set according to the needs of the chip, for example 128 or 4096. The interconnection unit may be implemented, for example, with a full crossbar switch (also called a fully associative crossbar), which is a multi-inlet, multi-outlet structure in which data can enter through any inlet and reach any outlet.
In an exemplary embodiment, the chip may further include a compression unit and a decompression unit. Each routing unit is connected to the data exchange unit through the compression unit, and the data exchange unit is connected to each arbitration unit through the decompression unit. Figure 2 is a schematic diagram of another data node structure provided by an embodiment of the present disclosure. In this example, the second output port of each routing unit is connected to an input port of the compression unit, and the output ports of the compression unit are connected to the first input ports of the data exchange unit. The second output ports of the data exchange unit are connected to the input ports of the decompression unit, and the output ports of the decompression unit are connected to the second input ports of the arbitration units.
The compression unit is used to compress the number of buses, compressing the m buses input by the m routing units into n buses output to the data exchange unit. For example, the compression unit includes m input ports and n output ports, where m and n are positive integers and m > n. The compression unit compresses the number of buses connected to the plurality of routing units, reducing the bus count from m to n and thereby reducing the complexity of the data exchange unit. The bus count can be compressed because, when messages entering through the first inlets pass through the routing units, a portion of the messages is routed to the arbitration units, so the traffic routed toward the compression unit is necessarily reduced, and a smaller number of buses suffices to carry these messages through the compression unit. Taking a proof-of-work chip containing 4 first data nodes as an example, the buses can be compressed at a ratio of 4:3, that is, m:n = 4:3, because after a message sent from a first inlet passes through the routing unit, it is routed to the arbitration unit with probability 1/4 and to the compression unit with probability 3/4. The compressed buses (still multiple groups of buses) are connected to the data exchange unit. In other embodiments, the compression ratio may be set to other values.
The function of the decompression unit is the opposite of that of the compression unit: it restores the number of buses to match the number of arbitration units. It includes n input ports and m output ports, and restores the n buses input from the data exchange unit into m buses input separately to the m arbitration units. By decompressing the bus count from n back to m, bus arbitration operations are facilitated.
Figure 3 shows an example of a compression unit and a decompression unit. In this example, compression and decompression of 4 groups of buses are taken as an example: the compression unit compresses 4 groups of buses into 3 groups, and the decompression unit restores the 3 groups back to 4 groups, so that fewer buses can be used to transmit data without affecting chip functionality. In the figure, S00, S01, S02, and S03 are data sources, connected to buses S10, S11, S12, and S13, respectively. Buses S10, S11, S12, and S13 are connected to the compression unit S2, with buses S10, S11, and S12 connected to the arbitration units S220, S221, and S222 in the compression unit S2, respectively. The arbitration units S220, S221, and S222 may be weighted round-robin arbiters; in some examples, ordinary arbiters or round-robin arbiters may also be used. The routing unit S20 is connected to the buffer units S210, S211, and S212; the buffer units S210, S211, and S212 are connected to the arbitration units S220, S221, and S222, respectively; and the arbitration units S220, S221, and S222 are connected to the compressed buses S30, S31, and S32. Buses S30, S31, and S32 are connected to the decompression unit S4, specifically to its routing units S400, S401, and S402; the routing units S400, S401, and S402 are connected to the restored buses S50, S51, and S52, respectively, and all of them are connected to the arbitration unit S41. The arbitration unit S41 may be a round-robin arbiter or an ordinary arbiter, and is connected to the restored bus S53. Buses S50, S51, S52, and S53 are connected to the data endpoints S60, S61, S62, and S63, respectively.
The data compression workflow is as follows:
The data sources S00, S01, S02, and S03 send data to buses S10, S11, S12, and S13, respectively. The data of bus S13 is divided into 3 parts by the routing unit S20 and buffered in the buffer units S210, S211, and S212. The data of buffer unit S210 and the data of bus S10 pass through the arbitration unit S220 to generate the data of bus S30; the data of buffer unit S211 and the data of bus S11 pass through the arbitration unit S221 to generate the data of bus S31; and the data of buffer unit S212 and the data of bus S12 pass through the arbitration unit S222 to generate the data of bus S32. The data compression is thus completed.
The data decompression workflow is as follows:
Buses S30, S31, and S32 transmit data to the decompression unit S4. The routing unit S400 receives the data of bus S30, separates out the data of bus S10 and sends it to bus S50, completing the restoration of the data of bus S10, while the separated data of bus S13 is sent to the arbitration unit S41. The routing unit S401 receives the data of bus S31, separates out the data of bus S11 and sends it to bus S51, completing the restoration of the data of bus S11, while the separated data of bus S13 is sent to the arbitration unit S41. The routing unit S402 receives the data of bus S32, separates out the data of bus S12 and sends it to bus S52, completing the restoration of the data of bus S12, while the separated data of bus S13 is sent to the arbitration unit S41. The arbitration unit S41 receives the data from the routing units S400, S401, and S402 and sends it to bus S53, completing the restoration of the data of bus S13. Buses S50, S51, S52, and S53 then send the data to the data endpoints S60, S61, S62, and S63, respectively.
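The compression and decompression workflows above can be modelled end to end in a few lines. The per-message source-bus tag and the round-robin merge performed in place of arbitration unit S41 are modelling assumptions chosen so that the round trip is lossless and order-preserving; the hardware achieves the same effect with its buffer and arbitration structure.

```python
from itertools import cycle

def compress_4_to_3(buses):
    """Model of the 4:3 compression: traffic from the fourth bus (S13) is
    split round-robin across the three compressed buses, where it is
    carried alongside the traffic of S10/S11/S12.  Each message is tagged
    with its source (an assumption) so it can later be restored."""
    s10, s11, s12, s13 = buses
    compressed = [[("direct", m) for m in b] for b in (s10, s11, s12)]
    lanes = cycle(range(3))                 # routing unit S20 splits S13 3 ways
    for m in s13:
        compressed[next(lanes)].append(("s13", m))
    return compressed

def decompress_3_to_4(compressed):
    """Inverse operation: direct traffic is restored to S50/S51/S52, and
    the S13-tagged messages are merged back onto S53 in round-robin order
    (a stand-in for arbitration unit S41)."""
    restored = [[m for tag, m in lane if tag == "direct"] for lane in compressed]
    s13_lanes = [[m for tag, m in lane if tag == "s13"] for lane in compressed]
    merged = []
    while any(s13_lanes):
        for lane in s13_lanes:
            if lane:
                merged.append(lane.pop(0))
    return restored + [merged]

original = [["p0"], ["q0"], ["r0"], ["t0", "t1", "t2", "t3"]]
assert decompress_3_to_4(compress_4_to_3(original)) == original
```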
The data exchange unit in each data node may include k data exchange sub-units, where k is a positive integer greater than or equal to 2 whose value depends on the number of routing units or on the compression ratio of the compression unit, wherein:
When the data exchange unit is directly connected to the routing units (the structure shown in Figure 1), the number of data exchange sub-units equals the number of routing units. Each data exchange sub-unit includes one group of input/output ports for connecting to a routing unit and an arbitration unit (a first input port and a second output port), and one or more groups of input/output ports for connecting to a second inlet and a second outlet (a second input port and a first output port), wherein the first input port is connected to a routing unit, each first output port is connected to one second outlet, each second input port is connected to one second inlet, and the second output port is connected to an arbitration unit.
When the data exchange unit is connected to the compression unit and the decompression unit (the structure shown in Figure 2), the number of data exchange sub-units equals the number of output ports of the compression unit. Each data exchange sub-unit includes one group of input/output ports for connecting to the compression unit and the decompression unit (a first input port and a second output port), and one or more groups of input/output ports for connecting to a second inlet and a second outlet (a second input port and a first output port), wherein the first input port is connected to the compression unit, each first output port is connected to one second outlet, each second input port is connected to one second inlet, and the second output port is connected to the decompression unit. It can be seen that, with the compression unit and the decompression unit connected, the reduced number of buses lowers the complexity of the data exchange unit.
Each data exchange sub-unit includes multiple groups of routing sub-units and arbitration sub-units, with equal numbers of routing sub-units and arbitration sub-units interconnected in pairs. The number of routing sub-units and arbitration sub-units depends on the number of data nodes adjacent to the data node in which the data exchange unit is located; for example, it may be the number of adjacent data nodes plus 1, so that when the current data node has 2 adjacent nodes, the numbers of routing sub-units and arbitration sub-units are both 2 + 1. In each data exchange sub-unit, the first input port is connected to a routing sub-unit, the first output port is connected to an arbitration sub-unit, each second input port is connected to a routing sub-unit, and each second output port is connected to an arbitration sub-unit.
Taking two adjacent data nodes as an example, the structure of a data exchange sub-unit connected by one group of buses (including one input and one output) is shown in Figure 4. The data exchange sub-unit in the figure is a pairwise interconnection structure including three groups of routing sub-units and arbitration sub-units. One group of routing sub-unit and arbitration sub-unit is connected to the compression unit (or to the routing unit when there is no compression unit) and to the decompression unit (or to the arbitration unit when there is no decompression unit), respectively; the other two groups are connected to the data exchange units of the two adjacent data nodes, wherein a routing sub-unit is connected to an arbitration sub-unit of the adjacent node's data exchange sub-unit, and an arbitration sub-unit is connected to a routing sub-unit of the adjacent node's data exchange sub-unit. The k data exchange sub-units together form one data exchange unit.
The arbitration sub-units within the data exchange sub-unit may adopt weighted round-robin arbitration. In weighted round-robin arbitration, each input port may be assigned a weight, and the weight ratio represents the ratio of the message volumes passed by the input ports. Taking a proof-of-work chip containing 4 first data nodes as an example, when the data exchange units of the 4 first data nodes are arranged 2×2 and messages are routed horizontally first and then vertically (that is, when two diagonally positioned nodes transfer a message, the message is first routed to the horizontally adjacent node and then routed to its destination), then in each data exchange sub-unit, in the arbitration sub-unit connected to the decompression unit, the weight ratio of the inlet connected to the horizontal node to the inlet connected to the vertical node is 1:2, and the inlet weight ratios of the other arbitration sub-units in the data exchange sub-unit are 1:1. The implementation of weighted round-robin arbitration is as described above and is not repeated here; weighted round-robin arbitration can improve the efficiency of the data exchange unit. In other embodiments, the arbitration sub-units within the data exchange sub-unit may also use round-robin arbitration or fixed-priority arbitration.
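The "horizontally first, then vertically" routing order referred to above is dimension-ordered (XY) routing; a brief sketch follows, under the assumption that data nodes are addressed by (row, column) coordinates.

```python
def xy_route(src, dst):
    """Dimension-ordered routing: a message is first routed along its row
    (horizontally) until the column matches the destination, and only then
    along the column (vertically) to the destination node."""
    path = [src]
    r, c = src
    while c != dst[1]:                       # horizontal hops first
        c += 1 if dst[1] > c else -1
        path.append((r, c))
    while r != dst[0]:                       # then vertical hops
        r += 1 if dst[0] > r else -1
        path.append((r, c))
    return path

# Diagonal transfer in the 2x2 mesh: the message first reaches the
# horizontally adjacent node, then its destination, as described above.
assert xy_route((0, 0), (1, 1)) == [(0, 0), (0, 1), (1, 1)]
```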
Figure 5 is a schematic structural diagram of a proof-of-work chip provided by an embodiment of the present disclosure; Figure 5A is a schematic diagram of the connection relationship of the 4 first data nodes, and Figure 5B is a schematic diagram of the connection relationship of the 4 second data nodes. In this example, each data node includes a compression unit and a decompression unit, every data node in the figure has the same structure, the first data nodes and the second data nodes both adopt a 2×2 mesh topology, and the structure of the data exchange sub-units contained in each data exchange unit is as shown in Figure 4. Assume that computing unit A11 begins a proof-of-work computation and needs to request data located in storage unit B41; this is recorded as request 1. As shown in Figure 5A, request 1 is first sent to the routing unit in first data node 1 that is correspondingly connected to computing unit A11, where it is buffered. When that routing unit processes the buffered request, it sends request 1 through the compression unit to the data exchange unit of first data node 1; request 1 is then forwarded over the mesh to the data exchange unit of first data node 4, and from there sent to storage unit B41 through the decompression unit, arbitration unit, and interconnection unit of first data node 4. Request 1 accesses storage unit B41 and obtains the requested data, recorded as data 1. As shown in Figure 5B, data 1 is sent to computing unit A11 through second data node 4 and second data node 1 in sequence; the process is similar to that of request 1 and is not repeated here. Computing unit A11 has thus completed its request for the data located in storage unit B41. Any computing unit can, with reference to the above process, obtain the data required for proof of work from any storage unit and perform the proof-of-work computation.
Figure 6 is a schematic diagram of the interconnection of the 6 data exchange units in the 6 data nodes when the proof-of-work chip includes 6 first data nodes (or second data nodes). The data exchange units are distributed in a 2×3 mesh topology; that is, the 6 data nodes adopt a 2×3 mesh topology. In this case, because a data exchange unit located in the middle row is connected to 3 adjacent data nodes, the numbers of routing sub-units and arbitration sub-units in each data exchange sub-unit are both 3 + 1. The internal structure of each data exchange sub-unit is shown in Figure 7 and includes 4 groups of routing sub-units and arbitration sub-units interconnected in pairs, that is, 4 routing sub-units and 4 arbitration sub-units interconnected in pairs.
Figure 8 is a schematic diagram of the interconnection of the 9 data exchange units in the 9 data nodes when the proof-of-work chip includes 9 first data nodes (or second data nodes). The data exchange units are distributed in a 3×3 mesh topology; that is, the 9 data nodes adopt a 3×3 mesh topology. In this case, because the data exchange unit located in the centre is connected to 4 adjacent data nodes, the numbers of routing sub-units and arbitration sub-units in each data exchange sub-unit are both 4 + 1. The internal structure of each data exchange sub-unit is shown in Figure 9 and includes 5 groups of routing sub-units and arbitration sub-units interconnected in pairs.
With the solutions of the embodiments of the present disclosure, a proof-of-work chip with a total of [2 to 9] × 2 data nodes can be implemented, where 2 to 9 represents the number of first data nodes or second data nodes. Although a mesh distribution is used as an example herein, other topologies are not excluded; for example, when the number of data nodes is small, such as 3 or 5, a star topology may be used.
Consider the case of connecting a total of 120 computing units and 120 storage units. A pure full-crossbar implementation would require a 120×120 full crossbar, which is difficult to achieve at the current level of technology; a pure mesh implementation would require a 16×8 mesh arrangement, which would be very inefficient. With the proof-of-work chip structure provided by this embodiment — taking a structure of 4 first data nodes and 4 second data nodes as an example, where each data node includes 30 pairs of inlets and outlets (that is, each data node connects 30 computing units and 30 storage units) and the interconnection unit of each data node can be a 30×30 full crossbar — the chip needs only 4×2 full crossbars of size 30×30 and 2 sets of 2×2 mesh interconnects to exchange messages between any computing unit and any storage unit. Because multiple mesh nodes share the ports, the design avoids the problem of a single full crossbar whose port count is so large that it cannot be implemented, and it can be realized with a small number of data nodes; the structure is simple and efficient. At the same time, the proof-of-work chip of the embodiments of the present disclosure achieves high on-chip bandwidth: measured at a port width of 1024 bits and a clock frequency of 500 MHz, the chip achieves approximately 6144 GB/s of on-chip bandwidth, far exceeding the 1004 GB/s on-chip bandwidth of the current highest-end GPUs.
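As a rough plausibility check of the figures above (our own arithmetic, not taken from the specification): a single 1024-bit port at 500 MHz carries 64 GB/s, so the quoted 6144 GB/s corresponds to 96 ports transferring concurrently, and the partitioned design also roughly halves the crosspoint count relative to one monolithic 120×120 crossbar.

```python
# Rough plausibility check of the numbers quoted above. The "96
# concurrently active ports" reading of the 6144 GB/s figure is our
# assumption, not a statement from the specification.

# Crosspoint counts: one monolithic crossbar vs. the partitioned design.
monolithic = 120 * 120            # 14400 crosspoints
partitioned = 4 * 2 * (30 * 30)   # eight 30x30 crossbars -> 7200 crosspoints

# Per-port bandwidth at a 1024-bit port width and a 500 MHz clock.
per_port_gb_s = 1024 / 8 * 500e6 / 1e9  # bytes per cycle x cycles/s -> GB/s

print(monolithic, partitioned)    # 14400 7200
print(per_port_gb_s)              # 64.0
print(6144 / per_port_gb_s)       # 96.0
```

The comparison ignores the cost of the 2×2 mesh links themselves, which is small relative to the crossbars.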
As shown in Figure 10, an embodiment of the present disclosure further provides an electronic device comprising the above proof-of-work chip. The electronic device may be, for example, a terminal device capable of providing computing services, such as a tablet computer, a notebook computer, a handheld computer, a mobile internet device, or a wearable device; or it may be a server capable of providing computing services such as cloud services, cloud computing, cloud storage, network services, cloud communications, middleware services, domain name services, and security services.
In the description of the embodiments of the present disclosure, it should be noted that, unless otherwise expressly specified and limited, the terms "mounted", "connected", and "coupled" are to be understood broadly: a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be an internal communication between two elements. Those of ordinary skill in the art can understand the meanings of the above terms in the present disclosure according to the specific circumstances.
Those of ordinary skill in the art will understand that all or some of the steps of the methods and the functional modules/units of the systems and apparatuses disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be executed cooperatively by several physical components. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or a microprocessor, as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Claims (15)

  1. A proof-of-work chip, comprising at least one group of computing units, at least one group of storage units, at least two first data nodes, and at least two second data nodes, wherein each first data node and each second data node comprises multiple groups of first inlets and multiple groups of first outlets; the group of computing units is connected in one-to-one correspondence with a group of first inlets of one first data node; a group of first outlets of the first data node is connected in one-to-one correspondence with a group of storage units; the group of storage units is connected in one-to-one correspondence with a group of first inlets of one second data node; and a group of first outlets of the second data node is connected in one-to-one correspondence with the group of computing units; wherein:
    the computing unit is configured to send a message requesting data to the storage unit through the first data node when performing a proof-of-work calculation;
    the storage unit is configured to store a data set used in the proof-of-work calculation and, in response to a message from a computing unit, to send a message containing the data to the computing unit through the second data node;
    the first data node is configured to send the message requesting data sent by the computing unit to the storage unit;
    the second data node is configured to send the message containing data sent by the storage unit to the computing unit.
  2. The proof-of-work chip according to claim 1, wherein:
    there are y first data nodes and y second data nodes, the first data nodes are interconnected in a mesh, and the second data nodes are interconnected in a mesh;
    a group of computing units is connected to y groups of storage units through the mesh-interconnected first data nodes, and a group of storage units is connected to y groups of computing units through the mesh-interconnected second data nodes, y being a positive integer greater than or equal to 2.
  3. The proof-of-work chip according to claim 1 or 2, wherein:
    each first data node and each second data node comprises a plurality of routing units, a plurality of arbitration units, one data exchange unit, one interconnection unit, a plurality of first inlets, a plurality of first outlets, one or more second inlets, and one or more second outlets; the input end of each routing unit is connected to one first inlet; the first output ends of the routing units are connected in one-to-one correspondence with the first input ends of the arbitration units; the second output end of each routing unit is connected to a first input end of the data exchange unit; a first output end of the data exchange unit is connected to the second outlet; a second input end of the data exchange unit is connected to the second inlet; a second output end of the data exchange unit is connected to the second input end of each arbitration unit; the output ends of the arbitration units are connected in one-to-one correspondence with the input ends of the interconnection unit; the output ends of the interconnection unit are connected in one-to-one correspondence with the first outlets; and the second inlets and second outlets are configured to connect to other data nodes; wherein:
    the routing unit is configured to receive a message from the first inlet and send the message to the arbitration unit and/or the data exchange unit;
    the data exchange unit is configured to receive a message from the second inlet and send the message to the arbitration unit, and to receive a message sent by the routing unit and output the message through the second outlet;
    the arbitration unit is configured to receive a message sent by the routing unit and/or the data exchange unit and send the message to the first outlet through the interconnection unit.
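The per-message decision made by the routing unit described above — deliver locally through an arbitration unit, or hand off to the data exchange unit for forwarding to another node via the second outlet — can be sketched as follows. This is our illustration only; the field names (`dst_node`, `dst_port`) are hypothetical and not part of the claims.

```python
# Sketch (ours, not the patent's implementation) of a routing unit's
# forwarding decision inside one data node.

def route(msg, local_node_id):
    """msg: dict with hypothetical keys 'dst_node' and 'dst_port'.
    Returns which downstream unit the routing unit forwards it to."""
    if msg["dst_node"] == local_node_id:
        # Destination is attached to this node: go to the arbitration
        # unit in front of the matching first outlet.
        return ("arbitration_unit", msg["dst_port"])
    # Destination is on another node: go to the data exchange unit,
    # which forwards it over the mesh through a second outlet.
    return ("data_exchange_unit", msg["dst_node"])

print(route({"dst_node": 0, "dst_port": 7}, local_node_id=0))
# ('arbitration_unit', 7)
print(route({"dst_node": 2, "dst_port": 3}, local_node_id=0))
# ('data_exchange_unit', 2)
```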
  4. The proof-of-work chip according to claim 3, wherein the data node further comprises a compression unit and a decompression unit, each routing unit is connected to the data exchange unit through the compression unit, and the data exchange unit is connected to each arbitration unit through the decompression unit; wherein:
    the compression unit comprises m input ports and n output ports and is configured to compress the m buses input by the m routing units into n buses output to the data exchange unit;
    the decompression unit comprises n input ports and m output ports and is configured to restore the n buses input by the data exchange unit to m buses that are respectively input to the m arbitration units;
    wherein m and n are both positive integers greater than zero, and m > n.
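The m-to-n bus compression described above can be sketched as follows. This is our illustration under the assumption that at most n of the m buses carry a valid message in any given cycle; the tagging scheme and all names are hypothetical, not taken from the claims.

```python
# Sketch (ours) of m-to-n bus compression and the matching
# decompression: valid messages are funneled onto n < m shared lanes,
# tagged with their source index so the decompression side can route
# each one back to the arbitration unit matching its original bus.

def compress(m_lanes, n):
    """m_lanes: list of length m; each entry is a payload or None.
    Returns n output lanes as (source_index, payload) tuples."""
    valid = [(i, msg) for i, msg in enumerate(m_lanes) if msg is not None]
    assert len(valid) <= n, "more valid messages than output lanes"
    return valid + [None] * (n - len(valid))

def decompress(n_lanes, m):
    """Restore the original m-lane layout from the tagged n lanes."""
    out = [None] * m
    for entry in n_lanes:
        if entry is not None:
            src, msg = entry
            out[src] = msg
    return out

m_lanes = [None, "req-from-1", None, None, "req-from-4", None]
packed = compress(m_lanes, n=2)
print(packed)                   # [(1, 'req-from-1'), (4, 'req-from-4')]
print(decompress(packed, m=6) == m_lanes)  # True
```

In hardware this corresponds to sharing fewer physical buses; if more than n lanes were valid in one cycle, a real design would need buffering or back-pressure, which the sketch omits.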
  5. The proof-of-work chip according to claim 3, wherein:
    the data exchange unit comprises a plurality of data exchange subunits, the number of data exchange subunits being the same as the number of routing units, and each data exchange subunit comprises a first input port for connecting to a routing unit, a first output port for connecting to a second outlet, a second input port for connecting to a second inlet, and a second output port for connecting to an arbitration unit.
  6. The proof-of-work chip according to claim 4, wherein:
    the data exchange unit comprises n data exchange subunits, and each data exchange subunit comprises a first input port for connecting to the compression unit, a first output port for connecting to a second outlet, a second input port for connecting to a second inlet, and a second output port for connecting to the decompression unit.
  7. The proof-of-work chip according to claim 5 or 6, wherein:
    each data exchange subunit comprises multiple groups of routing subunits and arbitration subunits, the routing subunits and arbitration subunits being pairwise interconnected, wherein the first input port is connected to one routing subunit, the first output port is connected to one arbitration subunit, one second input port is connected to one routing subunit, and one second output port is connected to one arbitration subunit.
  8. The proof-of-work chip according to claim 7, wherein:
    the arbitration subunit is configured to set the same or different weights for its multiple input ports, the weight value of each input port representing the number of messages that the input port can process consecutively.
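The weighted arbitration described above can be sketched as a weighted round-robin: the arbiter visits the input ports in turn, and each port's weight bounds how many messages it may hand through consecutively before the next port gets a turn. This is our illustration only, not the patent's implementation; weights are assumed to be at least 1.

```python
# Sketch (ours) of weighted arbitration over input-port queues: a
# port with weight w may process up to w messages consecutively.

from collections import deque

def weighted_arbitrate(queues, weights):
    """queues: one deque of pending messages per input port.
    weights: per-port consecutive-message budget (each >= 1).
    Returns the grant order as (port, message) tuples."""
    grants = []
    while any(queues):
        for port, q in enumerate(queues):
            for _ in range(weights[port]):
                if not q:
                    break  # port exhausted; move to the next port
                grants.append((port, q.popleft()))
    return grants

queues = [deque(["a1", "a2", "a3"]), deque(["b1"])]
print(weighted_arbitrate(queues, weights=[2, 1]))
# [(0, 'a1'), (0, 'a2'), (1, 'b1'), (0, 'a3')]
```

Giving all ports the same weight degenerates to plain round-robin; unequal weights skew bandwidth toward the heavier ports without starving the lighter ones.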
  9. The proof-of-work chip according to claim 7, wherein:
    the arbitration subunit is configured to set different priorities for its multiple input ports; when the arbitration subunit processes messages, it selects the input port that has the highest priority and a pending message, and after the message of that input port has been processed, the priority of that input port is readjusted.
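One way to realize the priority readjustment described above is a rotating-priority scheme: the served port is demoted to the lowest priority so that no port is starved. The claim does not fix a particular readjustment rule, so the demote-to-back policy below is our assumption for illustration.

```python
# Sketch (ours) of priority arbitration with readjustment: the
# highest-priority port with a pending message wins; after being
# served, it is moved to the back of the priority order.

def priority_arbitrate(pending, order):
    """pending: dict port -> True if a message is waiting.
    order: list of ports, highest priority first.
    Returns (granted_port, new_order); granted_port is None if idle."""
    for i, port in enumerate(order):
        if pending[port]:
            # Readjust: demote the served port to the lowest priority.
            new_order = order[:i] + order[i + 1:] + [port]
            return port, new_order
    return None, order

order = [0, 1, 2]
port, order = priority_arbitrate({0: False, 1: True, 2: True}, order)
print(port, order)  # 1 [0, 2, 1]
```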
  10. The proof-of-work chip according to claim 3, wherein:
    the arbitration unit is further configured to set the same or different weights for its multiple input ports, the weight value of each input port representing the number of messages that the input port can process consecutively.
  11. The proof-of-work chip according to claim 3, wherein:
    the arbitration unit is further configured to set different priorities for its multiple input ports; when the arbitration unit processes messages, it selects the input port that has the highest priority and a pending message, and after the message of that input port has been processed, the priority of that input port is readjusted.
  12. The proof-of-work chip according to claim 3, wherein:
    the interconnection unit comprises a plurality of input ports and a plurality of output ports, each input port is connected to one arbitration unit, and each output port is connected to one first outlet; the interconnection unit is configured to send a message output by the arbitration unit to any one of the first outlets.
  13. The proof-of-work chip according to claim 12, wherein the interconnection unit is a full crossbar switch.
  14. The proof-of-work chip according to claim 3, wherein:
    the proof-of-work chip comprises 4 data nodes arranged in a 2×2 mesh topology, the data exchange unit in each data node comprises n data exchange subunits, and each data exchange subunit comprises 3 groups of pairwise-interconnected routing subunits and arbitration subunits; or
    the proof-of-work chip comprises 6 data nodes arranged in a 2×3 mesh topology, the data exchange unit in each data node comprises n data exchange subunits, and each data exchange subunit comprises 4 groups of pairwise-interconnected routing subunits and arbitration subunits; or
    the proof-of-work chip comprises 9 data nodes arranged in a 3×3 mesh topology, the data exchange unit in each data node comprises n data exchange subunits, and each data exchange subunit comprises 5 groups of pairwise-interconnected routing subunits and arbitration subunits.
  15. An electronic device, comprising the proof-of-work chip according to any one of claims 1 to 14.
PCT/CN2023/077718 2022-07-18 2023-02-22 Proof-of-work chip and electronic device WO2024016661A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210838519.1A CN115002050B (en) 2022-07-18 2022-07-18 Workload proving chip
CN202210838519.1 2022-07-18

Publications (1)

Publication Number Publication Date
WO2024016661A1 true WO2024016661A1 (en) 2024-01-25

Family

ID=83021240

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/077718 WO2024016661A1 (en) 2022-07-18 2023-02-22 Proof-of-work chip and electronic device

Country Status (2)

Country Link
CN (1) CN115002050B (en)
WO (1) WO2024016661A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115002050B (en) * 2022-07-18 2022-09-30 中科声龙科技发展(北京)有限公司 Workload proving chip
CN114928578B (en) * 2022-07-19 2022-09-16 中科声龙科技发展(北京)有限公司 Chip structure
CN114968865B (en) * 2022-07-22 2022-09-27 中科声龙科技发展(北京)有限公司 Bus transmission structure and method and chip
CN115905088B (en) * 2022-12-27 2023-07-14 声龙(新加坡)私人有限公司 Data collection structure, method, chip and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140156929A1 (en) * 2012-12-04 2014-06-05 Ecole Polytechnique Federale De Lausanne (Epfl) Network-on-chip using request and reply trees for low-latency processor-memory communication
CN209543343U (en) * 2018-10-30 2019-10-25 北京比特大陆科技有限公司 Big data operation acceleration system
CN112214427A (en) * 2020-10-10 2021-01-12 中科声龙科技发展(北京)有限公司 Cache structure, workload proving operation chip circuit and data calling method thereof
US20210409487A1 (en) * 2019-07-30 2021-12-30 Alibaba Group Holding Limited Apparatus and method for controlling data transmission in network system
CN113924556A (en) * 2019-04-26 2022-01-11 株式会社艾库塞尔 Information processing apparatus
CN115002050A (en) * 2022-07-18 2022-09-02 中科声龙科技发展(北京)有限公司 Workload proving chip

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10496578B2 (en) * 2017-01-06 2019-12-03 Samsung Electronics Co., Ltd. Central arbitration scheme for a highly efficient interconnection topology in a GPU
CN110620731B (en) * 2019-09-12 2021-03-23 中山大学 Routing device and routing method of network on chip
CN112491715B (en) * 2020-11-30 2022-06-03 清华大学 Routing device and routing equipment of network on chip
CN114003552B (en) * 2021-12-30 2022-03-29 中科声龙科技发展(北京)有限公司 Workload proving operation method, workload proving chip and upper computer
CN114388025B (en) * 2021-12-30 2022-09-13 中科声龙科技发展(北京)有限公司 Dynamic random access memory refreshing circuit, refreshing method and workload proving chip

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140156929A1 (en) * 2012-12-04 2014-06-05 Ecole Polytechnique Federale De Lausanne (Epfl) Network-on-chip using request and reply trees for low-latency processor-memory communication
CN209543343U (en) * 2018-10-30 2019-10-25 北京比特大陆科技有限公司 Big data operation acceleration system
CN113924556A (en) * 2019-04-26 2022-01-11 株式会社艾库塞尔 Information processing apparatus
US20210409487A1 (en) * 2019-07-30 2021-12-30 Alibaba Group Holding Limited Apparatus and method for controlling data transmission in network system
CN112214427A (en) * 2020-10-10 2021-01-12 中科声龙科技发展(北京)有限公司 Cache structure, workload proving operation chip circuit and data calling method thereof
CN115002050A (en) * 2022-07-18 2022-09-02 中科声龙科技发展(北京)有限公司 Workload proving chip

Also Published As

Publication number Publication date
CN115002050A (en) 2022-09-02
CN115002050B (en) 2022-09-30

Similar Documents

Publication Publication Date Title
WO2024016661A1 (en) Proof-of-work chip and electronic device
WO2024016660A1 (en) Chip structure and electronic device
JP6093867B2 (en) Non-uniform channel capacity in the interconnect
US10678738B2 (en) Memory extensible chip
US8930595B2 (en) Memory switch for interconnecting server nodes
US8325761B2 (en) System and method for establishing sufficient virtual channel performance in a parallel computing network
US9007920B2 (en) QoS in heterogeneous NoC by assigning weights to NoC node channels and using weighted arbitration at NoC nodes
US10248315B2 (en) Devices and methods for interconnecting server nodes
US10394738B2 (en) Technologies for scalable hierarchical interconnect topologies
JP2016531372A (en) Memory module access method and apparatus
CN114928577B (en) Workload proving chip and processing method thereof
WO2010062916A1 (en) Network-on-chip system, method, and computer program product for transmitting messages utilizing a centralized on-chip shared memory switch
EP3579507B1 (en) Dynamic scheduling methods, platform, system and switch apparatus.
US9304706B2 (en) Efficient complex network traffic management in a non-uniform memory system
WO2024016649A1 (en) Bus transmission structure and method, and chip
CN107239407B (en) Wireless access method and device for memory
CN116150082A (en) Access method, device, chip, electronic equipment and storage medium
End et al. Adaption of the n-way dissemination algorithm for gaspi split-phase allreduce
US20230129107A1 (en) Method and apparatus to aggregate objects to be stored in a memory to optimize the memory bandwidth
End et al. Butterfly-like algorithms for GASPI split-phase allreduce
CN117221212B (en) Optical network on chip low congestion routing method and related equipment
CN112949247B (en) Phase-based on-chip bus scheduling device and method
CN117135107B (en) Network communication topology system, routing method, device and medium
CN114095289B (en) Data multicast circuit, method, electronic device, and computer-readable storage medium
US11934332B2 (en) Data shuffle offload

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23841739

Country of ref document: EP

Kind code of ref document: A1