WO2024016661A1 - Proof-of-work chip and electronic device - Google Patents

Proof-of-work chip and electronic device

Info

Publication number
WO2024016661A1
WO2024016661A1 · PCT/CN2023/077718 · CN2023077718W
Authority
WO
WIPO (PCT)
Prior art keywords
unit
data
arbitration
units
routing
Prior art date
Application number
PCT/CN2023/077718
Other languages
French (fr)
Chinese (zh)
Inventor
刘明 (Liu Ming)
蔡凯 (Cai Kai)
田佩佳 (Tian Peijia)
张雨生 (Zhang Yusheng)
闫超 (Yan Chao)
Original Assignee
声龙(新加坡)私人有限公司 (Shenglong (Singapore) Pte. Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 声龙(新加坡)私人有限公司 (Shenglong (Singapore) Pte. Ltd.)
Publication of WO2024016661A1 publication Critical patent/WO2024016661A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00 Packet switching elements
    • H04L 49/25 Routing or path finding in a switch fabric
    • H04L 49/253 Routing or path finding in a switch fabric using establishment or release of connections between ports
    • H04L 49/254 Centralised controller, i.e. arbitration or scheduling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F 15/163 Interprocessor communication
    • G06F 15/17 Interprocessor communication using an input/output type connection, e.g. channel, I/O port
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F 15/163 Interprocessor communication
    • G06F 15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F 15/17306 Intercommunication techniques
    • G06F 15/17312 Routing techniques specific to parallel machines, e.g. wormhole, store and forward, shortest path, congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/76 Architectures of general purpose stored program computers
    • G06F 15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F 15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F 15/781 On-chip cache; Off-chip memory
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00 Packet switching elements
    • H04L 49/10 Packet switching elements characterised by the switching fabric construction
    • H04L 49/109 Integrated on microchip, e.g. switch-on-chip

Definitions

  • Embodiments of the present disclosure relate to, but are not limited to, the field of computer application technology, and particularly to a proof-of-work chip and an electronic device.
  • Proof of Work is a hash-based computation that can be solved using a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a Field-Programmable Gate Array (FPGA).
  • The solving process requires random-address access to a large data set, and the entire data set is generally stored in memory or video memory.
  • Computing power is directly proportional to data bandwidth, so high on-chip bandwidth is required.
  • Traditional CPU, GPU, or FPGA architectures cannot solve this problem well.
  • Embodiments of the present disclosure provide a proof-of-work chip, including at least one group of computing units, at least one group of storage units, at least two first data nodes, and at least two second data nodes. Each of the first data nodes and each of the second data nodes includes multiple groups of first inlets and multiple groups of first outlets.
  • The group of computing units is connected to a group of first inlets of a first data node in a one-to-one correspondence.
  • A group of first outlets of the first data node is connected to a group of storage units in a one-to-one correspondence.
  • The group of storage units is connected to a group of first inlets of a second data node in a one-to-one correspondence.
  • A group of first outlets of the second data node is connected to the group of computing units in a one-to-one correspondence; wherein:
  • the computing unit is configured to send a message for requesting data to the storage unit through the first data node when performing proof-of-work calculations;
  • the storage unit is configured to store a data set used in the proof-of-work calculation, and in response to a message from the computing unit, sends a message containing the data to the computing unit through the second data node;
  • the first data node is configured to send a message requesting data sent by the computing unit to the storage unit;
  • the second data node is configured to send the message containing data sent by the storage unit to the computing unit.
  • An embodiment of the present disclosure also provides an electronic device including the above-mentioned proof-of-work chip.
  • Figure 1 is a schematic structural diagram of a proof-of-work chip provided by an embodiment of the present disclosure
  • Figure 2 is a schematic structural diagram of another data node provided by an embodiment of the present disclosure.
  • Figure 3 is a schematic diagram of a compression unit and decompression unit according to an embodiment of the present disclosure
  • Figure 4 is a schematic structural diagram of a data exchange subunit provided by an embodiment of the present disclosure.
  • Figure 5A is a schematic diagram including four first data nodes provided by an embodiment of the present disclosure.
  • Figure 5B is a schematic diagram including four second data nodes provided by an embodiment of the present disclosure.
  • Figure 6 is a schematic connection diagram of a data switching unit when including six first data nodes (or second data nodes) provided by an embodiment of the present disclosure
  • Figure 7 is a schematic diagram of the internal structure of the data subunit in the data exchange unit shown in Figure 6;
  • Figure 8 is a schematic connection diagram of a data switching unit when nine first data nodes (or second data nodes) are included according to an embodiment of the present disclosure
  • Figure 9 is a schematic diagram of the internal structure of the data subunit in the data exchange unit shown in Figure 8;
  • FIG. 10 is a schematic diagram of an electronic device including a proof-of-work chip provided by an embodiment of the present disclosure.
  • FIG. 1 is a schematic structural diagram of a proof-of-work chip provided by an embodiment of the present disclosure.
  • the proof-of-work chip includes at least one set of computing units, at least one set of storage units, at least two first data nodes and at least two second data nodes.
  • Each data node includes multiple groups of first node inlets (a first node inlet is a first inlet of the data node, referred to herein as a first inlet) and multiple groups of first node outlets (a first node outlet is a first outlet of the data node, referred to herein as a first outlet). The group of computing units is connected to a group of first inlets of a first data node in a one-to-one correspondence, a group of first outlets of the first data node is connected to a group of storage units in a one-to-one correspondence, and the group of storage units is connected to a group of first inlets of a second data node in a one-to-one correspondence.
  • A group of first outlets of the second data node is connected to a group of computing units in a one-to-one correspondence. That is, a group of computing units is connected to a group of storage units through at least two first data nodes, and the group of storage units is connected to the group of computing units through at least two second data nodes; where:
  • the computing unit is configured to send a message for requesting data to the storage unit through the first data node when performing proof-of-work calculations;
  • the storage unit is configured to store a data set used in the proof-of-work calculation, and in response to a message from the computing unit, sends a message containing the data to the computing unit through the second data node;
  • the first data node is configured to send a message requesting data sent by the computing unit to the storage unit;
  • the second data node is configured to send the message containing data sent by the storage unit to the computing unit.
  • Each of the above data nodes includes multiple groups of first inlets and multiple groups of first outlets.
  • When connecting to the computing units and storage units, each of the multiple groups of first inlets/outlets may be connected to the corresponding unit (computing unit or storage unit). Alternatively, the multiple groups of first inlets and first outlets may be understood as multiple individual first inlets and first outlets, with the multiple first inlets or outlets connected to a corresponding group of units (computing units or storage units), as shown in Figure 1.
  • a three-dimensional network structure including a request transmission layer and a data transmission layer can be formed.
  • Message transmission from the computing units to the storage units is implemented through the first-data-node network; that is, at the request transmission layer, a computing unit sends a request to a storage unit through the first data nodes. Message transmission from the storage units to the computing units is implemented through the second-data-node network; that is, at the data transmission layer, a storage unit returns the requested data to a computing unit through the second data nodes.
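The two-layer flow described above can be sketched in Python as a reading aid (all class, variable, and field names here are illustrative assumptions, not from the patent): a computing unit issues a request over the first-data-node network and receives the data back over the separate second-data-node network.

```python
# Toy model of the request layer / data layer separation (illustrative only).

class StorageUnit:
    def __init__(self, dataset):
        self.dataset = dataset          # data set used by the proof-of-work calculation

    def handle(self, addr):
        return self.dataset[addr]       # respond with the requested word

class ComputingUnit:
    def __init__(self, request_net, data_net):
        self.request_net = request_net  # first-data-node network (request layer)
        self.data_net = data_net        # second-data-node network (data layer)

    def fetch(self, storage, addr):
        self.request_net.append(("req", addr))    # message for requesting data
        word = storage.handle(addr)
        self.data_net.append(("data", word))      # message containing the data
        return word

request_net, data_net = [], []
storage = StorageUnit({0x10: 0xDEADBEEF})
cu = ComputingUnit(request_net, data_net)
assert cu.fetch(storage, 0x10) == 0xDEADBEEF
assert request_net == [("req", 0x10)] and data_net == [("data", 0xDEADBEEF)]
```

The point of the sketch is that requests and responses never share a network, which is how the chip keeps the two directions of traffic from contending for bandwidth.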
  • The proof-of-work chip may also include at least one group of computing units, at least one group of storage units, one first data node, and one second data node.
  • The first data node and the second data node each include a group of (multiple) first inlets and a group of (multiple) first outlets.
  • the group of computing units are connected to a group of first entrances of the first data node in a one-to-one correspondence.
  • the first data node A group of first outlets is connected to a group of storage units in a one-to-one correspondence.
  • the group of storage units is connected to a group of first inlets of the second data node in a one-to-one correspondence.
  • A group of first outlets of the second data node is connected to a group of computing units in a one-to-one correspondence.
  • That is, a group of computing units is connected to a group of storage units through the first data node, and the group of storage units is connected to the group of computing units through the second data node.
  • Figure 1 takes two first data nodes and two second data nodes as an example.
  • the two first data nodes are connected to each other, and the two second data nodes are connected to each other.
  • A group of computing units is connected to two groups of storage units through the two interconnected first data nodes, and a group of storage units is connected to two groups of computing units through the two interconnected second data nodes.
  • When the chip includes multiple first data nodes and multiple second data nodes, for example y first data nodes and y second data nodes, a group of computing units is connected to y groups of storage units through the y interconnected first data nodes, and a group of storage units is connected to y groups of computing units through the y interconnected second data nodes.
  • the y first data nodes are interconnected to form a grid structure, and the y second data nodes are interconnected to form a grid structure.
  • y is a positive integer greater than or equal to 2.
  • The first data nodes may be interconnected using a grid, and the second data nodes may likewise be interconnected using a grid.
  • Each data node includes multiple routing units, multiple arbitration units, a data exchange unit, an interconnection unit, multiple first inlets (i.e., the aforementioned first node inlets), multiple first outlets (i.e., the aforementioned first node outlets), one or more second node inlets (a second node inlet is a second inlet of the data node, referred to herein as a second inlet), and one or more second node outlets (a second node outlet is a second outlet of the data node, referred to herein as a second outlet).
  • The input end of each routing unit is connected to one of the first inlets.
  • The first output end of each routing unit is connected to the first input end of one of the arbitration units.
  • The second output end of each routing unit is connected to the first input end of the data exchange unit; the first output end of the data exchange unit is connected to the second outlet; the second input end of the data exchange unit is connected to the second inlet; the second output end of the data exchange unit is connected to the second input end of each arbitration unit; the output end of each arbitration unit is connected to the input end of the interconnection unit in a one-to-one correspondence; the output end of the interconnection unit is connected to the first outlets in a one-to-one correspondence; and the second inlet and the second outlet are configured to be connected to other data nodes, where:
  • the routing unit is configured to receive messages from the first portal and send the messages to the arbitration unit and/or the data exchange unit;
  • The data exchange unit is configured to receive messages from the second inlet and send them to the arbitration unit, and to receive messages sent from the routing unit and output them through the second outlet;
  • the arbitration unit is configured to receive messages sent by the routing unit and/or the data exchange unit, and send the messages to the first outlet through the interconnection unit.
  • the interconnection unit can realize sending the message sent by the arbitration unit to any first outlet.
  • the messages described herein may include requests or data or other similar information.
  • When the first inlets of a data node are connected to computing units, that is, the data node is a first data node, the messages transmitted by the data node include messages for requesting data, and the second inlets and second outlets of the data node are connected to other first data nodes.
  • When the first inlets of a data node are connected to storage units, that is, the data node is a second data node, the messages transmitted by the data node include messages containing data, and the second inlets and second outlets of the data node are connected to other second data nodes.
  • the plurality mentioned herein includes 2 or more than 2.
  • Each data node includes: multiple routing units, multiple arbitration units, a data exchange unit, an interconnection unit, multiple first inlets, multiple first outlets, at least one second inlet, and at least one second outlet, wherein:
  • The routing unit includes an input port, a first output port, and a second output port. Each input port is connected to a first inlet, the first output port is connected to an input of an arbitration unit, and the second output port is connected to an input of the data exchange unit. The routing unit is configured to receive a message input through the first inlet and forward it to the arbitration unit or the data exchange unit. In this example, each first inlet is connected to an independent routing unit; each routing unit is connected to a corresponding independent arbitration unit as well as to the data exchange unit; and the routing unit can route a message to the arbitration unit or the data exchange unit according to the destination contained in the message;
  • The data exchange unit includes multiple first input ports, multiple first output ports, multiple second input ports, and multiple second output ports. Each first input port is connected to the second output port of a routing unit, each first output port is connected to a second outlet, each second input port is connected to a second inlet, and each second output port is connected to an input of an arbitration unit. The data exchange unit is configured to forward messages to other data nodes, or to receive messages sent by other data nodes;
  • The arbitration unit includes a first input port connected to the first output port of the routing unit, a second input port connected to the second output port of the data exchange unit, and an output port connected to the interconnection unit. The arbitration unit is configured to receive messages sent by the routing unit and/or the data exchange unit and send them to the interconnection unit through that output port;
  • The interconnection unit includes multiple input ports and multiple output ports; each input port is connected to the output port of an arbitration unit, and each output port is connected to a first outlet. The interconnection unit is configured to send the messages output by the arbitration units to any first outlet.
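As a reading aid (not part of the patent), the routing decision inside a single data node can be sketched as follows. A message arriving at a first inlet is either delivered locally (routing unit, then arbitration unit, then interconnection unit, then first outlet) or handed to the data exchange unit toward a neighbouring node. The function and field names are assumptions.

```python
# Illustrative model of a routing unit's forwarding decision inside one data node.

def route(msg, local_outlets):
    """Return which internal path a message takes inside one data node."""
    if msg["dest"] in local_outlets:
        return "arbitration -> interconnection -> first outlet"
    return "data exchange -> second outlet (adjacent node)"

local = {0, 1, 2, 3}                        # first outlets served by this node
assert route({"dest": 2}, local).startswith("arbitration")
assert route({"dest": 7}, local).startswith("data exchange")
```

This is the sense in which the routing unit "routes the message to the arbitration unit or the data exchange unit according to the destination contained in the message".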
  • In the proof-of-work chip provided by the embodiments of the present disclosure, message intercommunication between the computing units and the storage units is realized through the data nodes. Since the data nodes adopt a grid topology, connect to other data nodes through the data exchange unit, and deliver output through the interconnection unit, the structure is simple, efficient, and offers high on-chip bandwidth. Therefore, the proof-of-work chip implemented with these data nodes also has high efficiency and high on-chip bandwidth.
  • The numbers of first inlets and first outlets may be the same or different; that is, the number of computing units connected to each first data node and the number of storage units connected to each second data node may be the same or different.
  • The number of first inlets and the number of first outlets can each range from 2 to 16348; that is, a group of computing units can number from 2 to 16348, and a group of storage units can number from 2 to 16348.
  • the number of each type of data node may be 1 or 2 or more, for example, it may be 4, 6, 9 or even more, and this application does not limit this.
  • A mesh interconnection topology can be used between the multiple first data nodes, and a mesh interconnection topology can likewise be used between the multiple second data nodes; there is no connection between the first data nodes and the second data nodes. The data nodes are arranged in a regular grid, and each data node is connected only to its adjacent data nodes in the same row or column.
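The row/column adjacency just described can be computed with a small helper (an illustration, not the patent's circuit):

```python
# Neighbours of a node in a regular mesh: same row or column, one step away.

def mesh_neighbors(row, col, rows, cols):
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]

# 2x2 mesh (the four-node case of Figures 5A/5B): every node has 2 neighbours
assert mesh_neighbors(0, 0, 2, 2) == [(1, 0), (0, 1)]
# 3x3 mesh (the nine-node case of Figure 8): the centre node has 4 neighbours
assert len(mesh_neighbors(1, 1, 3, 3)) == 4
```

Since each node talks only to its row/column neighbours, wiring grows linearly with node count rather than quadratically, which is what keeps the grid topology simple at scale.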
  • the following describes the internal units of the data node.
  • the internal units of the first data node and the second data node have the same composition.
  • the arbitration unit may be an arbitration structure with backpressure and caching.
  • the arbitration unit may cache a certain number of messages, and when the message can be received by the corresponding interconnection unit, send it to the corresponding interconnection unit.
  • When the arbitration unit's cache is full, the arbitration unit applies backpressure to the preceding-stage unit (routing unit or data exchange unit) to prevent messages sent by the preceding stage from being dropped before they can be received.
  • the routing unit may also be a routing structure with backpressure and caching.
  • the arbitration unit may also be configured to set different weights for multiple input ports of the arbitration unit.
  • the weight value of each input port represents the number of messages that the input port can continuously process.
  • The arbitration unit can set the weight ratio of its ports according to the data volume of each input port; this ratio determines the proportion of messages passed by each port. When the ratio is set to match the proportion of requests or data that actually need to pass, overall system efficiency increases.
  • the weights of any two input ports may be the same, indicating that the two input ports can continuously process the same number of messages.
  • the arbitration unit may also be configured to set different priorities for multiple input ports of the arbitration unit.
  • Each time the arbitration unit processes a message, it selects the input port with the highest priority among those with a pending message.
  • After processing, the priority of the input ports is readjusted. For example, after the messages of the input port that had the highest priority and a pending message have been processed, the priority of that input port is adjusted to the lowest. In other embodiments, it is not excluded that the priorities can be set to be the same.
  • the following description takes the arbitration unit adopting weighted polling arbitration with a weight ratio of 1:3 as an example.
  • The S1 port may be the port connected to the data exchange unit, and the S2 port may be the port connected to the routing unit.
  • the number of weights is related to the number of sending requests.
  • A weight of 3 means that a maximum of 3x (3 · x) messages can be sent continuously.
  • a weight of 1 means that a maximum of x messages can be sent continuously.
  • x is greater than or equal to 1.
  • the principle of priority adjustment is to adjust the priority of the port to the lowest after the port has sent messages or has no messages.
  • An example of the weighted polling (round-robin) arbitration process of the arbitration unit is as follows. Assume that port S1 receives a request and currently has the highest priority. Since the weight of port S1 is 3, port S1 can send up to 3x requests continuously. After port S1 has sent 3x messages continuously, or there is no message on port S1, the arbitration unit adjusts the priority order to S2 > S1. At this point, when there is a request on port S2, S2 is the port with the highest priority, and because the weight of port S2 is 1, port S2 can send up to x messages continuously. After port S2 has sent x messages continuously, or there is no message on port S2, the arbitration unit adjusts the priority order back to S1 > S2.
  • Using the above weighted polling arbitration method can improve the processing efficiency of the arbitration unit, and the effect is obvious when the data pressure is high.
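The weighted polling scheme above (weights 3:1, with the served or emptied port demoted to lowest priority) can be modeled behaviourally. The sketch below takes x = 1 for simplicity; the class, port names, and queue representation are assumptions for illustration, not the patent's hardware.

```python
# Behavioural sketch of weighted round-robin arbitration with priority rotation.

from collections import deque

class WeightedRRArbiter:
    def __init__(self, weights):                 # e.g. {"S1": 3, "S2": 1}
        self.weights = weights
        self.order = list(weights)               # current priority order, highest first

    def arbitrate(self, queues):
        granted = []
        while any(queues.values()):
            # pick the highest-priority port that has a pending message
            port = next(p for p in self.order if queues[p])
            burst = 0
            while queues[port] and burst < self.weights[port]:
                granted.append((port, queues[port].popleft()))
                burst += 1
            # demote the served (or emptied) port to the lowest priority
            self.order.remove(port)
            self.order.append(port)
        return granted

arb = WeightedRRArbiter({"S1": 3, "S2": 1})
q = {"S1": deque("abcde"), "S2": deque("xy")}
grants = [p for p, _ in arb.arbitrate(q)]
# bursts alternate as described: 3 from S1, 1 from S2, the rest of S1, then S2
assert grants == ["S1", "S1", "S1", "S2", "S1", "S1", "S2"]
```

Setting the weights to the expected traffic ratio (here, the data exchange port carries roughly three times the routing port's load) is what keeps neither queue starved while matching throughput to demand.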
  • Alternatively, a fixed-weight round-robin arbitration scheme (for example, with the weight ratio of each port fixed at 1:1) or a fixed-priority arbitration scheme may be adopted.
  • The interconnection unit includes multiple input ports and multiple output ports; data input from any input port can be output through any output port. That is to say, the interconnection unit can send a message to any first outlet according to the message's destination.
  • the number of input ports and output ports can be the same or different. The number can be set according to the needs of the chip, for example, it can be set to 128 or 4096, etc.
  • the interconnection unit can be implemented by, for example, a full crossbar switch (or fully associated crossbar switch).
  • the full crossbar switch is a multi-entry and multi-outlet structure, and data can enter from any entrance and reach any exit.
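The any-entrance-to-any-exit property of the full crossbar can be modeled as a per-cycle mapping from input ports to output ports (a functional illustration only; the names are assumptions, and a real crossbar resolves output conflicts with the arbitration described above):

```python
# Functional model of a full crossbar: any input can reach any output.

def crossbar(inputs, mapping, n_out):
    """inputs: message per input port; mapping: input port -> output port."""
    outputs = [None] * n_out
    for in_port, out_port in mapping.items():
        assert outputs[out_port] is None, "one message per output per cycle"
        outputs[out_port] = inputs[in_port]
    return outputs

msgs = ["m0", "m1", "m2", "m3"]
# any-to-any: input 0 reaches output 3, input 3 reaches output 0, etc.
assert crossbar(msgs, {0: 3, 1: 1, 2: 2, 3: 0}, 4) == ["m3", "m1", "m2", "m0"]
```

The assert inside the function captures the one structural constraint of a crossbar cycle: each output can accept at most one message at a time, which is why arbitration sits in front of it.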
  • the chip may further include a compression unit and a decompression unit, each routing unit is connected to the data exchange unit through the compression unit, and the data exchange unit is connected to each arbitration unit through the decompression unit.
  • Figure 2 is a schematic diagram of another data node structure provided by an embodiment of the present disclosure.
  • The second output port of each routing unit is connected to an input port of the compression unit, and the output ports of the compression unit are connected to the first input ports of the data exchange unit.
  • the second output port of the data exchange unit is connected to the input port of the decompression unit, and the output port of the decompression unit is connected to the second input port of each arbitration unit.
  • The compression unit is used to compress the number of buses, compressing the m buses input by m routing units into n buses and outputting them to the data exchange unit.
  • The compression unit includes m input ports and n output ports, where m and n are both positive integers greater than zero and m > n.
  • the compression unit can compress the number of buses connected to multiple routing units to reduce the number of buses, that is, the number of buses is compressed from m to n, thereby reducing the complexity of the data exchange unit.
  • The number of buses can be compressed because, when the messages entering through the first inlets pass through the routing units, some of them are routed directly to the arbitration units; the bus load reaching the compression unit is therefore reduced, and a smaller number of buses can carry it.
  • The compressed bus (still a multi-group bus) is connected to the data exchange unit. In other embodiments, the compression ratio may be set to other values.
  • The function of the decompression unit is the opposite of that of the compression unit: it restores the number of buses to match the number of arbitration units. It includes n input ports and m output ports, restoring the n buses input by the data exchange unit to m buses, which are input to the m arbitration units respectively.
  • Decompressing the number of buses from n back to m facilitates the bus arbitration operations.
  • Figure 3 is an example of a compression unit and decompression unit.
  • Compression and decompression of 4 groups of buses are taken as an example.
  • The compression unit compresses the 4 groups of buses into 3 groups, and the decompression unit restores the 3 groups back to 4 groups, so that fewer buses can be used to transmit data without affecting chip functionality.
  • S00, S01, S02, and S03 are data sources, connected to buses S10, S11, S12, and S13 respectively; buses S10, S11, S12, and S13 are connected to the compression unit S2, with S10, S11, and S12 connected to the arbitration units S220, S221, and S222 inside S2 respectively.
  • the arbitration units S220, S221, and S222 may be weighted round robin arbiters. In some examples, the arbitration units S220, S221, and S222 may also use ordinary arbiters or round robin arbiters.
  • the routing unit S20 is connected to the cache units S210, S211, and S212 respectively; the cache units S210, S211, and S212 are connected to the arbitration units S220, S221, and S222 respectively; the arbitration units S220, S221, and S222 are connected to the compressed buses S30, S31, and S32.
  • Buses S30, S31, and S32 are connected to the decompression unit S4, and are respectively connected to the routing units S400, S401, and S402 of S4; the routing units S400, S401, and S402 are respectively connected to the restored buses S50, S51, and S52; the routing unit S400, S401 and S402 are both connected to the arbitration unit S41; the arbitration unit S41 can be a polling arbiter or an ordinary arbiter; the arbitration unit S41 is connected to the restored bus S53; the buses S50, S51, S52, and S53 are respectively connected to the data endpoints. S60, S61, S62, S63 are connected.
  • the data compression workflow is as follows:
  • Data sources S00, S01, S02, and S03 send data to buses S10, S11, S12, and S13 respectively; among them: the data of bus S13 is divided into 3 parts through the routing unit S20 and cached in the cache units S210, S211, and S212 respectively; The data of the cache unit S210 and the data of the bus S10 pass through the arbitration unit S220 to generate the data of the bus S30; the data of the cache unit S211 and the data of the bus S11 pass through the arbitration unit S221 to generate the data of the bus S31; the data of the cache unit S212 and the data of the bus S12 The data of bus S32 is generated through arbitration unit S222; now the data compression is completed;
  • the data decompression workflow is as follows:
  • Buses S30, S31, and S32 transmit data to the decompression unit S4. Routing unit S400 receives the data of bus S30, separates out the data of bus S10 and sends it to bus S50, completing the restoration of bus S10's data, and sends the separated data of bus S13 to arbitration unit S41. Routing unit S401 receives the data of bus S31, separates out the data of bus S11 and sends it to bus S51, completing the restoration of bus S11's data, and sends the separated data of bus S13 to arbitration unit S41. Routing unit S402 receives the data of bus S32, separates out the data of bus S12 and sends it to bus S52, completing the restoration of bus S12's data, and sends the separated data of bus S13 to arbitration unit S41. Arbitration unit S41 receives the data from routing units S400, S401, and S402 and sends it to bus S53, completing the restoration of bus S13's data. Buses S50, S51, S52, and S53 then send the data to the data endpoints S60, S61, S62, and S63 respectively.
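The 4-to-3 compression and restoration workflow above can be captured in a toy model. The framing is an assumption for illustration: each merged word is tagged with its source bus so the decompressor can separate S13's share from the S10/S11/S12 traffic; the patent does not specify the tagging mechanism.

```python
# Toy model of Figure 3's 4-to-3 bus compression and decompression.

def compress(s10, s11, s12, s13):
    # routing unit S20 splits bus S13's data over three caches (S210/S211/S212);
    # arbitration units S220/S221/S222 merge each share with S10/S11/S12
    shares = [s13[i::3] for i in range(3)]
    lanes = []
    for own, share, tag in zip((s10, s11, s12), shares, ("S10", "S11", "S12")):
        lanes.append([(tag, w) for w in own] + [("S13", w) for w in share])
    return lanes                                 # compressed buses S30/S31/S32

def decompress(lanes):
    # routing units S400/S401/S402 separate each lane; S13 words re-merge via S41
    restored = {"S10": [], "S11": [], "S12": [], "S13": []}
    for lane in lanes:
        for tag, w in lane:
            restored[tag].append(w)
    return restored

lanes = compress([1, 2], [3], [4], [5, 6, 7])
out = decompress(lanes)
assert out == {"S10": [1, 2], "S11": [3], "S12": [4], "S13": [5, 6, 7]}
```

The round trip is lossless: three physical lanes carry four logical buses, which is the sense in which "fewer buses can be used to transmit data without affecting chip functionality".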
  • The data exchange unit in each data node may include k data exchange subunits, where k is a positive integer greater than or equal to 2. The value of k depends on the number of routing units or on the compression ratio of the compression unit, wherein:
  • Each data exchange subunit includes a group of input/output ports for connecting to the routing unit and the arbitration unit (a first input port and a second output port), and one or more groups of input/output ports for connecting to the second inlets and second outlets (second input ports and first output ports). The first input port is connected to the routing unit, each first output port is connected to a second outlet, each second input port is connected to a second inlet, and the second output port is connected to the arbitration unit.
  • When the chip includes a compression unit and a decompression unit, each data exchange subunit instead includes a group of input/output ports for connecting to the compression unit and the decompression unit (a first input port and a second output port), and one or more groups of input/output ports for connecting to the second inlets and second outlets (second input ports and first output ports). The first input port is connected to the compression unit, each first output port is connected to a second outlet, each second input port is connected to a second inlet, and the second output port is connected to the decompression unit.
  • Each data exchange subunit includes multiple groups of routing subunits and arbitration subunits, which are interconnected in pairs. The number of routing subunits and arbitration subunits depends on the number of nodes adjacent to the data node where the data exchange unit is located; for example, it can be the number of adjacent data nodes + 1. When the current data node has 2 adjacent nodes, the number of routing subunits and the number of arbitration subunits are both 2 + 1 = 3.
  • the first input port is connected to a routing subunit
  • the first output port is connected to an arbitration subunit
  • a second input port is connected to a routing subunit
  • a second output port is connected to an arbitration subunit.
  • The data exchange subunit in the figure is a pairwise-interconnected structure including three groups of routing subunits and arbitration subunits.
  • One group of routing subunits and arbitration subunits is connected to the compression unit (or to the routing unit when there is no compression unit) and to the decompression unit (or to the arbitration unit when there is no decompression unit), respectively.
  • The other two groups of routing subunits and arbitration subunits are connected to the data exchange units of the two adjacent data nodes. Specifically, the routing subunit is connected to the data exchange subunit of the adjacent node,
  • and the arbitration subunit is connected to the routing subunit of the data exchange subunit of the adjacent node.
  • The k data exchange subunits together form a data exchange unit.
  • the arbitration subunit within the data exchange subunit can adopt a weighted round robin arbitration method.
  • each input port can be configured with a weight.
  • The weight ratio represents the ratio of the message volume passing through each input port, taking a proof-of-work chip containing 4 first data nodes as an example.
  • The implementation of the weighted round-robin arbitration method is as described above and is not repeated here. Using weighted round-robin arbitration can improve the efficiency of the data exchange unit.
  • the arbitration subunit within the data exchange subunit may also use round-robin arbitration or fixed priority arbitration.
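The weighted round-robin scheme described above can be sketched in software. This is a minimal illustrative model, not the patent's circuit: port names, the per-port queues, and the "weight = grant slots per round" interpretation are assumptions for illustration.

```python
from collections import deque

class WeightedRoundRobinArbiter:
    """Grant input ports in proportion to their configured weights."""

    def __init__(self, weights):
        # weights: {port_name: weight}; a port gets `weight` slots per round.
        self.queues = {port: deque() for port in weights}
        self.order = [p for p, w in weights.items() for _ in range(w)]
        self.slot = 0

    def push(self, port, msg):
        """A message arrives on an input port."""
        self.queues[port].append(msg)

    def grant(self):
        """Return (port, message) for the next grant, skipping empty ports;
        None if all queues are empty."""
        for _ in range(len(self.order)):
            port = self.order[self.slot]
            self.slot = (self.slot + 1) % len(self.order)
            if self.queues[port]:
                return port, self.queues[port].popleft()
        return None
```

With weights {a: 2, b: 1}, port a is granted twice for every grant of port b, matching the idea that the weight ratio fixes the ratio of message volume through each port.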
  • Figure 5 is a schematic structural diagram of a proof-of-work chip provided by an embodiment of the present disclosure.
  • Figure 5A is a schematic diagram of the connection relationship of four first data nodes.
  • Figure 5B is a schematic diagram of the connection relationship of four second data nodes.
  • Each data node includes a compression unit and a decompression unit.
  • The structure of each data node in the figure is the same.
  • The first data nodes and the second data nodes both adopt a 2 × 2 mesh topology.
  • Each data exchange unit contains one data exchange subunit.
  • Its structure is shown in Figure 4. Assume that computing unit A11 starts to perform a proof-of-work calculation and needs to request data in storage unit B41; this is recorded as request 1.
  • Request 1 is first sent to first data node 1, which is connected to the corresponding computing unit A11.
  • Request 1 is received and cached by the corresponding routing unit in first data node 1.
  • Request 1 is then sent through the compression unit to the data exchange unit of first data node 1.
  • Request 1 is sent to first data node 4 through the data exchange unit, and then delivered to storage unit B41 through the decompression unit, arbitration unit, and interconnection unit of first data node 4.
  • Request 1 accesses storage unit B41 and obtains the requested data, which is recorded as data 1.
  • Data 1 is sent to computing unit A11 through second data node 4 and second data node 1 in sequence.
  • This return process is similar to that of request 1 and is not repeated here. At this point, computing unit A11 has completed its request for the data located in storage unit B41. Any computing unit can follow the above process to obtain the data required for proof of work from any storage unit and perform proof-of-work calculations.
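The hop sequence in the example above can be sketched as follows. This is an illustration of the described order of units, not the patent's exact pipeline; the unit names and node numbers (first data nodes 1 and 4) follow the walkthrough, and the function itself is hypothetical.

```python
def request_path(src_node, dst_node):
    """Hops taken by a request through the first-data-node layer, from the
    routing unit at the source node to the interconnection unit at the
    destination node (which delivers to the storage unit)."""
    hops = [
        f"routing unit (first data node {src_node})",
        f"compression unit (first data node {src_node})",
        f"data exchange unit (first data node {src_node})",
    ]
    if dst_node != src_node:
        # Exchange units of the two nodes are interconnected in the mesh.
        hops.append(f"data exchange unit (first data node {dst_node})")
    hops += [
        f"decompression unit (first data node {dst_node})",
        f"arbitration unit (first data node {dst_node})",
        f"interconnection unit (first data node {dst_node})",
    ]
    return hops
```

For request 1 above, `request_path(1, 4)` yields the seven hops from computing unit A11's routing unit to the interconnection unit in front of storage unit B41; the response from B41 traverses the mirror-image path through the second-data-node layer.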
  • Figure 6 is a schematic diagram of the interconnection of the 6 data exchange units in the 6 data nodes when the proof-of-work chip structure includes 6 first data nodes (or second data nodes).
  • The data exchange units are distributed in a 2 × 3 mesh topology; that is, the 6 data nodes are distributed in a 2 × 3 mesh topology.
  • The number of routing subunits and the number of arbitration subunits in the data exchange subunit of each data exchange unit are each 3 + 1.
  • The internal structure of each data subunit is shown in Figure 7, including 4 groups of routing subunits and arbitration subunits interconnected in pairs; that is, 4 routing subunits and 4 arbitration subunits are interconnected in pairs.
  • Figure 8 is a schematic diagram of the interconnection of the 9 data exchange units in the 9 data nodes when the proof-of-work chip includes 9 first data nodes (or second data nodes).
  • The data exchange units are distributed in a 3 × 3 mesh topology; that is, the 9 data nodes are distributed in a 3 × 3 mesh topology.
  • The number of routing subunits and the number of arbitration subunits in the data exchange subunit of each data exchange unit are each 4 + 1.
  • The internal structure of each data subunit is shown in Figure 9, including 5 groups of routing subunits and arbitration subunits interconnected in pairs.
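The sizing rule running through these examples ("adjacent data nodes + 1") can be checked arithmetically. A minimal sketch, assuming a standard rectangular mesh where a node has at most four neighbours (up/down/left/right); the helper names are ours, not the patent's:

```python
def max_degree(rows: int, cols: int) -> int:
    """Maximum number of neighbours of any node in a rows x cols mesh."""
    row_neighbours = 2 if rows >= 3 else (rows - 1 if rows >= 1 else 0)
    col_neighbours = 2 if cols >= 3 else (cols - 1 if cols >= 1 else 0)
    return row_neighbours + col_neighbours

def subunit_pairs(rows: int, cols: int) -> int:
    """Routing/arbitration subunit pairs per data exchange subunit:
    one pair per adjacent data node, plus one local pair (+1)."""
    return max_degree(rows, cols) + 1
```

This reproduces the three configurations in the text: a 2 × 2 mesh needs 2 + 1 = 3 pairs (Figure 4), a 2 × 3 mesh needs 3 + 1 = 4 pairs (Figure 7), and a 3 × 3 mesh needs 4 + 1 = 5 pairs (Figure 9).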
  • A proof-of-work chip may thus have a total of [2 to 9] * 2 data nodes, where 2 to 9 represents the number of first data nodes or second data nodes.
  • Although the mesh distribution is used as an example for explanation, other topologies are not ruled out; for example, when the number of data nodes is small, such as 3 or 5, a star structure can be used.
  • Each data node includes 30 groups of inlets and outlets; that is, each data node is connected to 30 computing units.
  • The interconnection unit of each data node can then be a 30 × 30 full crossbar switch, so the proof-of-work chip requires only 4 * 2 such 30 × 30 full crossbar switches and 2 groups of 2 × 2 mesh interconnections to realize the exchange of messages between any computing unit and any storage unit. Since the ports are shared among multiple mesh nodes, this avoids the problem of a single full crossbar switch with too many ports, whose scale would be too large to implement. The scheme can be realized with fewer data nodes, with a simple structure and high efficiency. At the same time, higher on-chip bandwidth can be obtained by using the proof-of-work chip implementation provided by the embodiments of the present disclosure.
  • The proof-of-work chip can achieve an on-chip bandwidth of approximately 6144 GB/s with a port width of 1024 bits and a clock frequency of 500 MHz, far exceeding the 1004 GB/s on-chip bandwidth of the current most high-end GPUs.
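The bandwidth figure can be checked arithmetically from the stated numbers. The per-port rate follows directly; note that mapping the quoted 6144 GB/s total onto 96 concurrently active port-widths is our inference from the arithmetic, not a figure stated in the document.

```python
PORT_WIDTH_BITS = 1024
CLOCK_HZ = 500_000_000  # 500 MHz

# Each port transfers one full-width word per clock cycle.
bytes_per_cycle = PORT_WIDTH_BITS // 8            # 128 bytes per cycle
per_port_gbps = bytes_per_cycle * CLOCK_HZ / 1e9  # 64.0 GB/s per port

# How many port-widths of concurrent transfer the quoted total implies.
concurrent_ports = 6144 / per_port_gbps           # 96.0
```

At 64 GB/s per port, the claimed ~6144 GB/s aggregate corresponds to 96 port-widths of simultaneous transfer across the chip's data nodes.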
  • an embodiment of the present disclosure also provides an electronic device including the above-mentioned proof-of-work chip.
  • The electronic device may be, for example, a terminal device that can provide computing services, such as a tablet computer, a notebook computer, a handheld computer, a mobile Internet device, or a wearable device,
  • or a server that can provide computing services such as cloud services, cloud computing, cloud storage, network services, cloud communications, middleware services, domain name services, and security services.
  • The term "connection" should be understood in a broad sense.
  • It can be a fixed connection, a detachable connection, or an integral connection; it can be a mechanical connection or an electrical connection; it can be a direct connection or an indirect connection through an intermediate medium; and it can be an internal connection between two components.
  • Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Mathematical Physics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A proof-of-work chip and an electronic device. The proof-of-work chip comprises at least one group of calculation units, at least one group of storage units, at least two first data nodes and at least two second data nodes, wherein each first data node and each second data node comprise a plurality of first ingresses and a plurality of first egresses, a group of calculation units are correspondingly connected to a group of first ingresses of a first data node on a one-to-one basis, a group of first egresses of each first data node are correspondingly connected to a group of storage units on a one-to-one basis, a group of storage units are correspondingly connected to a group of first ingresses of a second data node on a one-to-one basis, and a group of first egresses of each second data node are correspondingly connected to a group of calculation units on a one-to-one basis; the calculation units are configured to send, during proof-of-work calculation, messages for requesting data to the storage units and by means of the first data nodes; and the storage units are configured to store data sets used in the proof-of-work calculation, and send, in response to the messages from the calculation units, messages including data to the calculation units and by means of the second data nodes.

Description

Proof-of-Work Chip and Electronic Device
This application claims priority to the Chinese patent application filed with the China Patent Office on July 18, 2022, with application number 202210838519.1 and the invention title "Proof-of-Work Chip", the content of which is incorporated into this application by reference.
Technical Field
Embodiments of the present disclosure relate to, but are not limited to, the field of computer application technology, and in particular to a proof-of-work chip and an electronic device.
Background
In blockchain technology, the generation of blocks relies on a proof-of-work (Proof of Work, POW) algorithm. Proof of work is a hash function that can be solved using a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a Field-Programmable Gate Array (FPGA). Solving it requires random-address accesses to a large data set, which is generally stored in memory or video memory. In blockchain proof-of-work applications, computing power is directly proportional to data bandwidth, so very high on-chip bandwidth is required, but traditional CPU, GPU, or FPGA architectures cannot solve this problem well.
Summary of the Invention
The following is an overview of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
Embodiments of the present disclosure provide a proof-of-work chip, including at least one group of computing units, at least one group of storage units, at least two first data nodes, and at least two second data nodes. Each first data node and each second data node includes multiple groups of first inlets and multiple groups of first outlets. The group of computing units is connected one-to-one to a group of first inlets of a first data node; a group of first outlets of the first data node is connected one-to-one to a group of storage units; the group of storage units is connected one-to-one to a group of first inlets of a second data node; and a group of first outlets of the second data node is connected one-to-one to the group of computing units; wherein:
The computing unit is configured to send a message requesting data to the storage unit through the first data node when performing proof-of-work calculation;
The storage unit is configured to store a data set used in the proof-of-work calculation and, in response to a message from the computing unit, to send a message containing data to the computing unit through the second data node;
The first data node is configured to send the message requesting data sent by the computing unit to the storage unit;
The second data node is configured to send the message containing data sent by the storage unit to the computing unit.
An embodiment of the present disclosure also provides an electronic device including the above proof-of-work chip.
Additional features and advantages of the disclosure will be set forth in the description that follows and, in part, will be apparent from the description, or may be learned by practice of the disclosure. Other advantages of the disclosure may be realized and obtained by the arrangements described in the specification, claims, and drawings.
Other aspects will be apparent after reading and understanding the drawings and the detailed description.
Overview of the Drawings
The drawings are used to provide an understanding of the technical solution of the present disclosure and constitute a part of the specification. Together with the embodiments of the present disclosure, they are used to explain the technical solution of the present disclosure and do not constitute a limitation thereof. The shapes and sizes of the components in the drawings do not reflect true proportions and are intended only to illustrate the present disclosure.
Figure 1 is a schematic structural diagram of a proof-of-work chip provided by an embodiment of the present disclosure;
Figure 2 is a schematic structural diagram of another data node provided by an embodiment of the present disclosure;
Figure 3 is a schematic diagram of a compression unit and a decompression unit according to an embodiment of the present disclosure;
Figure 4 is a schematic structural diagram of a data exchange subunit provided by an embodiment of the present disclosure;
Figure 5A is a schematic diagram including four first data nodes provided by an embodiment of the present disclosure;
Figure 5B is a schematic diagram including four second data nodes provided by an embodiment of the present disclosure;
Figure 6 is a schematic connection diagram of the data exchange units when six first data nodes (or second data nodes) are included, provided by an embodiment of the present disclosure;
Figure 7 is a schematic diagram of the internal structure of the data subunit in the data exchange unit shown in Figure 6;
Figure 8 is a schematic connection diagram of the data exchange units when nine first data nodes (or second data nodes) are included, provided by an embodiment of the present disclosure;
Figure 9 is a schematic diagram of the internal structure of the data subunit in the data exchange unit shown in Figure 8;
Figure 10 is a schematic diagram of an electronic device including a proof-of-work chip provided by an embodiment of the present disclosure.
Detailed Description
The present disclosure describes multiple embodiments, but the description is illustrative rather than restrictive, and it will be obvious to a person of ordinary skill in the art that there can be more embodiments and implementations within the scope of the embodiments described in the present disclosure. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Unless expressly limited, any feature or element of any embodiment may be used in combination with, or may be substituted for, any other feature or element of any other embodiment.
The present disclosure includes and contemplates combinations with features and elements known to those of ordinary skill in the art. The embodiments, features, and elements already disclosed herein may also be combined with any conventional features or elements to form unique inventive solutions as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive solutions to form another unique inventive solution as defined by the claims. Accordingly, it should be understood that any feature shown and/or discussed in this disclosure may be implemented individually or in any suitable combination. Accordingly, the embodiments are not to be limited except in accordance with the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.
Additionally, in describing representative embodiments, the specification may have presented methods and/or processes as a specific sequence of steps. However, to the extent that a method or process does not rely on the specific order of steps described herein, it should not be limited to that specific order. As one of ordinary skill in the art will appreciate, other sequences of steps are possible; therefore, the specific order of steps set forth in the specification should not be construed as limiting the claims. Furthermore, claims directed to a method and/or process should not be limited to performing their steps in the order written; skilled artisans will readily appreciate that these orders may be varied and still remain within the spirit and scope of the disclosed embodiments.
Figure 1 is a schematic structural diagram of a proof-of-work chip provided by an embodiment of the present disclosure. The proof-of-work chip includes at least one group of computing units, at least one group of storage units, at least two first data nodes, and at least two second data nodes. Each data node (including the first data nodes and the second data nodes) includes multiple groups of first node inlets (a first node inlet is a first inlet of the data node, referred to herein simply as a first inlet) and multiple groups of first node outlets (a first node outlet is a first outlet of the data node, referred to herein simply as a first outlet). The group of computing units is connected one-to-one to a group of first inlets of a first data node; a group of first outlets of the first data node is connected one-to-one to a group of storage units; the group of storage units is connected one-to-one to a group of first inlets of a second data node; and a group of first outlets of the second data node is connected one-to-one to a group of computing units. That is, a group of computing units is connected to a group of storage units through at least two first data nodes, and the group of storage units is connected to the group of computing units through at least two second data nodes; wherein:
The computing unit is configured to send a message requesting data to the storage unit through the first data node when performing proof-of-work calculation;
The storage unit is configured to store a data set used in the proof-of-work calculation and, in response to a message from the computing unit, to send a message containing data to the computing unit through the second data node;
The first data node is configured to send the message requesting data sent by the computing unit to the storage unit;
The second data node is configured to send the message containing data sent by the storage unit to the computing unit.
Each of the above data nodes includes multiple groups of first inlets and multiple groups of first outlets. When connecting to the computing units and storage units, one of the multiple groups may be connected to the corresponding units (computing units or storage units); alternatively, the multiple groups of first inlets and first outlets can be understood as multiple first inlets and multiple first outlets, which are connected to a corresponding group of units (computing units or storage units), as shown in Figure 1.
It can be understood that this chip structure forms a three-dimensional network structure including a request transmission layer and a data transmission layer. In this embodiment, message transmission in the direction from the computing units to the storage units is implemented through the first-data-node network: at the request transmission layer, a computing unit sends a request to a storage unit through the first data nodes. Message transmission in the direction from the storage units to the computing units is implemented through the second-data-node network: at the data transmission layer, the storage unit returns the requested data to the computing unit through the second data nodes. This structural design makes on-chip message transmission efficient and achieves high on-chip bandwidth.
In an exemplary embodiment, the proof-of-work chip may include at least one group of computing units, at least one group of storage units, one first data node, and one second data node. The first data node and the second data node each include a group of (multiple) first inlets and a group of (multiple) first outlets. The group of computing units is connected one-to-one to a group of first inlets of the first data node; a group of first outlets of the first data node is connected one-to-one to a group of storage units; the group of storage units is connected one-to-one to a group of first inlets of the second data node; and a group of first outlets of the second data node is connected one-to-one to the group of computing units. That is, a group of computing units is connected to a group of storage units through one first data node, and the group of storage units is connected to the group of computing units through one second data node.
Figure 1 takes two first data nodes and two second data nodes as an example. The two first data nodes are connected to each other, and the two second data nodes are connected to each other. In this case, a group of computing units is connected to two groups of storage units through the two interconnected first data nodes, and a group of storage units is connected to two groups of computing units through the two interconnected second data nodes.
When the chip includes multiple first data nodes and multiple second data nodes, for example y first data nodes and y second data nodes, a group of computing units is connected to y groups of storage units through the y interconnected first data nodes, and a group of storage units is connected to y groups of computing units through the y interconnected second data nodes. The y first data nodes are interconnected to form a mesh structure, and the y second data nodes are interconnected to form a mesh structure. y is a positive integer greater than or equal to 2.
In an exemplary embodiment, the first data nodes may be interconnected in a mesh, and the second data nodes may likewise be interconnected in a mesh. Each data node (including the first data nodes and the second data nodes) includes multiple routing units, multiple arbitration units, one data exchange unit, one interconnection unit, multiple first inlets (the aforementioned first node inlets), multiple first outlets (the aforementioned first node outlets), one or more second node inlets (a second node inlet is a second inlet of the data node, referred to herein simply as a second inlet), and one or more second node outlets (a second node outlet is a second outlet of the data node, referred to herein simply as a second outlet). The input end of each routing unit is connected to one first inlet; the first output end of each routing unit is connected one-to-one to the first input end of an arbitration unit; the second output end of each routing unit is connected to the first input end of the data exchange unit; the first output end of the data exchange unit is connected to the second outlet; the second input end of the data exchange unit is connected to the second inlet; the second output end of the data exchange unit is connected to the second input end of each arbitration unit; the output ends of the arbitration units are connected one-to-one to the input ends of the interconnection unit; the output ends of the interconnection unit are connected one-to-one to the first outlets; and the second inlets and second outlets are configured to connect to other data nodes, wherein:
The routing unit is configured to receive a message from the first inlet and send the message to the arbitration unit and/or the data exchange unit;
The data exchange unit is configured to receive a message from the second inlet and send the message to the arbitration unit, and to receive a message sent by the routing unit and output the message through the second outlet;
The arbitration unit is configured to receive a message sent by the routing unit and/or the data exchange unit and send the message to the first outlet through the interconnection unit.
The interconnection unit can send a message sent by any arbitration unit to any first outlet.
A message as described herein may include a request, data, or other similar information. For any data node, when the first inlets of the data node are connected to computing units, that is, the data node is a first data node, the messages transmitted by the data node include messages requesting data, and the second inlets and second outlets of the data node are connected to other first data nodes. When the first inlets of the data node are connected to storage units, that is, the data node is a second data node, the messages transmitted by the data node include messages containing data, and the second inlets and second outlets of the data node are connected to other second data nodes. "Multiple" as used herein means two or more.
The structure of the above data node is described below. Each data node includes a plurality of routing units, a plurality of arbitration units, one data exchange unit, one interconnection unit, a plurality of first inlets, a plurality of first outlets, at least one second inlet, and at least one second outlet, wherein:
The routing unit includes an input port, a first output port, and a second output port. Each input port is connected to one first inlet, the first output port is connected to an input of an arbitration unit, and the second output port is connected to an input of the data exchange unit. The routing unit is configured to receive a message input through the first inlet and forward it to the arbitration unit or the data exchange unit. In this example, each first inlet is connected to an independent routing unit, each routing unit is correspondingly connected to an independent arbitration unit, and every routing unit is also connected to the data exchange unit; the routing unit can route a message to the arbitration unit or the data exchange unit according to the destination contained in the message;
The data exchange unit includes a plurality of first input ports, a plurality of first output ports, a plurality of second input ports, and a plurality of second output ports. Each first input port is connected to the second output port of a routing unit, each first output port is connected to one second outlet, each second input port is connected to one second inlet, and each second output port is connected to the input of one arbitration unit. The data exchange unit is configured to forward messages to other data nodes or to receive messages sent by other data nodes;
The arbitration unit includes a first input port connected to the first output port of the routing unit, a second input port connected to a second output port of the data exchange unit, and an output port connected to the interconnection unit. The arbitration unit is configured to receive messages sent by the routing unit and/or the data exchange unit and send them to the interconnection unit through the output port connected to the interconnection unit;
The interconnection unit includes a plurality of input ports and a plurality of output ports. Each input port is connected to the output port of one arbitration unit, and each output port is connected to one first outlet. The interconnection unit is configured to send a message output by any arbitration unit to any first outlet.
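The destination-based forwarding performed by the routing unit described above can be modeled with a minimal Python sketch. The message layout (a "dest_node" field) and the node identifiers are assumptions introduced for illustration only; the disclosure does not specify a message format.

```python
# Hypothetical message format: each message carries the identifier of
# the data node serving its destination (an assumption for illustration).

def route_message(message: dict, local_node_id: int) -> str:
    """Model of the routing unit's decision: a message whose destination
    lies on the local data node goes to the arbitration unit; any other
    message is handed to the data exchange unit for inter-node transfer."""
    if message["dest_node"] == local_node_id:
        return "arbitration_unit"
    return "data_exchange_unit"

# A request addressed to the local node stays local; a request addressed
# to another node is forwarded over the mesh.
assert route_message({"dest_node": 1, "payload": "req"}, local_node_id=1) == "arbitration_unit"
assert route_message({"dest_node": 4, "payload": "req"}, local_node_id=1) == "data_exchange_unit"
```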
With the proof-of-work chip provided by the embodiments of the present disclosure, message exchange between the computing units and the storage units is achieved through the data nodes. Because the data nodes adopt a mesh topology, are connected to other data nodes through their data exchange units, and deliver their output through their interconnection units, the structure is simple, the efficiency is high, and the on-chip bandwidth is high; a proof-of-work chip implemented with such data nodes therefore also achieves high efficiency and high on-chip bandwidth.
The number of first inlets and the number of first outlets may be the same or different; that is, the number of computing units connected to each first data node and the number of storage units connected to each second data node may be the same or different. The number of first inlets and first outlets may range from 2 to 16348; that is, a group of computing units may contain 2 to 16348 units, and a group of storage units may contain 2 to 16348 units. The number of each type of data node may be 1, 2, or more, for example 4, 6, 9, or even more, which is not limited by the present application. The plurality of first data nodes may be interconnected in a mesh topology, and the plurality of second data nodes may likewise be interconnected in a mesh topology, with no connection between the first data nodes and the second data nodes. The data nodes are arranged in a regular grid, and each data node is connected only to its adjacent data nodes in the same row or column.
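The row/column adjacency rule of the regular grid can be made concrete with a short sketch. The (row, column) coordinates are an illustrative assumption; the disclosure only requires that each node connect to its same-row or same-column neighbours.

```python
def mesh_neighbors(row: int, col: int, rows: int, cols: int):
    """Return the data nodes adjacent to (row, col) in a rows x cols mesh:
    only the direct neighbours in the same row or the same column."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]

# In a 3x3 mesh a corner node has 2 neighbours, an edge node 3, and the
# centre node 4, which is why later examples size the data exchange
# sub-units as "number of adjacent nodes + 1".
assert len(mesh_neighbors(0, 0, 3, 3)) == 2
assert len(mesh_neighbors(0, 1, 3, 3)) == 3
assert len(mesh_neighbors(1, 1, 3, 3)) == 4
```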
The internal units of the data node are described below; the first data node and the second data node have the same internal composition.
In an exemplary embodiment, the arbitration unit may be an arbitration structure with backpressure and buffering. The arbitration unit may buffer a certain number of messages and send each message to the corresponding interconnection unit when that message can be received by it. When the buffer is full, backpressure is applied to the preceding unit (the routing unit or the data exchange unit) to prevent messages sent by the preceding unit from being lost because they cannot be received; when the buffer is no longer full, the backpressure is released. Likewise, the routing unit may be a routing structure with backpressure and buffering.
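A minimal software model of such a backpressured, buffered stage is sketched below. The buffer depth and interface names are assumptions; real hardware would express the same handshake with ready/valid signals rather than method calls.

```python
from collections import deque

class BackpressureBuffer:
    """Sketch of a buffered, backpressured stage: the stage asserts
    backpressure while its buffer is full, so the upstream unit holds
    its message instead of losing it."""

    def __init__(self, depth: int):
        self.depth = depth
        self.fifo = deque()

    @property
    def backpressure(self) -> bool:
        # Asserted while the buffer is full; released once a slot frees up.
        return len(self.fifo) >= self.depth

    def push(self, msg) -> bool:
        # Upstream may only push when backpressure is not asserted.
        if self.backpressure:
            return False        # upstream must retry; the message is not dropped
        self.fifo.append(msg)
        return True

    def pop(self):
        # Downstream (e.g. the interconnection unit) drains the buffer.
        return self.fifo.popleft() if self.fifo else None

buf = BackpressureBuffer(depth=2)
assert buf.push("m1") and buf.push("m2")
assert buf.backpressure            # buffer full: upstream stalls
assert not buf.push("m3")          # message refused, not lost
buf.pop()
assert not buf.backpressure        # slot freed: backpressure released
```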
In an exemplary embodiment, the arbitration unit may further be configured to assign different weights to its input ports, where the weight of each input port indicates the number of messages that the input port can process consecutively. The arbitration unit may set the weight ratio of the ports according to the data volume of each input port; this ratio determines the proportion of messages passed by each port, and when it matches the actual proportion of requests or data that need to pass, the efficiency of the whole system is improved. In other embodiments, any two input ports may have the same weight, indicating that the two input ports can process the same number of messages consecutively.
In an exemplary embodiment, the arbitration unit may further be configured to assign different priorities to its input ports. When processing messages, the arbitration unit selects the highest-priority input port that has pending messages; after the messages of that input port have been processed, its priority is readjusted. For example, the adjustment may be: after the messages of the highest-priority input port with pending messages have been processed, the priority of that input port is lowered to the minimum. In other embodiments, it is not excluded that the priorities may be set to be the same.
The following description takes weighted round-robin arbitration with a weight ratio of 1:3 as an example, where the arbitration unit includes two input ports S1 and S2. Assume that the default priority of the two input ports is S1 > S2, and that the weight of S1 is 3 and the weight of S2 is 1, where port S1 may be the port connected to the data exchange unit and port S2 may be the port connected to the routing unit. In this example, the weight relates to the number of messages that may be sent: a weight of 3 means that at most 3x messages may be sent consecutively, and a weight of 1 means that at most x messages may be sent consecutively, where x is an integer greater than or equal to 1; when the weight is 0, the port can be regarded as closed and no messages are allowed through. In this example, the priority adjustment principle is that after a port has finished sending its messages, or has no messages, its priority is lowered to the minimum.
An example of the weighted round-robin arbitration process of the arbitration unit is as follows. Assume that port S1 receives requests and currently has the highest priority. Since the weight of port S1 is 3, port S1 may send at most 3x requests consecutively. After port S1 has sent 3x messages consecutively, or has no more messages, the arbitration unit adjusts the priority order to S2 > S1. At this point, when port S2 has requests, S2 is the highest-priority port with pending requests, and since the weight of port S2 is 1, port S2 may send at most x messages consecutively. After port S2 has sent x messages consecutively, or has no more messages, the arbitration unit adjusts the priority order back to S1 > S2. The above weighted round-robin arbitration can improve the processing efficiency of the arbitration unit, with a notable effect under heavy data pressure. In other embodiments, a fixed-weight round-robin arbitration scheme (for example, a fixed 1:1 weight ratio between the ports) or a fixed-priority arbitration scheme may be adopted.
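The arbitration behaviour described in the example above, weighted bursts followed by demotion of the served port to lowest priority, can be modelled as follows. The queue-based interface and method names are illustrative assumptions, not part of the disclosure.

```python
from collections import deque

class WeightedRoundRobinArbiter:
    """Sketch of weighted round-robin arbitration: each port may send up
    to weight * x messages in a row; once a port exhausts its burst or
    runs empty, it is demoted to the lowest priority."""

    def __init__(self, weights: dict, x: int = 1):
        self.weights = dict(weights)            # e.g. {"S1": 3, "S2": 1}
        self.x = x
        self.order = list(weights)              # priority order: first = highest
        self.queues = {p: deque() for p in weights}

    def submit(self, port, msg):
        self.queues[port].append(msg)

    def grant_burst(self):
        """Serve the highest-priority port with pending messages, then
        rotate that port to the back (lowest priority)."""
        for port in self.order:
            q = self.queues[port]
            if q:
                limit = self.weights[port] * self.x
                burst = [q.popleft() for _ in range(min(len(q), limit))]
                self.order.remove(port)
                self.order.append(port)         # demote after its turn
                return port, burst
        return None, []

arb = WeightedRoundRobinArbiter({"S1": 3, "S2": 1}, x=1)
for i in range(5):
    arb.submit("S1", f"a{i}")
arb.submit("S2", "b0")
assert arb.grant_burst() == ("S1", ["a0", "a1", "a2"])   # S1: up to 3x messages
assert arb.grant_burst() == ("S2", ["b0"])               # then S2 > S1
assert arb.grant_burst() == ("S1", ["a3", "a4"])         # back to S1 > S2
```

With weights {S1: 3, S2: 1} and x = 1, S1 is granted up to three consecutive messages before S2 takes its turn, matching the S1 > S2, then S2 > S1, sequence described above.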
The interconnection unit includes a plurality of input ports and a plurality of output ports, and data input through any input port can be output through any output port; that is, the interconnection unit can send a message to any first outlet according to the destination of the message. The numbers of input ports and output ports may be the same or different and may be set according to the needs of the chip, for example 128 or 4096. The interconnection unit may be implemented, for example, with a full crossbar switch (also called a fully associative crossbar), which is a multi-inlet, multi-outlet structure in which data can enter through any inlet and reach any outlet.
In an exemplary embodiment, the chip may further include a compression unit and a decompression unit. Each routing unit is connected to the data exchange unit through the compression unit, and the data exchange unit is connected to each arbitration unit through the decompression unit. Figure 2 is a schematic diagram of another data node structure provided by an embodiment of the present disclosure. In this example, the second output port of each routing unit is connected to an input port of the compression unit, and the output ports of the compression unit are connected to the first input ports of the data exchange unit. The second output ports of the data exchange unit are connected to the input ports of the decompression unit, and the output ports of the decompression unit are connected to the second input ports of the arbitration units.
The compression unit is used to compress the number of buses, compressing the m buses input by the m routing units into n buses output to the data exchange unit. For example, the compression unit includes m input ports and n output ports, where m and n are positive integers and m > n. The compression unit compresses the number of buses connected to the plurality of routing units, reducing the bus count from m to n and thereby reducing the complexity of the data exchange unit. The bus count can be compressed because, when messages entering through the first inlets pass through the routing units, a portion of the messages is routed to the arbitration units, so the traffic routed toward the compression unit is necessarily reduced, and a smaller number of buses suffices to carry these messages through the compression unit. Taking a proof-of-work chip containing 4 first data nodes as an example, the buses can be compressed at a ratio of 4:3, that is, m:n = 4:3, because after a message sent from a first inlet passes through the routing unit, it is routed to the arbitration unit with probability 1/4 and to the compression unit with probability 3/4. The compressed buses (still multiple groups of buses) are connected to the data exchange unit. In other embodiments, the compression ratio may be set to other values.
The function of the decompression unit is the opposite of that of the compression unit: it restores the number of buses to match the number of arbitration units. It includes n input ports and m output ports, and restores the n buses input from the data exchange unit into m buses input separately to the m arbitration units. By decompressing the bus count from n back to m, bus arbitration operations are facilitated.
Figure 3 shows an example of a compression unit and a decompression unit. In this example, compression and decompression of 4 groups of buses are taken as an example: the compression unit compresses 4 groups of buses into 3 groups, and the decompression unit restores the 3 groups back to 4 groups, so that fewer buses can be used to transmit data without affecting chip functionality. In the figure, S00, S01, S02, and S03 are data sources, connected to buses S10, S11, S12, and S13, respectively. Buses S10, S11, S12, and S13 are connected to the compression unit S2, with buses S10, S11, and S12 connected to the arbitration units S220, S221, and S222 in the compression unit S2, respectively. The arbitration units S220, S221, and S222 may be weighted round-robin arbiters; in some examples, ordinary arbiters or round-robin arbiters may also be used. The routing unit S20 is connected to the buffer units S210, S211, and S212; the buffer units S210, S211, and S212 are connected to the arbitration units S220, S221, and S222, respectively; and the arbitration units S220, S221, and S222 are connected to the compressed buses S30, S31, and S32. Buses S30, S31, and S32 are connected to the decompression unit S4, specifically to its routing units S400, S401, and S402; the routing units S400, S401, and S402 are connected to the restored buses S50, S51, and S52, respectively, and all of them are connected to the arbitration unit S41. The arbitration unit S41 may be a round-robin arbiter or an ordinary arbiter, and is connected to the restored bus S53. Buses S50, S51, S52, and S53 are connected to the data endpoints S60, S61, S62, and S63, respectively.
The data compression workflow is as follows:
The data sources S00, S01, S02, and S03 send data to buses S10, S11, S12, and S13, respectively. The data of bus S13 is divided into 3 parts by the routing unit S20 and buffered in the buffer units S210, S211, and S212. The data of buffer unit S210 and the data of bus S10 pass through the arbitration unit S220 to generate the data of bus S30; the data of buffer unit S211 and the data of bus S11 pass through the arbitration unit S221 to generate the data of bus S31; and the data of buffer unit S212 and the data of bus S12 pass through the arbitration unit S222 to generate the data of bus S32. The data compression is thus completed.
The data decompression workflow is as follows:
Buses S30, S31, and S32 transmit data to the decompression unit S4. The routing unit S400 receives the data of bus S30, separates out the data of bus S10 and sends it to bus S50, completing the restoration of the data of bus S10, while the separated data of bus S13 is sent to the arbitration unit S41. The routing unit S401 receives the data of bus S31, separates out the data of bus S11 and sends it to bus S51, completing the restoration of the data of bus S11, while the separated data of bus S13 is sent to the arbitration unit S41. The routing unit S402 receives the data of bus S32, separates out the data of bus S12 and sends it to bus S52, completing the restoration of the data of bus S12, while the separated data of bus S13 is sent to the arbitration unit S41. The arbitration unit S41 receives the data from the routing units S400, S401, and S402 and sends it to bus S53, completing the restoration of the data of bus S13. Buses S50, S51, S52, and S53 then send the data to the data endpoints S60, S61, S62, and S63, respectively.
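The compression and decompression workflows above can be modelled end to end in a few lines. The per-message source-bus tag and the round-robin merge performed in place of arbitration unit S41 are modelling assumptions chosen so that the round trip is lossless and order-preserving; the hardware achieves the same effect with its buffer and arbitration structure.

```python
from itertools import cycle

def compress_4_to_3(buses):
    """Model of the 4:3 compression: traffic from the fourth bus (S13) is
    split round-robin across the three compressed buses, where it is
    carried alongside the traffic of S10/S11/S12.  Each message is tagged
    with its source (an assumption) so it can later be restored."""
    s10, s11, s12, s13 = buses
    compressed = [[("direct", m) for m in b] for b in (s10, s11, s12)]
    lanes = cycle(range(3))                 # routing unit S20 splits S13 3 ways
    for m in s13:
        compressed[next(lanes)].append(("s13", m))
    return compressed

def decompress_3_to_4(compressed):
    """Inverse operation: direct traffic is restored to S50/S51/S52, and
    the S13-tagged messages are merged back onto S53 in round-robin order
    (a stand-in for arbitration unit S41)."""
    restored = [[m for tag, m in lane if tag == "direct"] for lane in compressed]
    s13_lanes = [[m for tag, m in lane if tag == "s13"] for lane in compressed]
    merged = []
    while any(s13_lanes):
        for lane in s13_lanes:
            if lane:
                merged.append(lane.pop(0))
    return restored + [merged]

original = [["p0"], ["q0"], ["r0"], ["t0", "t1", "t2", "t3"]]
assert decompress_3_to_4(compress_4_to_3(original)) == original
```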
The data exchange unit in each data node may include k data exchange sub-units, where k is a positive integer greater than or equal to 2 whose value depends on the number of routing units or on the compression ratio of the compression unit, wherein:
When the data exchange unit is directly connected to the routing units (the structure shown in Figure 1), the number of data exchange sub-units equals the number of routing units. Each data exchange sub-unit includes one group of input/output ports for connecting to a routing unit and an arbitration unit (a first input port and a second output port), and one or more groups of input/output ports for connecting to a second inlet and a second outlet (a second input port and a first output port), wherein the first input port is connected to a routing unit, each first output port is connected to one second outlet, each second input port is connected to one second inlet, and the second output port is connected to an arbitration unit.
When the data exchange unit is connected to the compression unit and the decompression unit (the structure shown in Figure 2), the number of data exchange sub-units equals the number of output ports of the compression unit. Each data exchange sub-unit includes one group of input/output ports for connecting to the compression unit and the decompression unit (a first input port and a second output port), and one or more groups of input/output ports for connecting to a second inlet and a second outlet (a second input port and a first output port), wherein the first input port is connected to the compression unit, each first output port is connected to one second outlet, each second input port is connected to one second inlet, and the second output port is connected to the decompression unit. It can be seen that, with the compression unit and the decompression unit connected, the reduced number of buses lowers the complexity of the data exchange unit.
Each data exchange sub-unit includes multiple groups of routing sub-units and arbitration sub-units, with equal numbers of routing sub-units and arbitration sub-units interconnected in pairs. The number of routing sub-units and arbitration sub-units depends on the number of data nodes adjacent to the data node in which the data exchange unit is located; for example, it may be the number of adjacent data nodes plus 1, so that when the current data node has 2 adjacent nodes, the numbers of routing sub-units and arbitration sub-units are both 2 + 1. In each data exchange sub-unit, the first input port is connected to a routing sub-unit, the first output port is connected to an arbitration sub-unit, each second input port is connected to a routing sub-unit, and each second output port is connected to an arbitration sub-unit.
Taking two adjacent data nodes as an example, the structure of a data exchange sub-unit connected by one group of buses (including one input and one output) is shown in Figure 4. The data exchange sub-unit in the figure is a pairwise interconnection structure including three groups of routing sub-units and arbitration sub-units. One group of routing sub-unit and arbitration sub-unit is connected to the compression unit (or to the routing unit when there is no compression unit) and to the decompression unit (or to the arbitration unit when there is no decompression unit), respectively; the other two groups are connected to the data exchange units of the two adjacent data nodes, wherein a routing sub-unit is connected to an arbitration sub-unit of the adjacent node's data exchange sub-unit, and an arbitration sub-unit is connected to a routing sub-unit of the adjacent node's data exchange sub-unit. The k data exchange sub-units together form one data exchange unit.
The arbitration sub-units within the data exchange sub-unit may adopt weighted round-robin arbitration. In weighted round-robin arbitration, each input port may be assigned a weight, and the weight ratio represents the ratio of the message volumes passed by the input ports. Taking a proof-of-work chip containing 4 first data nodes as an example, when the data exchange units of the 4 first data nodes are arranged 2×2 and messages are routed horizontally first and then vertically (that is, when two diagonally positioned nodes transfer a message, the message is first routed to the horizontally adjacent node and then routed to its destination), then in each data exchange sub-unit, in the arbitration sub-unit connected to the decompression unit, the weight ratio of the inlet connected to the horizontal node to the inlet connected to the vertical node is 1:2, and the inlet weight ratios of the other arbitration sub-units in the data exchange sub-unit are 1:1. The implementation of weighted round-robin arbitration is as described above and is not repeated here; weighted round-robin arbitration can improve the efficiency of the data exchange unit. In other embodiments, the arbitration sub-units within the data exchange sub-unit may also use round-robin arbitration or fixed-priority arbitration.
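The "horizontally first, then vertically" routing order referred to above is dimension-ordered (XY) routing; a brief sketch follows, under the assumption that data nodes are addressed by (row, column) coordinates.

```python
def xy_route(src, dst):
    """Dimension-ordered routing: a message is first routed along its row
    (horizontally) until the column matches the destination, and only then
    along the column (vertically) to the destination node."""
    path = [src]
    r, c = src
    while c != dst[1]:                       # horizontal hops first
        c += 1 if dst[1] > c else -1
        path.append((r, c))
    while r != dst[0]:                       # then vertical hops
        r += 1 if dst[0] > r else -1
        path.append((r, c))
    return path

# Diagonal transfer in the 2x2 mesh: the message first reaches the
# horizontally adjacent node, then its destination, as described above.
assert xy_route((0, 0), (1, 1)) == [(0, 0), (0, 1), (1, 1)]
```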
Figure 5 is a schematic structural diagram of a proof-of-work chip provided by an embodiment of the present disclosure; Figure 5A is a schematic diagram of the connection relationship of the 4 first data nodes, and Figure 5B is a schematic diagram of the connection relationship of the 4 second data nodes. In this example, each data node includes a compression unit and a decompression unit, every data node in the figure has the same structure, the first data nodes and the second data nodes both adopt a 2×2 mesh topology, and the structure of the data exchange sub-units contained in each data exchange unit is as shown in Figure 4. Assume that computing unit A11 begins a proof-of-work computation and needs to request data located in storage unit B41; this is recorded as request 1. As shown in Figure 5A, request 1 is first sent to the routing unit in first data node 1 that is correspondingly connected to computing unit A11, where it is buffered. When that routing unit processes the buffered request, it sends request 1 through the compression unit to the data exchange unit of first data node 1; request 1 is then forwarded over the mesh to the data exchange unit of first data node 4, and from there sent to storage unit B41 through the decompression unit, arbitration unit, and interconnection unit of first data node 4. Request 1 accesses storage unit B41 and obtains the requested data, recorded as data 1. As shown in Figure 5B, data 1 is sent to computing unit A11 through second data node 4 and second data node 1 in sequence; the process is similar to that of request 1 and is not repeated here. Computing unit A11 has thus completed its request for the data located in storage unit B41. Any computing unit can, with reference to the above process, obtain the data required for proof of work from any storage unit and perform the proof-of-work computation.
Figure 6 is a schematic diagram of the interconnection of the 6 data exchange units in the 6 data nodes when the proof-of-work chip includes 6 first data nodes (or second data nodes). The data exchange units are distributed in a 2×3 mesh topology; that is, the 6 data nodes adopt a 2×3 mesh topology. In this case, because a data exchange unit located in the middle row is connected to 3 adjacent data nodes, the numbers of routing sub-units and arbitration sub-units in each data exchange sub-unit are both 3 + 1. The internal structure of each data exchange sub-unit is shown in Figure 7 and includes 4 groups of routing sub-units and arbitration sub-units interconnected in pairs, that is, 4 routing sub-units and 4 arbitration sub-units interconnected in pairs.
Figure 8 is a schematic diagram of the interconnection of the 9 data exchange units in the 9 data nodes when the proof-of-work chip includes 9 first data nodes (or second data nodes). The data exchange units are distributed in a 3×3 mesh topology; that is, the 9 data nodes adopt a 3×3 mesh topology. In this case, because the data exchange unit located in the centre is connected to 4 adjacent data nodes, the numbers of routing sub-units and arbitration sub-units in each data exchange sub-unit are both 4 + 1. The internal structure of each data exchange sub-unit is shown in Figure 9 and includes 5 groups of routing sub-units and arbitration sub-units interconnected in pairs.
With the solutions of the embodiments of the present disclosure, a proof-of-work chip with a total of [2 to 9] × 2 data nodes can be implemented, where 2 to 9 represents the number of first data nodes or second data nodes. Although a mesh distribution is used as an example herein, other topologies are not excluded; for example, when the number of data nodes is small, such as 3 or 5, a star topology may be used.
Consider the case of connecting a total of 120 computing units and 120 storage units. A pure full-crossbar implementation would require a 120×120 full crossbar, which is difficult to achieve at the current level of technology; a pure mesh implementation would require a 16×8 mesh arrangement, which would be very inefficient. With the proof-of-work chip structure provided by this embodiment — taking a structure of 4 first data nodes and 4 second data nodes as an example, where each data node includes 30 pairs of inlets and outlets (that is, each data node connects 30 computing units and 30 storage units) and the interconnection unit of each data node can be a 30×30 full crossbar — the chip needs only 4×2 full crossbars of size 30×30 and 2 sets of 2×2 mesh interconnects to exchange messages between any computing unit and any storage unit. Because multiple mesh nodes share the ports, the design avoids the problem of a single full crossbar whose port count is so large that it cannot be implemented, and it can be realized with a small number of data nodes; the structure is simple and efficient. At the same time, the proof-of-work chip of the embodiments of the present disclosure achieves high on-chip bandwidth: measured at a port width of 1024 bits and a clock frequency of 500 MHz, the chip achieves approximately 6144 GB/s of on-chip bandwidth, far exceeding the 1004 GB/s on-chip bandwidth of the current highest-end GPUs.
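As a rough plausibility check of the figures above (our own arithmetic, not taken from the specification): a single 1024-bit port at 500 MHz carries 64 GB/s, so the quoted 6144 GB/s corresponds to 96 ports transferring concurrently, and the partitioned design also roughly halves the crosspoint count relative to one monolithic 120×120 crossbar.

```python
# Rough plausibility check of the numbers quoted above. The "96
# concurrently active ports" reading of the 6144 GB/s figure is our
# assumption, not a statement from the specification.

# Crosspoint counts: one monolithic crossbar vs. the partitioned design.
monolithic = 120 * 120            # 14400 crosspoints
partitioned = 4 * 2 * (30 * 30)   # eight 30x30 crossbars -> 7200 crosspoints

# Per-port bandwidth at a 1024-bit port width and a 500 MHz clock.
per_port_gb_s = 1024 / 8 * 500e6 / 1e9  # bytes per cycle x cycles/s -> GB/s

print(monolithic, partitioned)    # 14400 7200
print(per_port_gb_s)              # 64.0
print(6144 / per_port_gb_s)       # 96.0
```

The comparison ignores the cost of the 2×2 mesh links themselves, which is small relative to the crossbars.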
As shown in Figure 10, an embodiment of the present disclosure further provides an electronic device comprising the above proof-of-work chip. The electronic device may be, for example, a terminal device capable of providing computing services, such as a tablet computer, a notebook computer, a handheld computer, a mobile internet device, or a wearable device; or it may be a server capable of providing computing services such as cloud services, cloud computing, cloud storage, network services, cloud communications, middleware services, domain name services, and security services.
In the description of the embodiments of the present disclosure, it should be noted that, unless otherwise expressly specified and limited, the terms "mounted", "connected", and "coupled" are to be understood broadly: a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be an internal communication between two elements. Those of ordinary skill in the art can understand the meanings of the above terms in the present disclosure according to the specific circumstances.
Those of ordinary skill in the art will understand that all or some of the steps of the methods and the functional modules/units of the systems and apparatuses disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be executed cooperatively by several physical components. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or a microprocessor, as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Claims (15)

  1. A proof-of-work chip, comprising at least one group of computing units, at least one group of storage units, at least two first data nodes, and at least two second data nodes, wherein each first data node and each second data node comprises multiple groups of first inlets and multiple groups of first outlets; the group of computing units is connected in one-to-one correspondence with a group of first inlets of one first data node; a group of first outlets of the first data node is connected in one-to-one correspondence with a group of storage units; the group of storage units is connected in one-to-one correspondence with a group of first inlets of one second data node; and a group of first outlets of the second data node is connected in one-to-one correspondence with the group of computing units; wherein:
    the computing unit is configured to send a message requesting data to the storage unit through the first data node when performing a proof-of-work calculation;
    the storage unit is configured to store a data set used in the proof-of-work calculation and, in response to a message from a computing unit, to send a message containing the data to the computing unit through the second data node;
    the first data node is configured to send the message requesting data sent by the computing unit to the storage unit;
    the second data node is configured to send the message containing data sent by the storage unit to the computing unit.
  2. The proof-of-work chip according to claim 1, wherein:
    there are y first data nodes and y second data nodes, the first data nodes are interconnected in a mesh, and the second data nodes are interconnected in a mesh;
    a group of computing units is connected to y groups of storage units through the mesh-interconnected first data nodes, and a group of storage units is connected to y groups of computing units through the mesh-interconnected second data nodes, y being a positive integer greater than or equal to 2.
  3. The proof-of-work chip according to claim 1 or 2, wherein:
    each first data node and each second data node comprises a plurality of routing units, a plurality of arbitration units, one data exchange unit, one interconnection unit, a plurality of first inlets, a plurality of first outlets, one or more second inlets, and one or more second outlets; the input end of each routing unit is connected to one first inlet; the first output ends of the routing units are connected in one-to-one correspondence with the first input ends of the arbitration units; the second output end of each routing unit is connected to a first input end of the data exchange unit; a first output end of the data exchange unit is connected to the second outlet; a second input end of the data exchange unit is connected to the second inlet; a second output end of the data exchange unit is connected to the second input end of each arbitration unit; the output ends of the arbitration units are connected in one-to-one correspondence with the input ends of the interconnection unit; the output ends of the interconnection unit are connected in one-to-one correspondence with the first outlets; and the second inlets and second outlets are configured to connect to other data nodes; wherein:
    the routing unit is configured to receive a message from the first inlet and send the message to the arbitration unit and/or the data exchange unit;
    the data exchange unit is configured to receive a message from the second inlet and send the message to the arbitration unit, and to receive a message sent by the routing unit and output the message through the second outlet;
    the arbitration unit is configured to receive a message sent by the routing unit and/or the data exchange unit and send the message to the first outlet through the interconnection unit.
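The per-message decision made by the routing unit described above — deliver locally through an arbitration unit, or hand off to the data exchange unit for forwarding to another node via the second outlet — can be sketched as follows. This is our illustration only; the field names (`dst_node`, `dst_port`) are hypothetical and not part of the claims.

```python
# Sketch (ours, not the patent's implementation) of a routing unit's
# forwarding decision inside one data node.

def route(msg, local_node_id):
    """msg: dict with hypothetical keys 'dst_node' and 'dst_port'.
    Returns which downstream unit the routing unit forwards it to."""
    if msg["dst_node"] == local_node_id:
        # Destination is attached to this node: go to the arbitration
        # unit in front of the matching first outlet.
        return ("arbitration_unit", msg["dst_port"])
    # Destination is on another node: go to the data exchange unit,
    # which forwards it over the mesh through a second outlet.
    return ("data_exchange_unit", msg["dst_node"])

print(route({"dst_node": 0, "dst_port": 7}, local_node_id=0))
# ('arbitration_unit', 7)
print(route({"dst_node": 2, "dst_port": 3}, local_node_id=0))
# ('data_exchange_unit', 2)
```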
  4. The proof-of-work chip according to claim 3, wherein the data node further comprises a compression unit and a decompression unit, each routing unit is connected to the data exchange unit through the compression unit, and the data exchange unit is connected to each arbitration unit through the decompression unit; wherein:
    the compression unit comprises m input ports and n output ports and is configured to compress the m buses input by the m routing units into n buses output to the data exchange unit;
    the decompression unit comprises n input ports and m output ports and is configured to restore the n buses input by the data exchange unit to m buses that are respectively input to the m arbitration units;
    wherein m and n are both positive integers greater than zero, and m > n.
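The m-to-n bus compression described above can be sketched as follows. This is our illustration under the assumption that at most n of the m buses carry a valid message in any given cycle; the tagging scheme and all names are hypothetical, not taken from the claims.

```python
# Sketch (ours) of m-to-n bus compression and the matching
# decompression: valid messages are funneled onto n < m shared lanes,
# tagged with their source index so the decompression side can route
# each one back to the arbitration unit matching its original bus.

def compress(m_lanes, n):
    """m_lanes: list of length m; each entry is a payload or None.
    Returns n output lanes as (source_index, payload) tuples."""
    valid = [(i, msg) for i, msg in enumerate(m_lanes) if msg is not None]
    assert len(valid) <= n, "more valid messages than output lanes"
    return valid + [None] * (n - len(valid))

def decompress(n_lanes, m):
    """Restore the original m-lane layout from the tagged n lanes."""
    out = [None] * m
    for entry in n_lanes:
        if entry is not None:
            src, msg = entry
            out[src] = msg
    return out

m_lanes = [None, "req-from-1", None, None, "req-from-4", None]
packed = compress(m_lanes, n=2)
print(packed)                   # [(1, 'req-from-1'), (4, 'req-from-4')]
print(decompress(packed, m=6) == m_lanes)  # True
```

In hardware this corresponds to sharing fewer physical buses; if more than n lanes were valid in one cycle, a real design would need buffering or back-pressure, which the sketch omits.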
  5. The proof-of-work chip according to claim 3, wherein:
    the data exchange unit comprises a plurality of data exchange subunits, the number of data exchange subunits being the same as the number of routing units, and each data exchange subunit comprises a first input port for connecting to a routing unit, a first output port for connecting to a second outlet, a second input port for connecting to a second inlet, and a second output port for connecting to an arbitration unit.
  6. The proof-of-work chip according to claim 4, wherein:
    the data exchange unit comprises n data exchange subunits, and each data exchange subunit comprises a first input port for connecting to the compression unit, a first output port for connecting to a second outlet, a second input port for connecting to a second inlet, and a second output port for connecting to the decompression unit.
  7. The proof-of-work chip according to claim 5 or 6, wherein:
    each data exchange subunit comprises multiple groups of routing subunits and arbitration subunits, the routing subunits and arbitration subunits being pairwise interconnected, wherein the first input port is connected to one routing subunit, the first output port is connected to one arbitration subunit, one second input port is connected to one routing subunit, and one second output port is connected to one arbitration subunit.
  8. The proof-of-work chip according to claim 7, wherein:
    the arbitration subunit is configured to set the same or different weights for its multiple input ports, the weight value of each input port representing the number of messages that the input port can process consecutively.
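The weighted arbitration described above can be sketched as a weighted round-robin: the arbiter visits the input ports in turn, and each port's weight bounds how many messages it may hand through consecutively before the next port gets a turn. This is our illustration only, not the patent's implementation; weights are assumed to be at least 1.

```python
# Sketch (ours) of weighted arbitration over input-port queues: a
# port with weight w may process up to w messages consecutively.

from collections import deque

def weighted_arbitrate(queues, weights):
    """queues: one deque of pending messages per input port.
    weights: per-port consecutive-message budget (each >= 1).
    Returns the grant order as (port, message) tuples."""
    grants = []
    while any(queues):
        for port, q in enumerate(queues):
            for _ in range(weights[port]):
                if not q:
                    break  # port exhausted; move to the next port
                grants.append((port, q.popleft()))
    return grants

queues = [deque(["a1", "a2", "a3"]), deque(["b1"])]
print(weighted_arbitrate(queues, weights=[2, 1]))
# [(0, 'a1'), (0, 'a2'), (1, 'b1'), (0, 'a3')]
```

Giving all ports the same weight degenerates to plain round-robin; unequal weights skew bandwidth toward the heavier ports without starving the lighter ones.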
  9. The proof-of-work chip according to claim 7, wherein:
    the arbitration subunit is configured to set different priorities for its multiple input ports; when the arbitration subunit processes messages, it selects the input port that has the highest priority and a pending message, and after the message of that input port has been processed, the priority of that input port is readjusted.
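One way to realize the priority readjustment described above is a rotating-priority scheme: the served port is demoted to the lowest priority so that no port is starved. The claim does not fix a particular readjustment rule, so the demote-to-back policy below is our assumption for illustration.

```python
# Sketch (ours) of priority arbitration with readjustment: the
# highest-priority port with a pending message wins; after being
# served, it is moved to the back of the priority order.

def priority_arbitrate(pending, order):
    """pending: dict port -> True if a message is waiting.
    order: list of ports, highest priority first.
    Returns (granted_port, new_order); granted_port is None if idle."""
    for i, port in enumerate(order):
        if pending[port]:
            # Readjust: demote the served port to the lowest priority.
            new_order = order[:i] + order[i + 1:] + [port]
            return port, new_order
    return None, order

order = [0, 1, 2]
port, order = priority_arbitrate({0: False, 1: True, 2: True}, order)
print(port, order)  # 1 [0, 2, 1]
```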
  10. The proof-of-work chip according to claim 3, wherein:
    the arbitration unit is further configured to set the same or different weights for its multiple input ports, the weight value of each input port representing the number of messages that the input port can process consecutively.
  11. The proof-of-work chip according to claim 3, wherein:
    the arbitration unit is further configured to set different priorities for its multiple input ports; when the arbitration unit processes messages, it selects the input port that has the highest priority and a pending message, and after the message of that input port has been processed, the priority of that input port is readjusted.
  12. The proof-of-work chip according to claim 3, wherein:
    the interconnection unit comprises a plurality of input ports and a plurality of output ports, each input port is connected to one arbitration unit, and each output port is connected to one first outlet; the interconnection unit is configured to send a message output by the arbitration unit to any one of the first outlets.
  13. The proof-of-work chip according to claim 12, wherein the interconnection unit is a full crossbar switch.
  14. The proof-of-work chip according to claim 3, wherein:
    the proof-of-work chip comprises 4 data nodes arranged in a 2×2 mesh topology, the data exchange unit in each data node comprises n data exchange subunits, and each data exchange subunit comprises 3 groups of pairwise-interconnected routing subunits and arbitration subunits; or
    the proof-of-work chip comprises 6 data nodes arranged in a 2×3 mesh topology, the data exchange unit in each data node comprises n data exchange subunits, and each data exchange subunit comprises 4 groups of pairwise-interconnected routing subunits and arbitration subunits; or
    the proof-of-work chip comprises 9 data nodes arranged in a 3×3 mesh topology, the data exchange unit in each data node comprises n data exchange subunits, and each data exchange subunit comprises 5 groups of pairwise-interconnected routing subunits and arbitration subunits.
  15. An electronic device, comprising the proof-of-work chip according to any one of claims 1 to 14.
PCT/CN2023/077718 2022-07-18 2023-02-22 Proof-of-work chip and electronic device WO2024016661A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210838519.1A CN115002050B (en) 2022-07-18 2022-07-18 Workload proving chip
CN202210838519.1 2022-07-18

Publications (1)

Publication Number Publication Date
WO2024016661A1 true WO2024016661A1 (en) 2024-01-25

Family

ID=83021240

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/077718 WO2024016661A1 (en) 2022-07-18 2023-02-22 Proof-of-work chip and electronic device

Country Status (2)

Country Link
CN (1) CN115002050B (en)
WO (1) WO2024016661A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115002050B (en) * 2022-07-18 2022-09-30 中科声龙科技发展(北京)有限公司 Workload proving chip
CN114928578B (en) * 2022-07-19 2022-09-16 中科声龙科技发展(北京)有限公司 Chip structure
CN114968865B (en) * 2022-07-22 2022-09-27 中科声龙科技发展(北京)有限公司 Bus transmission structure and method and chip
CN115905088B (en) * 2022-12-27 2023-07-14 声龙(新加坡)私人有限公司 Data collection structure, method, chip and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140156929A1 (en) * 2012-12-04 2014-06-05 Ecole Polytechnique Federale De Lausanne (Epfl) Network-on-chip using request and reply trees for low-latency processor-memory communication
CN209543343U (en) * 2018-10-30 2019-10-25 北京比特大陆科技有限公司 Big data operation acceleration system
CN112214427A (en) * 2020-10-10 2021-01-12 中科声龙科技发展(北京)有限公司 Cache structure, workload proving operation chip circuit and data calling method thereof
US20210409487A1 (en) * 2019-07-30 2021-12-30 Alibaba Group Holding Limited Apparatus and method for controlling data transmission in network system
CN113924556A (en) * 2019-04-26 2022-01-11 株式会社艾库塞尔 Information processing apparatus
CN115002050A (en) * 2022-07-18 2022-09-02 中科声龙科技发展(北京)有限公司 Workload proving chip

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10496578B2 (en) * 2017-01-06 2019-12-03 Samsung Electronics Co., Ltd. Central arbitration scheme for a highly efficient interconnection topology in a GPU
CN110620731B (en) * 2019-09-12 2021-03-23 中山大学 Routing device and routing method of network on chip
CN112491715B (en) * 2020-11-30 2022-06-03 清华大学 Routing device and routing equipment of network on chip
CN114003552B (en) * 2021-12-30 2022-03-29 中科声龙科技发展(北京)有限公司 Workload proving operation method, workload proving chip and upper computer
CN114388025B (en) * 2021-12-30 2022-09-13 中科声龙科技发展(北京)有限公司 Dynamic random access memory refreshing circuit, refreshing method and workload proving chip

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140156929A1 (en) * 2012-12-04 2014-06-05 Ecole Polytechnique Federale De Lausanne (Epfl) Network-on-chip using request and reply trees for low-latency processor-memory communication
CN209543343U (en) * 2018-10-30 2019-10-25 北京比特大陆科技有限公司 Big data operation acceleration system
CN113924556A (en) * 2019-04-26 2022-01-11 株式会社艾库塞尔 Information processing apparatus
US20210409487A1 (en) * 2019-07-30 2021-12-30 Alibaba Group Holding Limited Apparatus and method for controlling data transmission in network system
CN112214427A (en) * 2020-10-10 2021-01-12 中科声龙科技发展(北京)有限公司 Cache structure, workload proving operation chip circuit and data calling method thereof
CN115002050A (en) * 2022-07-18 2022-09-02 中科声龙科技发展(北京)有限公司 Workload proving chip

Also Published As

Publication number Publication date
CN115002050A (en) 2022-09-02
CN115002050B (en) 2022-09-30

Similar Documents

Publication Publication Date Title
WO2024016661A1 (en) Proof-of-work chip and electronic device
WO2024016660A1 (en) Chip structure and electronic device
JP6093867B2 (en) Non-uniform channel capacity in the interconnect
US10678738B2 (en) Memory extensible chip
US8930595B2 (en) Memory switch for interconnecting server nodes
US8325761B2 (en) System and method for establishing sufficient virtual channel performance in a parallel computing network
US9007920B2 (en) QoS in heterogeneous NoC by assigning weights to NoC node channels and using weighted arbitration at NoC nodes
US10248315B2 (en) Devices and methods for interconnecting server nodes
US10394738B2 (en) Technologies for scalable hierarchical interconnect topologies
JP2016531372A (en) Memory module access method and apparatus
CN114928577B (en) Workload proving chip and processing method thereof
WO2010062916A1 (en) Network-on-chip system, method, and computer program product for transmitting messages utilizing a centralized on-chip shared memory switch
EP3579507B1 (en) Dynamic scheduling methods, platform, system and switch apparatus.
US9304706B2 (en) Efficient complex network traffic management in a non-uniform memory system
WO2024016649A1 (en) Bus transmission structure and method, and chip
CN107239407B (en) Wireless access method and device for memory
CN116150082A (en) Access method, device, chip, electronic equipment and storage medium
End et al. Adaption of the n-way dissemination algorithm for gaspi split-phase allreduce
US20230129107A1 (en) Method and apparatus to aggregate objects to be stored in a memory to optimize the memory bandwidth
End et al. Butterfly-like algorithms for GASPI split-phase allreduce
CN117221212B (en) Optical network on chip low congestion routing method and related equipment
CN112949247B (en) Phase-based on-chip bus scheduling device and method
CN117135107B (en) Network communication topology system, routing method, device and medium
CN114095289B (en) Data multicast circuit, method, electronic device, and computer-readable storage medium
US11934332B2 (en) Data shuffle offload

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23841739

Country of ref document: EP

Kind code of ref document: A1