WO2024016659A1 - Proof-of-Work Chip and Processing Method Thereof - Google Patents

Proof-of-Work Chip and Processing Method Thereof

Info

Publication number
WO2024016659A1
Authority
WO
WIPO (PCT)
Prior art keywords
unit
arbitration
data
routing
arbitration unit
Prior art date
Application number
PCT/CN2023/077712
Other languages
English (en)
French (fr)
Inventor
蔡凯
田佩佳
刘明
张雨生
闫超
Original Assignee
声龙(新加坡)私人有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 声龙(新加坡)私人有限公司
Publication of WO2024016659A1

Links

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00: Packet switching elements
    • H04L49/25: Routing or path finding in a switch fabric
    • H04L49/253: Routing or path finding in a switch fabric using establishment or release of connections between ports
    • H04L49/254: Centralised controller, i.e. arbitration or scheduling
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/16: Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163: Interprocessor communication
    • G06F15/17: Interprocessor communication using an input/output type connection, e.g. channel, I/O port
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/16: Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163: Interprocessor communication
    • G06F15/173: Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306: Intercommunication techniques
    • G06F15/17312: Routing techniques specific to parallel machines, e.g. wormhole, store and forward, shortest path problem congestion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/76: Architectures of general purpose stored program computers
    • G06F15/78: Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807: System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/781: On-chip cache; Off-chip memory
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00: Packet switching elements
    • H04L49/10: Packet switching elements characterised by the switching fabric construction
    • H04L49/109: Integrated on microchip, e.g. switch-on-chip
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the embodiments of the present disclosure relate to, but are not limited to, the field of computer application technology, and particularly refer to a proof-of-work chip and a processing method thereof.
  • Proof of work is a hash-function-based computation that can be solved using a Central Processing Unit (CPU), a Graphics Processing Unit (GPU) or a Field-Programmable Gate Array (FPGA).
  • the process of solving requires random address access to a large data set, and the entire data set is generally stored in memory or video memory.
  • Common problems with using a CPU, GPU or FPGA to complete proof of work are high power consumption, low efficiency, and the need for external memory or video memory to store the data set.
  • embodiments of the present disclosure provide a proof-of-work chip that includes 2 or more nodes.
  • Each node includes a computing unit, a storage unit, a first routing unit, a first arbitration unit, at least one second routing unit and at least one second arbitration unit; the numbers of second routing units and second arbitration units are the same.
  • The computing unit is connected to the storage unit, the output ports of the computing unit and the storage unit are connected to the input port of the first routing unit, and the output ports of the first routing unit and the second routing unit are connected to the input port of the second arbitration unit.
  • the output port of the second arbitration unit is configured to be connected to the input port of the second routing unit of other nodes, and the input ports of the computing unit and the storage unit are both connected to the output port of the first arbitration unit,
  • the input port of the first arbitration unit is connected to the output port of the second routing unit, and the input port of the second routing unit is configured to be connected to the output port of the second arbitration unit of other nodes.
  • the computing unit is configured to request data from the storage unit of this node or other nodes to perform workload proof calculations
  • the storage unit is configured to store data sets used in proof-of-work calculations, and in response to requests from computing units of this node or other nodes, send data to computing units of this node or other nodes;
  • the first routing unit is configured to receive a request sent by the computing unit or data sent by the storage unit, and forward the request or data to the second arbitration unit;
  • the first arbitration unit is configured to receive the request sent by the second routing unit and forward it to the storage unit, and to receive the data sent by the second routing unit and forward it to the computing unit;
  • the second routing unit is configured to receive requests or data sent by other nodes and forward them to the first arbitration unit or the second arbitration unit;
  • the second arbitration unit is configured to receive requests or data sent by the first routing unit or the second routing unit and forward them to other nodes.
  • Embodiments of the present disclosure also provide a processing method for a proof-of-work chip.
  • The proof-of-work chip is any one of the proof-of-work chips described above.
  • the processing method includes:
  • When the computing unit needs data from the data set in another node's storage unit to perform the proof-of-work calculation, it sends a request to the first routing unit of its node, and the first routing unit sends the request to the second arbitration unit of the node;
  • the second arbitration unit sends the request to other nodes;
  • After receiving the data requested by the computing unit from another node, the second routing unit sends the data to the first arbitration unit, and the first arbitration unit sends the data to the computing unit.
  • Figure 1 is a schematic structural diagram of a proof-of-work chip according to an embodiment of the present disclosure;
  • Figure 2 is a schematic structural diagram of a proof-of-work chip including 2 nodes provided by an embodiment of the present disclosure
  • Figure 3 is a schematic structural diagram of a proof-of-work chip including 4 nodes provided by an embodiment of the present disclosure
  • Figure 4 is a schematic structural diagram of a proof-of-work chip including 9 nodes provided by an embodiment of the present disclosure
  • Figure 5 is a schematic diagram of the internal structure of node S11 in Figure 4.
  • Figure 6 is a flow chart of a processing method for a proof-of-work chip according to an embodiment of the present disclosure.
  • The disclosed embodiment provides an Application Specific Integrated Circuit (ASIC) chip structure that can be used to complete proof-of-work calculations. Compared with traditional CPU, GPU or FPGA structures, the structure of this embodiment has lower power consumption and higher efficiency, dispenses with external memory or video memory, and can store the data set directly inside the ASIC chip.
  • FIG. 1 is a schematic structural diagram of a proof-of-work chip provided by an embodiment of the present disclosure, including two or more nodes.
  • Each node includes: a computing unit 10, a storage unit 20, a first routing unit 30, a first arbitration unit 40, at least one second routing unit 50 and at least one second arbitration unit 60.
  • The numbers of second routing units 50 and second arbitration units 60 are the same, and may be 1, 2 or more. The computing unit 10 is connected to the storage unit 20, and the output ports of the computing unit 10 and the storage unit 20 are connected to the input port of the first routing unit 30.
  • The output ports of the first routing unit 30 and of the second routing units 50 are connected to the input ports of the second arbitration units 60: the output ports of the first routing unit 30 are connected to the input ports of each second arbitration unit 60, while the output ports of the second routing units 50 are connected to the input ports of the second arbitration units 60 in a one-to-one correspondence, i.e. the output port of one second routing unit 50 is connected to the input port of one second arbitration unit 60.
  • The output port of each second arbitration unit 60 is configured to be connected to the input port of a second routing unit 50 of a different node.
  • The input ports of the computing unit 10 and the storage unit 20 are both connected to the output port of the first arbitration unit 40, and the input ports of the first arbitration unit 40 are connected to the output ports of the second routing units 50.
  • The input port of each second routing unit 50 is configured to be connected to the output port of the second arbitration unit 60 of a different node, with the input ports of the second routing units 50 and the output ports of those second arbitration units 60 connected in a one-to-one correspondence;
  • the computing unit 10 is configured to request data from the storage unit 20 of this node or other nodes to perform workload proof calculation;
  • the storage unit 20 is configured to store data sets used in proof-of-work calculations, and send data to the computing units 10 of this node or other nodes in response to requests from the computing units 10 of this node or other nodes;
  • the first routing unit 30 is configured to receive the request sent by the computing unit 10 or the data sent by the storage unit 20, and forward the request or data to the second arbitration unit 60;
  • the first arbitration unit 40 is configured to receive the request sent by the second routing unit 50 and forward it to the storage unit 20, and to receive the data sent by the second routing unit 50 and forward it to the computing unit 10;
  • the second routing unit 50 is configured to receive requests or data sent by other nodes and forward them to the first arbitration unit 40 or the second arbitration unit 60;
  • the second arbitration unit 60 is configured to receive requests or data sent by the first routing unit 30 or the second routing unit 50 and forward them to other nodes.
  • Since the storage unit is located inside the node, no external memory or video memory is needed, and the design is not limited by the bandwidth of a memory interface or video-memory structure; higher bandwidth can be achieved inside the chip. Because the data set is stored inside the nodes, the power consumption of proof-of-work calculation can be reduced and efficiency improved.
  • This embodiment uses requests and data as examples to illustrate the signal flow.
  • The kinds of information transmitted within a node are not a limitation of this application.
  • Other information content can be transmitted with reference to the signal flow of requests and data.
  • In some embodiments, there are n second routing units 50 and n second arbitration units 60, where n is a positive integer greater than or equal to 2. The first arbitration unit 40 includes n input ports, each connected to an output port of one second routing unit 50.
  • Each second arbitration unit 60 includes n input ports, one of which is connected to an output port of the first routing unit 30 and the remaining n-1 of which are connected to output ports of the other second routing units 50 in a one-to-one correspondence. The first routing unit 30 includes n output ports, each connected to an input port of one second arbitration unit 60. Each second routing unit 50 includes n output ports, one of which is connected to an input port of the first arbitration unit 40 and the remaining n-1 of which are connected to input ports of the other second arbitration units 60 in a one-to-one correspondence.
  • Since each second arbitration unit 60 connects to the second routing unit of one other node, having n second arbitration units means the current node can be connected to at most n other nodes: when n is 1, the current node is connected to one node; when n is 2, to at most two nodes; when n is 3, to at most three nodes; when n is 4, to at most four nodes.
  • When an arbitration unit has two or more input ports, the same or different weights may be set for its input ports. The weight of an input port represents the expected number of requests or data items the port can handle consecutively; for example, the weight ratio of the input ports may be set equal to the expected ratio of the numbers of requests or data items that should pass through each port in the design. Experiments show that setting different weights for the ports can increase the chip's computing speed and improve processing efficiency.
  • The same or different priorities can be set for each input port, and the priority of an input port can be lowered after the port has no pending request or data.
  • Experiments have shown that dynamically adjusting priorities improves the processing efficiency of the chip compared with fixed priorities.
  • the weight and priority can be set simultaneously for each input port of the arbitration unit.
  • When the arbitration unit receives requests or data, it selects the pending request or data at the input port with the highest priority.
  • The number of requests or data items processed from that port is determined by the port's weight.
  • After processing, the priority of the port is readjusted, for example lowered to the lowest, and the port with the next-highest priority is selected.
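The weighted priority arbitration described above can be sketched as a small behavioral model. This is an illustration only, not the patented circuit: the class name `WeightedPriorityArbiter`, the queue-per-port representation and the burst-style `grant()` interface are assumptions made for demonstration.

```python
from collections import deque

class WeightedPriorityArbiter:
    """Behavioral sketch of weighted priority round-robin arbitration.

    ports: list of (name, weight) pairs; list order is the initial
    priority order (first = highest). A weight of 0 closes the port.
    """
    def __init__(self, ports):
        self.order = [name for name, _ in ports]    # priority order, high -> low
        self.weight = dict(ports)
        self.queues = {name: deque() for name, _ in ports}

    def push(self, port, item):
        """Queue a pending request or data item at an input port."""
        self.queues[port].append(item)

    def grant(self):
        """Serve the highest-priority port that has pending items:
        pass up to `weight` consecutive items from it, then demote
        that port to the lowest priority (dynamic priority adjustment)."""
        for name in list(self.order):
            q = self.queues[name]
            if q and self.weight[name] > 0:
                burst = [q.popleft() for _ in range(min(self.weight[name], len(q)))]
                self.order.remove(name)
                self.order.append(name)             # now lowest priority
                return name, burst
        return None, []                             # nothing pending
```

With weights such as 4, 2, 1 and 0 (as in the later S110 example), a port with weight 4 passes up to four consecutive items before its priority drops to the lowest.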
  • FIG. 2 is a schematic structural diagram of a proof-of-work chip containing 2 nodes provided by an embodiment of the present disclosure.
  • the chip includes node 1 and node 2.
  • Each node has the same structure: 1 computing unit, 1 storage unit, 1 first routing unit, 1 first arbitration unit, 1 second routing unit and 1 second arbitration unit.
  • the computing unit is connected to the storage unit.
  • the output ports of the computing unit and the storage unit are both connected to the input port of the first routing unit.
  • the output port of the first routing unit is connected to the input port of the second arbitration unit.
  • The output port of the second arbitration unit is connected to the input port of the second routing unit of the other node; the input ports of the computing unit and the storage unit are both connected to the output port of the first arbitration unit; the input port of the first arbitration unit is connected to the output port of the second routing unit; and the input port of the second routing unit is connected to the output port of the second arbitration unit of the other node. Since there is only one path for request and data transmission between the two nodes, the second routing unit and second arbitration unit may be omitted in other embodiments.
  • FIG. 3 is a schematic structural diagram of a proof-of-work chip including 4 nodes provided by an embodiment of the present disclosure.
  • the chip includes a first node, a second node, a third node and a fourth node.
  • the structure of each node is the same.
  • the computing unit is connected to the storage unit.
  • the output ports of the computing unit and the storage unit are both connected to the input port of the first routing unit.
  • The two output ports of the first routing unit are connected to the input ports of the two second arbitration units, respectively.
  • The output port of each second arbitration unit is connected to the input port of a second routing unit of an adjacent node.
  • The input ports of the computing unit and the storage unit are both connected to the output port of the first arbitration unit.
  • The two input ports of the first arbitration unit are connected to the output ports of the two second routing units, respectively, and the input port of each second routing unit is connected to the output port of a second arbitration unit of an adjacent node.
  • In this structure, each arbitration unit has two input ports.
  • A weight and a priority can be set for each input port.
  • When an arbitration unit processes requests or data, it chooses which port to serve first according to priority, and determines the number of requests or data items to process from that port according to the port's weight.
  • FIG. 4 is a schematic structural diagram of a proof-of-work chip including 9 nodes provided by an embodiment of the present disclosure.
  • the 9 nodes included in the proof-of-work chip are: node S00, node S01, node S02, node S10, node S11, node S12, node S20, node S21 and node S22.
  • Each node is connected to adjacent nodes, and each node has the same structure.
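In the 9-node chip of Figure 4, each node is connected to its horizontal and vertical neighbors, which is consistent with each node carrying up to four second routing/arbitration unit pairs. The grid adjacency can be sketched as follows; the four-neighbor mesh interpretation is an assumption drawn from the figure description, not an explicit statement of the patent.

```python
def mesh_neighbors(rows=3, cols=3):
    """Neighbor map of a rows x cols grid of nodes named S<row><col>,
    each linked to its up/down/left/right neighbors (where they exist)."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            adj[f"S{r}{c}"] = [
                f"S{r + dr}{c + dc}"
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
    return adj

# Under this reading, the center node S11 has four neighbors,
# edge nodes three, and corner nodes two.
```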
  • Figure 5 is a schematic diagram of the internal structure of node S11.
  • The node S11 includes a computing unit S1180, a storage unit S1181 connected to it, a first routing unit S1190 (abbreviated as routing unit in the figure) connected to both the computing unit S1180 and the storage unit S1181, and a first arbitration unit S1191 (abbreviated as arbitration unit in the figure) likewise connected to both.
  • The node S11 also includes four second arbitration units (abbreviated as arbitration units in the figure) S110, S112, S114 and S116 connected to the first routing unit S1190, and four second routing units (abbreviated as routing units in the figure) S111, S113, S115 and S117 connected to the first arbitration unit S1191.
  • The arbitration units S110, S112, S114, S116 and S1191 may be arbitration structures with backpressure and caching. These arbitration units can cache a certain number of requests or data items and send each one to the corresponding interconnect structure (i.e. the node connected to this unit) when that structure can receive it. When the cache is full, backpressure is applied to the previous-stage structure to prevent requests or data sent by that stage from being lost because they cannot be received; when the cache is no longer full, the backpressure is released.
  • An arbitration unit's port weights can be designed according to the data volume at each input port; the weights determine the proportion of requests or data passed by each port. When the configured ratio matches the actual proportion of requests or data that needs to pass through each port, the efficiency of the whole system improves.
  • the arbitration unit S110 includes four input ports: S1100, S1101, S1102 and S1103. Assume that the default priorities of the four input ports are S1100>S1101>S1102>S1103, and assume that the weight of S1100 is 4, the weight of S1101 is 2, the weight of S1102 is 1, and the weight of S1103 is 0.
  • The weight value determines how many requests can pass consecutively: a weight of 4 means up to 4 requests can be sent in a row, and a weight of 0 means the port is closed and no requests are allowed through.
  • The priority-adjustment rule is that after a port has sent its quota of requests, or has no pending request, its priority is lowered to the lowest.
  • Suppose port S1100 has a pending request and currently has the highest priority. Since the weight of port S1100 is 4, up to 4 consecutive requests can be accepted from it.
  • After port S1100 has passed 4 consecutive requests, or has no further request, the arbitration unit S110 adjusts the priority order to S1101>S1102>S1103>S1100.
  • Next, suppose port S1101 has a request. Since S1101 now has the highest priority and its weight is 2, up to 2 consecutive requests can pass through it. After port S1101 has passed 2 consecutive requests, or has no further request, the arbitration unit S110 adjusts the priority order to S1102>S1103>S1100>S1101.
  • Each of the arbitration units S110, S112, S114, S116 and S1191 can adopt the above weighted-priority round-robin arbitration scheme, which can improve the efficiency of the whole node structure.
  • Alternatively, a fixed-weight round-robin arbitration scheme (with the weight ratio of all ports fixed at 1:1) or a fixed-priority arbitration scheme can be adopted.
  • The routing units S111, S113, S115, S117 and S1190 may be routing structures with backpressure and caching. These routing units can cache a certain number of requests or data items and send each one to the corresponding interconnect structure when that structure can receive it. When the cache is full, backpressure is applied to the previous-stage structure to prevent requests or data sent by that stage from being lost because they cannot be received; when the cache is no longer full, the backpressure is released.
  • For example, the routing unit S1190 receives requests from the computing unit S1180 and caches them; when its cache is full, it backpressures the computing unit S1180 so that it stops issuing requests.
  • The routing unit S1190 parses the destinations of all cached requests. For example, if there is a request destined for the arbitration unit S114 and S114 can receive it (i.e. there is no backpressure toward the routing unit S1190), the request is sent to S114. If the cache simultaneously holds requests destined for arbitration units S114 and S116 and both can receive them, the two requests are sent to S114 and S116 at the same time. Requests to other ports are handled in the same way. When the corresponding structure cannot receive a request, the routing unit S1190 continues to cache it.
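The backpressure-and-caching behavior described for the routing and arbitration units can be modeled as a bounded FIFO whose producer must hold an item whenever the buffer is full. The class below is a behavioral sketch under that reading; the name `BackpressureBuffer` and the chosen depths are illustrative, not taken from the patent.

```python
from collections import deque

class BackpressureBuffer:
    """Bounded FIFO modeling a routing/arbitration unit's cache.

    offer() returning False is the backpressure signal: the previous
    stage must keep holding the item rather than drop it. Backpressure
    is released implicitly once the cache is no longer full.
    """
    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()

    def offer(self, item):
        """Accept an item into the cache, or refuse it when full."""
        if len(self.fifo) >= self.depth:
            return False              # cache full: backpressure asserted
        self.fifo.append(item)
        return True

    def forward(self, downstream):
        """Send the oldest cached item downstream if it is accepted;
        otherwise keep caching it (downstream backpressure)."""
        if self.fifo and downstream.offer(self.fifo[0]):
            self.fifo.popleft()
            return True
        return False
```

Chaining such buffers reproduces the described flow: a stage stalls when its successor is full and resumes once the successor drains, so no request or data item is ever dropped.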
  • the computing unit S1180 is used to perform the calculation part of the proof of work
  • Storage unit S1181 is used to store the data set used in the proof of work.
  • the data set is split into multiple parts and stored in the storage units of multiple nodes.
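The embodiments state that the data set is split into multiple parts stored across the nodes' storage units, but do not specify the split. One simple hypothetical scheme, shown purely for illustration, interleaves data-set items across the nine nodes of Figure 4 by index:

```python
# Node names follow Figure 4; the modulo interleaving itself is an
# assumption for illustration, not a scheme described in the patent.
NODES = ["S00", "S01", "S02", "S10", "S11", "S12", "S20", "S21", "S22"]

def home_node(item_index, nodes=NODES):
    """Return the node whose storage unit would hold the given item."""
    return nodes[item_index % len(nodes)]
```

Under such a mapping, requests for random items spread roughly evenly over all storage units, which suits the random-address access pattern mentioned in the background.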
  • The following uses node S00 as an example to introduce the workflow of the proof-of-work chip according to the embodiment of the present disclosure; other nodes work analogously. Node S00 has the same structure as node S11; refer to Figures 4 and 5.
  • The computing unit S0080 in node S00 starts to perform the proof-of-work calculation and needs to request data from the data set.
  • the computing unit S0080 continuously issues requests until the routing unit S0090 generates back pressure.
  • the requests issued by the computing unit S0080 are cached in the routing unit S0090;
  • request 1 is first sent to the routing unit S0090, and the routing unit S0090 caches the request 1;
  • Routing unit S0090 parses all cached requests at the same time and sends the requests in the cache to arbitration units S000, S002, S004 and S006 respectively; during this process, an attempt is made to send request 1 to arbitration unit S004.
  • If the buffer of arbitration unit S004 is full, or it cannot receive the request due to arbitration (i.e. there is backpressure on routing unit S0090), routing unit S0090 continues to hold request 1; when there is no backpressure, request 1 is sent through routing unit S0090 to arbitration unit S004.
  • The arbitration unit S004 analyzes the requests on all of its input ports, receives requests from the connected routing units (including S0090 and S003) in turn according to the weights of the corresponding input ports of arbitration unit S004, and sends them to routing unit S011 of node S01; in this process, request 1 is sent to routing unit S011.
  • Routing unit S011 caches all requests from arbitration unit S004, parses all cached requests at the same time, and sends the requests in the cache to arbitration units S016, S0191, S014 and S012 respectively; in this process, an attempt is made to send request 1 to arbitration unit S016.
  • If the buffer of arbitration unit S016 is full at this time, or it cannot receive the request from routing unit S011 due to arbitration (i.e. there is backpressure on routing unit S011), routing unit S011 continues to hold request 1; when there is no backpressure, request 1 is sent through routing unit S011 to arbitration unit S016.
  • The arbitration unit S016 analyzes the requests on all input ports, receives requests from the routing units S011, S0190, S013 and S015 in sequence according to the weights of the corresponding input ports of arbitration unit S016, and sends them to routing unit S113 of node S11; in this process, request 1 is sent to routing unit S113.
  • Routing unit S113 caches all requests from arbitration unit S016 of node S01, parses all cached requests at the same time, and sends the requests in the cache to arbitration units S110, S1191, S116 and S114 respectively; in this process, an attempt is made to send request 1 to arbitration unit S1191.
  • If there is backpressure on routing unit S113, it continues to hold request 1; when there is no backpressure, request 1 is sent through routing unit S113 to arbitration unit S1191.
  • the arbitration unit S1191 analyzes the requests on all input ports, receives the requests from the above routing units in sequence according to the weights of the routing units S117, S111, S113 and S115 corresponding to the input ports of the arbitration unit S1191, and sends them to the storage unit S1181. During this process, request 1 is sent to storage unit S1181;
  • Request 1 accesses storage unit S1181 and obtains the requested data, which is recorded as data 1;
  • Data 1 is sent to computing unit S0080 of node S00 through routing unit S1190 of node S11, arbitration unit S110 of node S11, routing unit S105 of node S10, arbitration unit S102 of node S10, routing unit S007 of node S00 and arbitration unit S0091 of node S00; the process is similar to that of request 1 and is not repeated here.
  • the computing unit S0080 completes the request for the data located on the storage unit S1181.
  • the computing unit S0080 can obtain other data required for proof of work from any node according to the above process, and perform proof of work calculations.
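In the walkthrough above, request 1 travels S00 → S01 → S11 and data 1 returns via S11 → S10 → S00. Both hop sequences are consistent with dimension-ordered routing that resolves the column index first and the row index second. The patent does not name a routing algorithm, so the following sketch only reproduces the example's paths under that assumption.

```python
def route(src, dst):
    """Hypothetical dimension-ordered route between nodes named S<row><col>:
    step through columns toward the destination first, then through rows."""
    r, c = int(src[1]), int(src[2])
    r1, c1 = int(dst[1]), int(dst[2])
    path = [src]
    while c != c1:                     # resolve column first (S00 -> S01)
        c += 1 if c1 > c else -1
        path.append(f"S{r}{c}")
    while r != r1:                     # then resolve row (S01 -> S11)
        r += 1 if r1 > r else -1
        path.append(f"S{r}{c}")
    return path
```

Applying the same rule to the return trip yields S11 → S10 → S00, matching the data path given for data 1.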
  • The number of nodes can be any number from 2 upwards.
  • The foregoing embodiments take 2 nodes, 4 nodes and 9 nodes as examples for description.
  • The number of nodes is not limited to these examples.
  • On the basis of the chip structure shown in Figure 4, the number of nodes can be increased further, for example up to 1024 nodes.
  • The disclosed embodiment implements an ASIC for the proof-of-work algorithm, solving the problems of high power consumption, low efficiency and the need for external memory or video memory to store data sets when a CPU, GPU or FPGA performs proof of work.
  • Embodiments of the present disclosure also provide a processing method for a proof-of-work chip.
  • the proof-of-work chip can be the proof-of-work chip described in any of the previous embodiments.
  • the processing method includes:
  • Step A1: When the computing unit performs the proof-of-work calculation and needs data from the data set in another node's storage unit, it sends a request to the first routing unit of its node.
  • Step A2: The first routing unit sends the request to a second arbitration unit of the node, which sends the request to the other node.
  • Step A3: After receiving the data requested by the computing unit from the other node, the second routing unit sends the data to the first arbitration unit.
  • Step A4: The first arbitration unit sends the data to the computing unit.
  • steps A1 to A4 describe the process in which the computing unit obtains data from other nodes to perform workload proof calculations.
  • the method further includes:
  • Step B1: The storage unit receives, via the first arbitration unit, the request sent by another node's computing unit, and sends the requested data to the first routing unit.
  • Step B2: The first routing unit sends the data to a second arbitration unit of the node, which sends the data to the node that requested it.
  • steps B1 to B2 describe the process of the storage unit feeding back the requested data.
  • the method further includes:
  • After receiving a request or data sent by another node, the second routing unit sends the request or data to a second arbitration unit, which forwards it toward the target node of the request or data.
  • the above steps describe the process of the current node serving as a routing node to forward requests or data.
  • In some embodiments, both the first arbitration unit and the second arbitration unit include n input ports, where n is a positive integer greater than or equal to 2. When the first or second arbitration unit receives requests or data, it processes the requests or data of each input port according to the port's weight and/or priority.
  • Processing by priority may, for example, mean selecting the input port with the highest priority among those with pending requests or data, and lowering that port's priority to the lowest after its requests or data have been processed.
  • Processing by weight may, for example, mean determining from the weight value of an input port how many requests or data items the port can pass consecutively.
  • the embodiment of the present disclosure implements a processing method for workload proof algorithm, which solves the problem of high power consumption, low efficiency and the need for external memory or video memory to store data sets when CPU, GPU or FPGA performs workload proof. .
  • The term "connection" should be understood broadly: it may be a fixed connection, a detachable connection, or an integral connection; a mechanical connection or an electrical connection; a direct connection or an indirect connection through an intermediate medium; or an internal communication between two components.
  • computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Bus Control (AREA)

Abstract

A proof-of-work chip and a processing method therefor. The method includes: when a computing unit (10) performs proof-of-work calculation and needs data from a storage-unit data set in another node, it sends a request to the first routing unit (30) of its own node; the first routing unit (30) sends the request to the second arbitration unit (60) of the node, which forwards the request to the other node; after receiving the data requested by the computing unit (10) from the other node, the second routing unit (50) sends the data to the first arbitration unit (40), and the first arbitration unit (40) sends the data to the computing unit (10).

Description

Proof-of-work chip and processing method therefor
This application claims priority to Chinese patent application No. 202210844639.2, entitled "Proof-of-work chip and processing method therefor", filed with the Chinese Patent Office on July 19, 2022, the content of which should be understood as being incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to, but are not limited to, the field of computer application technology, and in particular to a proof-of-work chip and a processing method therefor.
Background
In blockchain technology, block generation relies on a proof-of-work (POW) algorithm. Proof of work is a hash function that can be solved using a central processing unit (CPU), a graphics processing unit (GPU), a field-programmable gate array (FPGA), or the like. Solving it requires random-address access to a large data set, which is generally stored in main memory or video memory. The common problems of using a CPU, GPU, or FPGA for proof of work are high power consumption, low efficiency, and the need for external memory or video memory to store the data set.
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of the claims.
In one aspect, an embodiment of the present disclosure provides a proof-of-work chip including two or more nodes, each node including: one computing unit, one storage unit, one first routing unit, one first arbitration unit, at least one second routing unit, and at least one second arbitration unit, the second routing units and second arbitration units being equal in number, wherein the computing unit is connected to the storage unit, the output ports of the computing unit and the storage unit are both connected to input ports of the first routing unit, the output ports of the first routing unit and of the second routing unit are both connected to input ports of the second arbitration unit, the output port of the second arbitration unit is configured to connect to an input port of a second routing unit of another node, the input ports of the computing unit and the storage unit are both connected to output ports of the first arbitration unit, an input port of the first arbitration unit is connected to an output port of the second routing unit, and an input port of the second routing unit is configured to connect to an output port of a second arbitration unit of another node.
In an exemplary embodiment:
the computing unit is configured to request data from the storage unit of its own node or of another node to perform proof-of-work calculation;
the storage unit is configured to store the data set used in proof-of-work calculation and, in response to a request from the computing unit of its own node or of another node, send data to that computing unit;
the first routing unit is configured to receive a request sent by the computing unit or data sent by the storage unit and forward the request or data to the second arbitration unit;
the first arbitration unit is configured to receive a request sent by the second routing unit and forward it to the storage unit, and to receive data sent by the second routing unit and forward it to the computing unit;
the second routing unit is configured to receive a request or data sent by another node and forward it to the first arbitration unit or the second arbitration unit;
the second arbitration unit is configured to receive a request or data sent by the first routing unit or the second routing unit and forward it to another node.
In another aspect, an embodiment of the present disclosure further provides a processing method for a proof-of-work chip, the proof-of-work chip being any of the proof-of-work chips described above, the processing method including:
when the computing unit needs data from the storage-unit data set of another node to perform proof-of-work calculation, it sends a request to the first routing unit of its own node; the first routing unit sends the request to the second arbitration unit of the node, and the request is sent to the other node through the second arbitration unit;
after receiving the data requested by the computing unit from the other node, the second routing unit sends the data to the first arbitration unit, and the first arbitration unit sends the data to the computing unit.
Other features and advantages of the present disclosure will be set forth in the following description and will in part become apparent from the description or be understood by practicing the present disclosure. Other advantages of the present disclosure may be realized and obtained through the solutions described in the description, the claims, and the drawings.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.
Brief Description of the Drawings
The drawings are intended to provide an understanding of the technical solution of the present disclosure and constitute a part of the description; together with the embodiments of the present disclosure, they serve to explain the technical solution of the present disclosure and do not limit it. The shapes and sizes of the components in the drawings do not reflect true proportions and are intended only to illustrate the present disclosure.
FIG. 1 is a schematic structural diagram of a proof-of-work chip according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a proof-of-work chip containing 2 nodes according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a proof-of-work chip containing 4 nodes according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a proof-of-work chip containing 9 nodes according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of the internal structure of node S11 in FIG. 4;
FIG. 6 is a flowchart of a processing method for a proof-of-work chip according to an embodiment of the present disclosure.
Detailed Description
The present disclosure describes a number of embodiments, but this description is exemplary rather than limiting, and it will be apparent to those of ordinary skill in the art that more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible feature combinations are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are also possible. Unless specifically limited, any feature or element of any embodiment may be used in combination with, or may replace, any other feature or element of any other embodiment.
The present disclosure includes and contemplates combinations with features and elements known to those of ordinary skill in the art. The embodiments, features, and elements already disclosed herein may also be combined with any conventional feature or element to form a unique inventive solution defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive solutions to form another unique inventive solution defined by the claims. It should therefore be understood that any feature shown and/or discussed in the present disclosure may be implemented individually or in any suitable combination. Accordingly, the embodiments are not limited except by the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of protection of the appended claims.
Furthermore, in describing representative embodiments, the description may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not depend on the particular order of the steps described herein, the method or process should not be limited to the steps in the particular order described. As those of ordinary skill in the art will appreciate, other step orders are also possible. Therefore, the particular order of the steps set forth in the description should not be construed as limiting the claims. Furthermore, claims directed to the method and/or process should not be limited to performing their steps in the order written; those skilled in the art will readily appreciate that these orders may vary while remaining within the spirit and scope of the embodiments of the present disclosure.
An embodiment of the present disclosure provides an application-specific integrated circuit (ASIC) chip structure that can be used to complete proof-of-work calculation. Compared with conventional CPU, GPU, or FPGA structures, the structure of this embodiment has lower power consumption and higher efficiency, dispenses with external memory or video memory, and can store the data set directly inside the ASIC chip.
FIG. 1 is a schematic structural diagram of a proof-of-work chip provided by an embodiment of the present disclosure, including two or more nodes, each node including: one computing unit 10, one storage unit 20, one first routing unit 30, one first arbitration unit 40, at least one second routing unit 50, and at least one second arbitration unit 60; the second routing units 50 and the second arbitration units 60 are equal in number, each being one, two, or more. The computing unit 10 is connected to the storage unit 20, and the output ports of the computing unit 10 and the storage unit 20 are both connected to input ports of the first routing unit 30. The output ports of the first routing unit 30 and of the second routing units 50 are connected to input ports of the second arbitration units 60, where the output ports of the first routing unit 30 are connected to an input port of each second arbitration unit 60, and the output ports of the second routing units 50 are connected to input ports of the second arbitration units 60 in one-to-one correspondence, that is, the output port of one second routing unit 50 is connected to an input port of one second arbitration unit 60. The output port of each second arbitration unit 60 is configured to connect to an input port of a second routing unit 50 of another node, with the output ports of different second arbitration units 60 connecting to second routing units 50 of different nodes, in one-to-one correspondence. The input ports of the computing unit 10 and the storage unit 20 are both connected to output ports of the first arbitration unit 40; the input ports of the first arbitration unit 40 are connected to output ports of the second routing units 50; and the input port of each second routing unit 50 is configured to connect to an output port of a second arbitration unit 60 of a different node, in one-to-one correspondence; wherein:
The computing unit 10 is configured to request data from the storage unit 20 of its own node or of another node to perform proof-of-work calculation.
The storage unit 20 is configured to store the data set used in proof-of-work calculation and, in response to a request from the computing unit 10 of its own node or of another node, send data to that computing unit 10.
The first routing unit 30 is configured to receive a request sent by the computing unit 10 or data sent by the storage unit 20 and forward the request or data to the second arbitration unit 60.
The first arbitration unit 40 is configured to receive a request sent by the second routing unit 50 and forward it to the storage unit 20, and to receive data sent by the second routing unit 50 and forward it to the computing unit 10.
The second routing unit 50 is configured to receive a request or data sent by another node and forward it to the first arbitration unit 40 or the second arbitration unit 60.
The second arbitration unit 60 is configured to receive a request or data sent by the first routing unit 30 or the second routing unit 50 and forward it to another node.
With the chip structure of the embodiments of the present disclosure, the storage unit is located inside the node, so no external memory or video memory is needed and the bandwidth limitations of memory interfaces and video-memory structures are avoided; higher bandwidth can be achieved inside the chip, and because the data set is stored inside the nodes, the power consumption of proof-of-work calculation can be reduced and efficiency improved.
The embodiments of the present disclosure use requests and data as examples to describe the signal flow; the content of the information transmitted within a node does not limit the present application, and in other embodiments other information content may follow the signal flow of requests and data.
In an exemplary embodiment, there are n second routing units 50 and n second arbitration units 60, n being a positive integer greater than or equal to 2. The first arbitration unit 40 includes n input ports, each connected to an output port of one second routing unit 50. Each second arbitration unit 60 includes n input ports, one of which is connected to an output port of the first routing unit 30, and the remaining n-1 of which are connected to output ports of the other second routing units 50 in one-to-one correspondence. The first routing unit 30 includes n output ports, each connected to an input port of one second arbitration unit 60. Each second routing unit 50 includes n output ports, one of which is connected to an input port of the first arbitration unit 40, and the remaining n-1 of which are connected to input ports of the other second arbitration units 60 in one-to-one correspondence. Since each second arbitration unit 60 is configured to connect to the second routing unit of one node, n second arbitration units means the current node can be connected to at most n nodes; for example, when n is 1, the current node connects to one node; when n is 2, to at most two nodes; when n is 3, to at most three nodes; and when n is 4, to at most four nodes.
In an exemplary embodiment, when an arbitration unit has two or more input ports, the arbitration unit may be configured to assign the same or different weights to its input ports. The weight value of an input port represents the expected number of requests or data items that the port can process consecutively; for example, the ratio of the ports' weights may equal the design's expected ratio of the number of requests or data items passing through each port. Experiments show that assigning different weights to the ports can increase the chip's overall computation speed and processing efficiency.
In an exemplary embodiment, when an arbitration unit has two or more input ports, the ports may be assigned the same or different priorities, and the priority of an input port may be lowered when that port has no pending requests or data. Experiments show that dynamically adjusting priorities improves the chip's processing efficiency compared with fixed priorities.
In an exemplary embodiment, each input port of an arbitration unit may be assigned both a weight and a priority. After receiving requests or data, the arbitration unit selects, according to the ports' priorities, the highest-priority port with pending requests or data; the number of requests or data items processed is determined by that port's weight. After the requests or data are processed, the port's priority is readjusted, for example lowered to the lowest, and the next highest-priority port with pending requests or data is selected. If the highest-priority port has no pending requests or data, its priority is likewise adjusted, for example lowered to the lowest.
FIG. 2 is a schematic structural diagram of a proof-of-work chip containing 2 nodes according to an embodiment of the present disclosure. The chip includes node 1 and node 2, which have the same structure; each node includes one computing unit, one storage unit, one first routing unit, one first arbitration unit, one second routing unit, and one second arbitration unit. The computing unit is connected to the storage unit; the output ports of the computing unit and the storage unit are both connected to input ports of the first routing unit; the output port of the first routing unit is connected to an input port of the second arbitration unit; the output port of the second arbitration unit is connected to an input port of the second routing unit of the other node; the input ports of the computing unit and the storage unit are both connected to output ports of the first arbitration unit; the input port of the first arbitration unit is connected to an output port of the second routing unit; and the input port of the second routing unit is connected to the output port of the second arbitration unit of the other node. Since request and data transfer between the two nodes uses only one set of buses, the routing unit and arbitration unit may be omitted in other embodiments.
FIG. 3 is a schematic structural diagram of a proof-of-work chip containing 4 nodes according to an embodiment of the present disclosure. The chip includes a first node, a second node, a third node, and a fourth node, all with the same structure; each node includes one computing unit, one storage unit, one first routing unit, one first arbitration unit, two second routing units, and two second arbitration units. The computing unit is connected to the storage unit; the output ports of the computing unit and the storage unit are both connected to input ports of the first routing unit; the two output ports of the first routing unit are connected to input ports of the two second arbitration units respectively; the output port of each second arbitration unit is connected to an input port of a second routing unit of an adjacent node; the input ports of the computing unit and the storage unit are both connected to output ports of the first arbitration unit; the two input ports of the first arbitration unit are connected to output ports of the two second routing units respectively; and the input port of each second routing unit is connected to the output port of a second arbitration unit of an adjacent node.
In this example, each arbitration unit, including the first and second arbitration units, has two input ports. As described above, each input port may be assigned a weight and a priority; when processing requests or data, the arbitration unit may use the priorities to decide which port's requests or data to serve first, and the port's weight to determine how many requests or data items to process.
FIG. 4 is a schematic structural diagram of a proof-of-work chip containing 9 nodes according to an embodiment of the present disclosure. In this example, the 9 nodes are: node S00, node S01, node S02, node S10, node S11, node S12, node S20, node S21, and node S22. Each node is connected to its adjacent nodes, and all nodes have the same structure; FIG. 5 shows the internal structure of node S11. Node S11 includes computing unit S1180; storage unit S1181, which is connected to it; first routing unit S1190 (abbreviated in the figure as routing unit), connected to computing unit S1180 and storage unit S1181; and first arbitration unit S1191 (abbreviated in the figure as arbitration unit), also connected to computing unit S1180 and storage unit S1181. In addition, node S11 includes four second arbitration units (abbreviated as arbitration units) S110, S112, S114, and S116 connected to first routing unit S1190, and four second routing units (abbreviated as routing units) S111, S113, S115, and S117 connected to first arbitration unit S1191.
In an exemplary embodiment, arbitration units S110, S112, S114, S116, and S1191 may be arbitration structures with backpressure and buffering. These arbitration units can buffer a certain number of requests or data items and send them to the corresponding interconnect structure (that is, the node connected to this unit) when that structure can receive them. When the buffer is full, backpressure is applied to the preceding stage to prevent requests or data issued by it from being lost because they cannot be received; when the buffer is no longer full, the backpressure is released. In addition, the weight ratio of each input port of these arbitration units can be designed according to the amount of data at each port, which determines the proportion of requests or data passing through each port; when this ratio matches the proportion of requests or data that actually needs to pass through, the efficiency of the whole system is improved.
Taking arbitration unit S110 as an example, it has four input ports: S1100, S1101, S1102, and S1103. Assume the default priority order of the four ports is S1100 > S1101 > S1102 > S1103, and assume S1100 has weight 4, S1101 weight 2, S1102 weight 1, and S1103 weight 0. In this example the weight corresponds to the number of requests that may be sent: a weight of 4 means at most 4 consecutive requests may be sent, and a weight of 0 means the port is closed and no requests may pass. The priority-adjustment rule in this example is that after a port has finished sending its requests, or has no requests, its priority is lowered to the lowest.
For arbitration unit S110, its weighted round-robin arbitration behaves, for example, as follows:
Suppose port S1100 receives requests and currently has the highest priority. Since its weight is 4, port S1100 may receive at most 4 consecutive requests. After port S1100 has sent 4 consecutive requests, or has no requests, arbitration unit S110 adjusts the priority order to S1101 > S1102 > S1103 > S1100.
At this point there are four possible cases:
Case 1: port S1101 has requests. Since S1101 is the highest-priority port with requests and its weight is 2, it may send at most 2 consecutive requests. After port S1101 has sent 2 consecutive requests, or has no requests, arbitration unit S110 adjusts the priority order to S1102 > S1103 > S1100 > S1101.
Case 2: port S1101 has no requests but port S1102 does. Since S1102 is the highest-priority port with requests and its weight is 1, after port S1102 has sent 1 request, arbitration unit S110 adjusts the priority order to S1103 > S1100 > S1101 > S1102.
Case 3: neither S1101 nor S1102 has requests, but port S1100 does. Since S1100 is the highest-priority port with requests and its weight is 4, it may send at most 4 consecutive requests. After port S1100 has sent 4 consecutive requests, or has no requests, arbitration unit S110 adjusts the priority order to S1101 > S1102 > S1103 > S1100.
Case 4: none of S1101, S1102, or S1100 has requests. The priority order remains unchanged as S1101 > S1102 > S1103 > S1100, and the unit waits for requests from ports S1100, S1101, and S1102.
Each of the arbitration units S110, S112, S114, S116, and S1191 may adopt the above weighted priority round-robin arbitration scheme, which can improve the efficiency of the whole node structure. In other embodiments, a fixed-weight round-robin scheme (each port's weight ratio fixed at 1:1) or a fixed-priority arbitration scheme may be used instead.
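The weighted priority round-robin behavior described above can be sketched as a small Python model. This is an illustrative simplification rather than the hardware implementation: the class and method names are invented, the port names and weights follow the S110 example, and demotion of idle ports is omitted for brevity.

```python
from collections import deque

class WeightedPriorityArbiter:
    """Simplified model of the weighted priority round-robin arbitration
    described for unit S110: each input port has a weight that caps how
    many requests it may send consecutively; after a port is served, its
    priority is lowered to the lowest."""

    def __init__(self, weights):
        self.weights = dict(weights)                 # port -> max consecutive grants
        self.queues = {p: deque() for p in weights}  # pending requests per port
        self.priority = list(weights)                # index 0 = highest priority

    def push(self, port, request):
        self.queues[port].append(request)

    def grant(self):
        """Serve the highest-priority port that has pending requests
        (weight 0 means the port is closed), then demote that port."""
        for port in self.priority:
            if self.weights[port] == 0 or not self.queues[port]:
                continue
            count = min(self.weights[port], len(self.queues[port]))
            granted = [self.queues[port].popleft() for _ in range(count)]
            self.priority.remove(port)   # demote served port to lowest priority
            self.priority.append(port)
            return port, granted
        return None, []

# Default priority S1100 > S1101 > S1102 > S1103, weights 4, 2, 1, 0:
arb = WeightedPriorityArbiter({"S1100": 4, "S1101": 2, "S1102": 1, "S1103": 0})
for i in range(5):
    arb.push("S1100", f"req{i}")
arb.push("S1101", "reqA")
```

Calling `grant()` repeatedly reproduces the walkthrough: S1100 first sends up to 4 consecutive requests, the priority order then rotates to S1101 > S1102 > S1103 > S1100, and so on.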
In an exemplary embodiment, routing units S111, S113, S115, S117, and S1190 may be routing structures with backpressure and buffering. These routing units can buffer a certain number of requests or data items and send them to the corresponding interconnect structure when that structure can receive them; when the buffer is full, backpressure is applied to the preceding stage to prevent requests or data issued by it from being lost because they cannot be received; when the buffer is no longer full, the backpressure is released.
For example, routing unit S1190 receives requests from computing unit S1180 and buffers them; when the buffer is full, it applies backpressure to computing unit S1180 so that it stops issuing requests. Routing unit S1190 resolves the destinations of all buffered requests. For example, if there is a request destined for arbitration unit S114 and S114 can receive it, that is, it applies no backpressure to routing unit S1190, the request is sent to arbitration unit S114. If the buffer simultaneously holds requests destined for arbitration units S114 and S116 and both can receive requests, the two requests are sent to S114 and S116 at the same time. Requests destined for other ports are handled in the same way. If the corresponding structure cannot receive a request, routing unit S1190 keeps it buffered.
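The buffered backpressure behavior of the routing units can be modeled as a bounded queue that refuses new items when full and forwards a buffered item only when its destination can accept it. This is a hypothetical sketch: the class and method names are invented, and it forwards one item per call rather than several destinations in parallel.

```python
class BufferedRouter:
    """Routing unit with a bounded buffer: when the buffer is full it
    asserts backpressure so the upstream stage stops sending; buffered
    items are forwarded only when their destination can accept them."""

    def __init__(self, depth):
        self.buf = []          # buffered (destination, payload) pairs
        self.depth = depth

    @property
    def backpressure(self):
        return len(self.buf) >= self.depth

    def accept(self, dest, payload):
        """Upstream hands in an item; False means 'retry later'."""
        if self.backpressure:
            return False
        self.buf.append((dest, payload))
        return True

    def forward(self, can_accept):
        """Send the oldest buffered item whose destination is ready;
        can_accept(dest) models the absence of downstream backpressure."""
        for i, (dest, payload) in enumerate(self.buf):
            if can_accept(dest):
                del self.buf[i]
                return dest, payload
        return None

router = BufferedRouter(depth=2)
router.accept("S114", "request 1")
router.accept("S116", "request 2")
blocked = router.accept("S114", "request 3")   # buffer full -> backpressure
```

Once a downstream unit drains an item via `forward`, the backpressure is released and the upstream stage may resubmit the rejected request.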
Computing unit S1180 performs the computation part of proof of work.
Storage unit S1181 stores the data set used in proof of work; the data set is split into multiple parts that are stored in the storage units of multiple nodes.
The workflow of the proof-of-work chip of this embodiment is described below using node S00 as an example; other nodes operate similarly. Node S00 has the same structure as node S11; refer to FIG. 4 and FIG. 5.
Computing unit S0080 in node S00 starts proof-of-work calculation and needs to request data from the data set. It issues requests continuously until routing unit S0090 applies backpressure; the requests issued by computing unit S0080 are buffered in routing unit S0090.
Suppose one of the requests issued by computing unit S0080 needs to access data in storage unit S1181 of node S11; call it request 1. Request 1 is first sent to routing unit S0090, which buffers it.
Routing unit S0090 simultaneously resolves all buffered requests and dispatches them to arbitration units S000, S002, S004, and S006; during this process, an attempt is made to send request 1 to arbitration unit S004.
If arbitration unit S004's buffer is full at this time, or it cannot accept requests from routing unit S0090 because of arbitration, that is, it applies backpressure to routing unit S0090, then routing unit S0090 keeps request 1; when there is no backpressure, request 1 is sent through routing unit S0090 to arbitration unit S004.
Arbitration unit S004 analyzes the requests on all its input ports and, according to the weights of routing units S007, S0090, S001, and S003 at its input ports, accepts requests from these routing units in turn and sends them to routing unit S011 of node S01; during this process, request 1 is sent to routing unit S011.
Routing unit S011 buffers all requests from arbitration unit S004, simultaneously resolves them, and dispatches them to arbitration units S016, S0191, S014, and S012; during this process, an attempt is made to send request 1 to arbitration unit S016.
If arbitration unit S016's buffer is full, or it cannot accept requests from routing unit S011 because of arbitration, that is, it applies backpressure to routing unit S011, then routing unit S011 keeps request 1; when there is no backpressure, request 1 is sent through routing unit S011 to arbitration unit S016.
Arbitration unit S016 analyzes the requests on all its input ports and, according to the weights of routing units S011, S0190, S013, and S015 at its input ports, accepts requests from these routing units in turn and sends them to routing unit S113 of node S11; during this process, request 1 is sent to routing unit S113.
Routing unit S113 buffers all requests from arbitration unit S016 of node S01, simultaneously resolves them, and dispatches them to arbitration units S110, S1191, S116, and S114; during this process, an attempt is made to send request 1 to arbitration unit S1191.
If arbitration unit S1191's buffer is full, or it cannot accept data from routing unit S113 because of arbitration, that is, it applies backpressure to routing unit S113, then routing unit S113 keeps request 1; when there is no backpressure, request 1 is sent through routing unit S113 to arbitration unit S1191.
Arbitration unit S1191 analyzes the requests on all its input ports and, according to the weights of routing units S117, S111, S113, and S115 at its input ports, accepts requests from these routing units in turn and sends them to storage unit S1181; during this process, request 1 is sent to storage unit S1181.
Request 1 accesses storage unit S1181 and obtains the requested data, denoted data 1.
Data 1 is sent to computing unit S0080 of node S00 by passing in turn through routing unit S1190 and arbitration unit S110 of node S11, routing unit S105 and arbitration unit S102 of node S10, and routing unit S007 and arbitration unit S0091 of node S00; the process is similar to that of request 1 and is not repeated here. Computing unit S0080 has thus completed its request for the data located in storage unit S1181.
Computing unit S0080 can obtain any other data needed for proof of work from any node by the above process and perform the proof-of-work calculation.
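As an illustration of the hop sequence only, the S00 → S01 → S11 request path and the S11 → S10 → S00 return path can be reproduced with a tiny grid-routing sketch. The real path is determined by the per-node routing and arbitration units; the column-then-row axis order used here is an assumption made to match the example above.

```python
def route(src, dst):
    """Return the (row, col) nodes a message traverses on the node grid,
    moving one step per hop, column first and then row."""
    path = [src]
    r, c = src
    while (r, c) != dst:
        if c != dst[1]:
            c += 1 if dst[1] > c else -1
        else:
            r += 1 if dst[0] > r else -1
        path.append((r, c))
    return path

# Node Sxy sits at row x, column y of the 3x3 grid in FIG. 4.
request_path = route((0, 0), (1, 1))    # request 1: S00 -> S01 -> S11
response_path = route((1, 1), (0, 0))   # data 1:    S11 -> S10 -> S00
```

Each hop in the returned path corresponds to one routing-unit/arbitration-unit pair in the walkthrough above.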
The number of nodes may range from 2 to any number. The foregoing embodiments use 2, 4, and 9 nodes as examples, but the number of nodes is not limited to these; taking the chip structure shown in FIG. 4 as an example, the structure can be extended with more nodes, for example up to 1024 nodes.
The embodiments of the present disclosure implement an ASIC unit for the proof-of-work algorithm, solving the problems of high power consumption, low efficiency, and the need for external memory or video memory to store the data set when a CPU, GPU, or FPGA performs proof of work.
An embodiment of the present disclosure further provides a processing method for a proof-of-work chip, which may be the proof-of-work chip of any of the foregoing embodiments. As shown in FIG. 6, the processing method includes:
Step A1: when the computing unit performs proof-of-work calculation and needs data from the storage-unit data set of another node, it sends a request to the first routing unit of its own node;
Step A2: the first routing unit sends the request to the second arbitration unit of the node, and the request is sent to the other node through the second arbitration unit;
Step A3: after receiving the data requested by the computing unit from the other node, the second routing unit sends the data to the first arbitration unit;
Step A4: the first arbitration unit sends the data to the computing unit.
Steps A1 to A4 describe the process by which the computing unit obtains data from other nodes to perform proof-of-work calculation.
In an exemplary embodiment, the method further includes:
Step B1: the storage unit receives, via the first arbitration unit, a request issued by the computing unit of another node and sends the requested data to the first routing unit;
Step B2: the first routing unit sends the data to the second arbitration unit of the node, and the data is sent through the second arbitration unit to the node that requested it.
Steps B1 and B2 describe the process by which the storage unit returns the requested data.
In an exemplary embodiment, the method further includes:
after receiving a request or data sent by another node, the second routing unit sends the request or data to the second arbitration unit, and the request or data is sent through the second arbitration unit to its target node.
The above step describes the process by which the current node forwards requests or data as a routing node.
In an exemplary embodiment, the first arbitration unit and the second arbitration unit each include n input ports, n being a positive integer greater than or equal to 2. After receiving requests or data, the first or second arbitration unit processes the requests or data of each input port according to the port's weight and/or priority. Processing by priority may, for example, mean selecting the highest-priority input port with pending requests or data and, after that port's requests or data have been processed, lowering its priority to the lowest. Processing by weight may, for example, mean determining from a port's weight value the number of requests or data items the port can process consecutively.
The embodiments of the present disclosure implement a processing method for the proof-of-work algorithm, solving the problems of high power consumption, low efficiency, and the need for external memory or video memory to store the data set when a CPU, GPU, or FPGA performs proof of work.
In the description of the embodiments of the present disclosure, it should be noted that unless otherwise expressly specified and limited, the terms "mounted", "connected", and "coupled" should be understood broadly: for example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct or indirect through an intermediate medium; or an internal communication between two elements. For those of ordinary skill in the art, the meanings of the above terms in the present disclosure can be understood according to the circumstances.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and apparatuses, may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned above does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Furthermore, it is well known to those of ordinary skill in the art that communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Claims (15)

  1. A proof-of-work chip, comprising two or more nodes, wherein each node comprises: one computing unit, one storage unit, one first routing unit, one first arbitration unit, at least one second routing unit, and at least one second arbitration unit, the second routing units and second arbitration units being equal in number, wherein the computing unit is connected to the storage unit, output ports of the computing unit and the storage unit are both connected to input ports of the first routing unit, output ports of the first routing unit and of the second routing unit are both connected to input ports of the second arbitration unit, an output port of the second arbitration unit is configured to connect to an input port of a second routing unit of another node, input ports of the computing unit and the storage unit are both connected to output ports of the first arbitration unit, an input port of the first arbitration unit is connected to an output port of the second routing unit, and an input port of the second routing unit is configured to connect to an output port of a second arbitration unit of another node.
  2. The proof-of-work chip according to claim 1, wherein
    the computing unit is configured to request data from the storage unit of its own node or of another node to perform proof-of-work calculation;
    the storage unit is configured to store a data set used in proof-of-work calculation and, in response to a request from the computing unit of its own node or of another node, send data to that computing unit;
    the first routing unit is configured to receive a request sent by the computing unit or data sent by the storage unit and forward the request or data to the second arbitration unit;
    the first arbitration unit is configured to receive a request sent by the second routing unit and forward it to the storage unit, and to receive data sent by the second routing unit and forward it to the computing unit;
    the second routing unit is configured to receive a request or data sent by another node and forward it to the first arbitration unit or the second arbitration unit;
    the second arbitration unit is configured to receive a request or data sent by the first routing unit or the second routing unit and forward it to another node.
  3. The proof-of-work chip according to claim 2, wherein there are n second routing units and n second arbitration units, n being a positive integer greater than or equal to 2; the first arbitration unit comprises n input ports, each connected to an output port of one second routing unit; each second arbitration unit comprises n input ports, one of which is connected to an output port of the first routing unit and the remaining n-1 of which are connected to output ports of the other second routing units in one-to-one correspondence; the first routing unit comprises n output ports, each connected to an input port of one second arbitration unit; and each second routing unit comprises n output ports, one of which is connected to an input port of the first arbitration unit and the remaining n-1 of which are connected to input ports of the other second arbitration units in one-to-one correspondence.
  4. The proof-of-work chip according to claim 2, wherein
    the arbitration unit is further configured to assign a weight to each of its n input ports, the weight value of an input port representing the expected number of requests or data items that the port can process consecutively, the arbitration unit comprising the first arbitration unit and the second arbitration unit.
  5. The proof-of-work chip according to claim 2, 3, or 4, wherein
    the arbitration unit is further configured to assign a priority to each of its n input ports, and when processing requests or data, the arbitration unit selects the highest-priority input port with pending requests or data.
  6. The proof-of-work chip according to claim 5, wherein
    the arbitration unit is further configured to, after selecting the highest-priority input port with pending requests or data and after the requests or data of that input port have been processed, readjust the priority of that input port.
  7. The proof-of-work chip according to claim 6, wherein readjusting the priority of the input port comprises:
    the arbitration unit lowering the priority of the input port to the lowest.
  8. The proof-of-work chip according to claim 1 or 2, wherein
    the arbitration unit is an arbitration unit with backpressure and buffering.
  9. The proof-of-work chip according to claim 1 or 2, wherein
    the routing unit is a routing unit with backpressure and buffering.
  10. A processing method for a proof-of-work chip, wherein the proof-of-work chip comprises two or more nodes, each node comprising: one computing unit, one storage unit, one first routing unit, one first arbitration unit, at least one second routing unit, and at least one second arbitration unit, the second routing units and second arbitration units being equal in number, wherein the computing unit is connected to the storage unit, output ports of the computing unit and the storage unit are both connected to input ports of the first routing unit, output ports of the first routing unit and of the second routing unit are both connected to input ports of the second arbitration unit, an output port of the second arbitration unit is configured to connect to an input port of a second routing unit of another node, input ports of the computing unit and the storage unit are both connected to output ports of the first arbitration unit, an input port of the first arbitration unit is connected to an output port of the second routing unit, and an input port of the second routing unit is configured to connect to an output port of a second arbitration unit of another node; the processing method comprising:
    when the computing unit needs data from the storage-unit data set of another node to perform proof-of-work calculation, sending, by the computing unit, a request to the first routing unit of its own node; sending, by the first routing unit, the request to the second arbitration unit of the node; and sending the request to the other node through the second arbitration unit;
    after the second routing unit receives the data requested by the computing unit from the other node, sending, by the second routing unit, the data to the first arbitration unit; and sending, by the first arbitration unit, the data to the computing unit.
  11. The processing method according to claim 10, further comprising:
    receiving, by the storage unit via the first arbitration unit, a request issued by the computing unit of another node, and sending the requested data to the first routing unit; sending, by the first routing unit, the data to the second arbitration unit of the node; and sending the data through the second arbitration unit to the node that requested it.
  12. The processing method according to claim 10, further comprising:
    after the second routing unit receives a request or data sent by another node, sending, by the second routing unit, the request or data to the second arbitration unit, and sending the request or data through the second arbitration unit to its target node.
  13. The processing method according to claim 10, wherein
    the first arbitration unit and the second arbitration unit each comprise n input ports, n being a positive integer greater than or equal to 2, and after receiving requests or data, the first arbitration unit or the second arbitration unit processes the requests or data of each input port according to the weight and/or priority of the port.
  14. The processing method according to claim 13, wherein processing, by the arbitration unit, the requests or data of each input port according to its priority comprises:
    selecting the highest-priority input port with pending requests or data and, after the requests or data of that input port have been processed, lowering the priority of that input port to the lowest.
  15. The processing method according to claim 13, wherein processing, by the arbitration unit, the requests or data of each input port according to its weight comprises: determining, from the weight value of an input port, the number of requests or data items that the input port can process consecutively.
PCT/CN2023/077712 2022-07-19 2023-02-22 Proof-of-work chip and processing method therefor WO2024016659A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210844639.2 2022-07-19
CN202210844639.2A CN114928577B (zh) 2022-07-19 2022-07-19 Proof-of-work chip and processing method therefor

Publications (1)

Publication Number Publication Date
WO2024016659A1 true WO2024016659A1 (zh) 2024-01-25

Family

ID=82816044

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/077712 WO2024016659A1 (zh) 2022-07-19 2023-02-22 Proof-of-work chip and processing method therefor

Country Status (2)

Country Link
CN (1) CN114928577B (zh)
WO (1) WO2024016659A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114928577B (zh) * 2022-07-19 2022-10-21 中科声龙科技发展(北京)有限公司 Proof-of-work chip and processing method therefor
CN115905088B (zh) * 2022-12-27 2023-07-14 声龙(新加坡)私人有限公司 Data collection structure and method, chip, and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102546417A (zh) * 2012-01-14 2012-07-04 西安电子科技大学 Network-on-chip router scheduling method based on network information
US20200118093A1 (en) * 2018-08-10 2020-04-16 Hajoon Ko System and method for arbitrating a blockchain transaction
CN112214427A (zh) * 2020-10-10 2021-01-12 中科声龙科技发展(北京)有限公司 Cache structure, proof-of-work computing chip circuit, and data calling method thereof
CN114003552A (zh) * 2021-12-30 2022-02-01 中科声龙科技发展(北京)有限公司 Proof-of-work computing method, proof-of-work chip, and host computer
CN114928577A (zh) * 2022-07-19 2022-08-19 中科声龙科技发展(北京)有限公司 Proof-of-work chip and processing method therefor

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5524261A (en) * 1991-12-31 1996-06-04 Dictaphone Corporation (U.S.) Voice processor interface chip with arbitration unit
CN112214448B (zh) * 2020-10-10 2024-04-09 声龙(新加坡)私人有限公司 Dynamic data reconstruction circuit and method for a heterogeneously integrated proof-of-work computing chip
CN112925504A (zh) * 2021-02-20 2021-06-08 北京比特大陆科技有限公司 Proof-of-work computing device, ASIC chip, and proof-of-work computing method
CN114238157A (zh) * 2021-11-26 2022-03-25 浙江毫微米科技有限公司 Apparatus and method for obtaining proof of work, electronic device, and storage medium


Also Published As

Publication number Publication date
CN114928577B (zh) 2022-10-21
CN114928577A (zh) 2022-08-19

Similar Documents

Publication Publication Date Title
WO2024016659A1 (zh) 工作量证明芯片及其处理方法 (Proof-of-work chip and processing method therefor)
US10455063B2 (en) Packet flow classification
US11929927B2 (en) Network interface for data transport in heterogeneous computing environments
US8446824B2 (en) NUMA-aware scaling for network devices
WO2020236295A1 (en) System and method for facilitating efficient message matching in a network interface controller (nic)
US10303618B2 (en) Power savings via dynamic page type selection
US8225026B2 (en) Data packet access control apparatus and method thereof
US10193831B2 (en) Device and method for packet processing with memories having different latencies
US8325603B2 (en) Method and apparatus for dequeuing data
WO2017157110A1 (zh) Control method and device for high-speed access to double-data-rate synchronous dynamic random access memory
CN116018790A (zh) 基于接收方的精密拥塞控制
JP2016195375A (ja) 複数のリンクされるメモリリストを利用する方法および装置
US10419370B2 (en) Hierarchical packet buffer system
KR102126592B1 (ko) 멀티코어 프로세서들에 대한 내부 및 외부 액세스를 갖는 룩-어사이드 프로세서 유닛
KR20240004315A (ko) Smartnic들 내의 네트워크 연결형 mpi 프로세싱 아키텍처
US10601723B2 (en) Bandwidth matched scheduler
US11563830B2 (en) Method and system for processing network packets
CN104572498A (zh) 报文的缓存管理方法和装置
US10003551B2 (en) Packet memory system, method and device for preventing underrun
US20160294926A1 (en) Using a single work item to send multiple messages
US9137167B2 (en) Host ethernet adapter frame forwarding
US10067868B2 (en) Memory architecture determining the number of replicas stored in memory banks or devices according to a packet size
US20220358059A1 (en) Data access method for direct memory access (dma), and processor
WO2021073473A1 (zh) Data packet processing method and apparatus, communication device, and storage medium
WO2019095942A1 (zh) Data transmission method and communication device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23841737

Country of ref document: EP

Kind code of ref document: A1