WO2019218730A1 - System and method for optimizing core computing components of proof of work operation chip - Google Patents

System and method for optimizing core computing components of proof of work operation chip

Info

Publication number
WO2019218730A1
WO2019218730A1 (PCT/CN2019/074499, CN2019074499W)
Authority
WO
WIPO (PCT)
Prior art keywords
node data
dag node
dag
hash
module
Prior art date
Application number
PCT/CN2019/074499
Other languages
French (fr)
Chinese (zh)
Inventor
汪福全
刘明
蔡凯
Original Assignee
中科声龙科技发展(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中科声龙科技发展(北京)有限公司
Publication of WO2019218730A1 publication Critical patent/WO2019218730A1/en

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 9/00 - Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L 9/06 - Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols; the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L 9/0643 - Hash functions, e.g. MD5, SHA, HMAC or f9 MAC


Abstract

The present invention relates to a system and method for optimizing the core computing components of a proof-of-work operation chip. In the described method, each basic component of the core computing components comprises a hash collision unit and a plurality of DAG node data generating units. The core computing components are composed of a plurality of these basic components, and both the hash collision unit and the DAG node data generating units employ structural designs such as parallel computing, time-division multiplexing, and pipelining. The basic components increase the efficiency of the algorithm implementation by means of the parallel computing structure and increase data throughput by means of the time-division multiplexing and pipeline structures.

Description

Optimization system and method for the core computing component of a proof-of-work operation chip

Technical Field

The present invention relates to the technical field of blockchain, proof of work, cryptocurrency mining and integrated systems, and in particular to a method and system for optimizing the core computing component of a proof-of-work operation chip used for mining Ethereum-like cryptocurrencies.

Background

Proof of Work (POW) is a consensus mechanism adopted by mainstream cryptocurrencies such as Bitcoin and Ethereum. Its basic feature is that a large number of hash operations must be performed in order to find a hash value that satisfies a given difficulty condition.

The FNV hash can hash large amounts of data quickly while maintaining a low collision rate. Its high dispersion makes it well suited to hashing very similar strings, such as URLs, hostnames, file names, text and IP addresses.
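As an aside, ETHASH uses a simplified FNV-style 32-bit mixing step rather than the full FNV-1 algorithm. The sketch below (Python, illustrative only; the 0x01000193 constant is the standard 32-bit FNV prime, and the seed and sample words are fabricated) shows the mixing primitive as it appears in the public ETHASH reference, not code taken from the patent:

```python
FNV_PRIME = 0x01000193  # standard 32-bit FNV prime

def fnv(v1: int, v2: int) -> int:
    """FNV-style mixing step used throughout ETHASH: multiply the accumulator
    by the FNV prime, then XOR in the next word, keeping the result in 32 bits."""
    return ((v1 * FNV_PRIME) ^ v2) & 0xFFFFFFFF

# Fold a few 32-bit words into a single mixed value
acc = 0x811C9DC5  # FNV-1 offset basis, used here only as an illustrative seed
for word in (0x12345678, 0x9ABCDEF0, 0x0BADF00D):
    acc = fnv(acc, word)
print(hex(acc))
```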
Unlike the SHA-256-based mining proof-of-work algorithm adopted by Bitcoin, the mining proof-of-work algorithm used by Ethereum-like cryptocurrencies is called ETHASH. In the traditional implementation of the ETHASH algorithm, the DAG node data is generated in a one-time computation and stored in external memory in advance, so that it can be read at any time to participate in subsequent hash operations; this relies on external memory. An optimized implementation of the ETHASH algorithm that does not depend on external memory comprises the following three key steps: key step 1, pre-generate the internal CACHE data; key step 2, generate the DAG node data in real time from the pre-generated internal CACHE data; key step 3, perform hash operations on the DAG node data generated in real time and perform the proof of work according to the result. The optimization system corresponding to this method comprises: 1. one or more internal CACHE data generating units; 2. one or more internal storage units; 3. an internal storage access control unit; 4. one or more DAG node data generating units; 5. one or more hash operation units. The present invention proposes an optimized implementation method and implementation system for key step 2 and key step 3 of the above optimization method and for the corresponding core computing component formed by system unit 4 and unit 5 (referred to uniformly in the present invention as the hash collision unit).

Summary of the Invention

The technical problem to be solved by the embodiments of the present invention is to provide an optimization method for the core computing component of a proof-of-work operation chip, applicable to the proof-of-work computation of Ethereum-like cryptocurrencies.

The main flow of the method of the present invention is as follows: an FNV hash operation and a splicing operation are performed on one header_hash value (the hash value of a block header, a 256-bit random number) and multiple nonce values (proof-of-work verification values, 64-bit random numbers) to obtain multiple MIX values (1024-bit random numbers, each composed of two adjacent DAG node data items). These MIX values are stored in the on-chip storage module, and the DAG node indexes required to update them are computed from the MIX values. Using these indexes, multiple DAG node data generating units are invoked in parallel to generate the required DAG node data, and each DAG node data generating unit can compute multiple data items in parallel. The data is provided to the hash collision unit to update the MIX values; after the final MIX value is generated, data compression, splicing, FNV hash operations and other operations are applied to it to produce the final computation result, and the proof of work is performed according to that result. Through parallel computation of multiple DAG node data items, together with time-division multiplexing and a pipeline structure that raise the data throughput rate, the computational efficiency of the ETHASH algorithm is improved.
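To make the MIX update flow above concrete, here is a compact software sketch modeled on the public "hashimoto" routine from the ETHASH specification (64 update rounds, a 1024-bit MIX built from two adjacent DAG nodes, FNV compression to a 256-bit CMIX). The dag_lookup callback, the toy DAG and the use of hashlib.sha3_* in place of Ethereum's Keccak are assumptions for illustration; this is not the patent's hardware datapath.

```python
import hashlib

FNV_PRIME = 0x01000193
ACCESSES = 64        # MIX update rounds
MIX_BYTES = 128      # 1024-bit MIX = two adjacent 512-bit DAG nodes

def fnv(a, b): return ((a * FNV_PRIME) ^ b) & 0xFFFFFFFF
def words(b): return [int.from_bytes(b[i:i + 4], "little") for i in range(0, len(b), 4)]
def raw(ws): return b"".join(w.to_bytes(4, "little") for w in ws)

def hashimoto(header_hash: bytes, nonce: int, full_size: int, dag_lookup):
    """dag_lookup(i) must return the 64-byte DAG node with index i."""
    seed = hashlib.sha3_512(header_hash + nonce.to_bytes(8, "little")).digest()
    mix = words(seed * (MIX_BYTES // len(seed)))         # replicate seed to 1024 bits
    rows = full_size // MIX_BYTES
    for i in range(ACCESSES):
        # DAG node index needed to update the MIX in this round
        p = fnv(i ^ words(seed)[0], mix[i % len(mix)]) % rows
        page = words(dag_lookup(2 * p) + dag_lookup(2 * p + 1))  # two adjacent nodes
        mix = [fnv(m, d) for m, d in zip(mix, page)]
    # compress the 32-word MIX into an 8-word (256-bit) CMIX with FNV
    cmix = [fnv(fnv(fnv(mix[i], mix[i + 1]), mix[i + 2]), mix[i + 3])
            for i in range(0, len(mix), 4)]
    return hashlib.sha3_256(seed + raw(cmix)).digest()   # compared against the difficulty target

# Toy usage with a fake 16-node "DAG"
fake_dag = [hashlib.sha3_512(bytes([i])).digest() for i in range(16)]
print(hashimoto(b"\x00" * 32, nonce=42, full_size=16 * 64,
                dag_lookup=lambda i: fake_dag[i]).hex())
```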
The present invention also provides an optimization system for the core computing component of a proof-of-work operation chip, suitable for the proof-of-work computation of Ethereum-like cryptocurrencies. Specifically,

the present application provides an optimization system for the core computing component of a proof-of-work operation chip, comprising:

a core computing component comprising a plurality of basic components;

wherein each basic component comprises: one hash collision unit and a plurality of DAG node data generating units each connected to the hash collision unit;

wherein the hash collision unit comprises: one or more SHA3 hash operation modules, one or more storage modules, one or more FNV hash operation modules, and one or more DAG node index generation modules; the SHA3 hash operation module, the FNV hash operation module and the DAG node index generation module are each connected to the storage module;

wherein the DAG node data generating unit comprises: one or more DAG node data loading modules, one or more SHA3 hash operation modules, one or more DAG node data computing modules, and one or more CACHE node index generation modules;

wherein the DAG node data loading module and the DAG node data computing module are each connected to the SHA3 hash operation module, and the CACHE node index generation module is connected to the DAG node data computing module.

The hash collision unit and the DAG node data generating unit both adopt a time-division multiplexing structure; or the hash collision unit and the DAG node data generating unit both adopt a pipeline structure.

Preferably, the number of DAG node data generating units is not less than 128.

Preferably, the number of DAG node data generating units is 1024.

Preferably, a plurality of the DAG node data generating units are time-division multiplexed onto one hash collision unit.

Preferably, the number of DAG node data generating units time-division multiplexed onto one hash collision unit is 64.

Preferably, the DAG node data generating unit comprises one or more temporary DAG sub-computation modules, wherein the plurality of temporary DAG sub-computation modules are time-division multiplexed within one DAG node data generating unit.

Preferably, the pipeline of the DAG node data generating unit has not less than 8 stages.

Preferably, the storage module is a static random access memory.
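Purely to visualize the unit/module hierarchy just described, here is a minimal structural model in Python. The class names, the default counts (64 generating units per collision unit, 8 temporary sub-modules) and the choice of dataclasses to stand in for hardware blocks are illustrative assumptions drawn from the preferred embodiments, not a description of the actual circuit.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class HashCollisionUnit:
    """One or more of each module; the hash modules and the index generator
    all attach to the storage module (SRAM in the preferred embodiment)."""
    sha3_modules: int = 1
    storage_modules: int = 1
    fnv_modules: int = 1
    dag_index_generators: int = 1

@dataclass
class DagNodeDataGeneratingUnit:
    loader_modules: int = 1
    sha3_modules: int = 1
    dag_data_compute_modules: int = 1
    cache_index_generators: int = 1
    temp_dag_submodules: int = 8   # time-multiplexed sub-computation modules / pipeline depth

@dataclass
class BasicComponent:
    collision_unit: HashCollisionUnit = field(default_factory=HashCollisionUnit)
    # preferred embodiment: 64 generating units time-division multiplexed onto one collision unit
    dag_units: List[DagNodeDataGeneratingUnit] = field(
        default_factory=lambda: [DagNodeDataGeneratingUnit() for _ in range(64)])

# A core computing component is simply a collection of such basic components
core_computing_component = [BasicComponent() for _ in range(4)]
print(len(core_computing_component), len(core_computing_component[0].dag_units))
```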
The present application also provides an optimization method for the core computing component of a proof-of-work operation chip, based on the above system, comprising the steps of:

A. hashing the data sent from the host computer to obtain DAG node indexes, and storing them;

B. generating DAG node data from the DAG node indexes by hash operations;

C. performing hash operations on the DAG node data, and performing the proof of work according to the result.

Preferably, step A comprises: hashing the data sent from the host computer to generate one or more DAG node indexes simultaneously, and storing them.

Preferably, the number of DAG node indexes is 64.

Preferably, the DAG node data in step B is the node data of one or more DAG nodes generated simultaneously.

Preferably, the number of DAG nodes in step B is 256.

Preferably, step C comprises: performing hash operations on the DAG node data, generating one or more hash results for the proof of work simultaneously, and performing the proof of work according to the hash results.

Preferably, the number of hash results is 256.
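For context, the proof-of-work check applied to each hash result in step C is a plain threshold comparison. The sketch below assumes the common Ethereum convention of a target equal to 2**256 divided by the difficulty; the function name and the difficulty value are illustrative and do not come from the patent.

```python
def meets_target(result: bytes, difficulty: int) -> bool:
    """Proof of work succeeds when the 256-bit hash result, read as a
    big-endian integer, is no greater than 2**256 // difficulty."""
    return int.from_bytes(result, "big") <= (1 << 256) // difficulty

# A result starting with enough zero bytes passes a moderate difficulty
print(meets_target(bytes(4) + b"\xff" * 28, difficulty=1_000_000))
```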
In summary, compared with the prior art, the present invention has the following advantages: 1. the efficiency of the algorithm implementation is improved through the parallel computing structure; 2. the data throughput rate is improved through the pipeline structure; 3. the time-division multiplexing structure improves the data throughput rate while reducing chip area and cost; 4. through the above advantages, the cost-effectiveness of the system is improved.

Brief Description of the Drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention.

Figure 1 shows the basic component involved in the present invention;

Figure 2 shows the hash collision operation flow involved in the present invention;

Figure 3 shows the DAG node data generation flow involved in the present invention;

Figure 4 is a schematic structural diagram of the system involved in the present invention.

Detailed Description

In order to make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Embodiment 1

An optimization method for the core computing component of a proof-of-work operation chip provided by an embodiment of the present invention is described in detail below.

Referring to Figure 1, an optimization method for the core computing component of a proof-of-work operation chip is provided, comprising the following steps:

1. Obtain the header_hash value and a nonce from the host computer through the collision initialization port, splice the two values, and hash the result to obtain a value (S101 to S102);

2. Use the value obtained in step 1 to obtain a MIX data item through a splicing operation, and use this data item and its related data to initialize the storage space required for the current operation in the hash collision unit (refer to Figure 2);

3. Check whether the MIX data in the storage module has completed 64 rounds of operation. If so, pass the MIX data in the storage module to the MIX compression module (step 5); if the 64 rounds have not been completed, compute a DAG node index from the MIX data and submit it to the task interface (S103 to S114);

a) obtain the DAG node index through the task interface, and according to that index obtain one CACHE node data item through the memory access interface (S104);

b) apply a SHA3 hash operation to the CACHE node data obtained in step a, and fill the result into the temporary DAG node data module of an idle computing unit (S105 to S109);

c) determine whether the computing unit has completed 256 operation cycles. If so, submit the temporary DAG node data to the hash operation module (step e); if not, derive a CACHE node index from the temporary DAG node data in the current computing unit through XOR, FNV hash and modulo operations, and read data from the memory access port according to that index (S109 to S112);

d) perform an FNV hash operation on the data returned by the memory access interface and the temporary DAG node data in the computing unit, and update the temporary DAG node data in the computing unit;

e) the hash operation module performs a SHA3 hash operation on the temporary DAG node data to obtain one DAG node data item, and transmits the DAG node data to the task port (S113 to S114);

4. The unit receives the DAG node data returned by the task interface, uses it to update the MIX data in the storage module, and then repeats step 3 (S103);

5. The MIX compression module compresses the MIX into CMIX (a 256-bit random number obtained from the MIX through multiple FNV hash operations) through FNV hash operations, splices the result with the value obtained in step 1, and submits it to the hash operation module (S115 to S116);

6. Compress and hash the data from step 5, and submit the data that meets the requirements to the host computer (S116 to S117).
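Steps a) to e) above correspond closely to the way a single DAG node is derived from the CACHE in the public ETHASH reference: SHA3 of the indexed CACHE node, 256 cycles of FNV/modulo parent lookups, then a final SHA3. The following Python sketch mirrors that flow under stated assumptions; hashlib.sha3_512 stands in for Ethereum's Keccak-512 (which uses different padding), the toy 8-entry CACHE is fabricated, and the word layout is simplified, so the digests are illustrative rather than consensus-correct.

```python
import hashlib

HASH_BYTES = 64          # one CACHE/DAG node is 512 bits
DATASET_PARENTS = 256    # operation cycles per DAG node (step c)
FNV_PRIME = 0x01000193

def fnv(a, b): return ((a * FNV_PRIME) ^ b) & 0xFFFFFFFF
def words(b): return [int.from_bytes(b[i:i + 4], "little") for i in range(0, len(b), 4)]
def raw(ws): return b"".join(w.to_bytes(4, "little") for w in ws)
def sha3_512(data): return hashlib.sha3_512(data).digest()   # stand-in for Keccak-512

def calc_dag_node(cache, index):
    """Generate DAG node `index` in real time from the pre-generated CACHE."""
    n = len(cache)
    mix = words(cache[index % n])
    mix[0] ^= index                          # XOR the node index into the first word
    mix = words(sha3_512(raw(mix)))          # step b: SHA3 of the seeded CACHE node
    for i in range(DATASET_PARENTS):         # steps c/d: 256 FNV mixing cycles
        parent = fnv(index ^ i, mix[i % len(mix)]) % n   # CACHE node index via XOR/FNV/modulo
        mix = [fnv(m, c) for m, c in zip(mix, words(cache[parent]))]
    return sha3_512(raw(mix))                # step e: final SHA3 gives the DAG node data

# Toy CACHE of 8 pseudo-random 64-byte nodes (illustrative only)
cache = [sha3_512(bytes([i])) for i in range(8)]
print(calc_dag_node(cache, index=5).hex())
```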
Embodiment 2

An optimization system for the core computing component of a proof-of-work operation chip provided by an embodiment of the present invention is described in detail below.

Referring to Figure 4, the system specifically comprises:

one hash collision unit (S400), consisting of a SHA3 hash operation module (S401), a storage module (S402), a DAG node index generation module (S403) and an FNV hash operation module (S404);

a plurality of DAG node data generating units (S5001 to S500N), each consisting of a DAG node data loading module (S501), a SHA3 hash operation module (S502) and a plurality of parallel computing modules (S5041 to S504N), where each computing unit contains a temporary DAG node data module (S5031 to S503N), together with a CACHE node index generation module (S505).

The one hash collision unit (S400) is connected to the plurality of DAG node data generating units (S5001 to S500N);

the SHA3 hash operation module (S401) in the hash collision unit (S400) is connected to the storage module (S402);

the DAG node index generation module (S403) in the hash collision unit (S400) is connected to the storage module (S402);

the FNV hash operation module (S404) in the hash collision unit (S400) is connected to the storage module (S402);

the DAG node data loading module (S501) in each DAG node data generating unit (S5001 to S500N) is connected to the SHA3 hash operation module (S502);

the SHA3 hash operation module (S502) in each DAG node data generating unit (S5001 to S500N) is connected to the parallel computing modules (S5041 to S504N);

the parallel computing modules (S5041 to S504N) in each DAG node data generating unit (S5001 to S500N) are connected to the CACHE node index generation module (S505).

This embodiment is described using one hash collision unit (S400) and a plurality of DAG node data generating units (S5001 to S500N) as an example. Referring to Figures 1 and 4, when the ETHASH algorithm is implemented, FNV hash operations and splicing operations are performed on one header_hash value and multiple nonce values to obtain multiple MIX values. These MIX values are stored in the on-chip storage module, and the DAG node indexes required to update them are computed from the MIX values. Through the DAG node indexes, multiple DAG node data generating units are invoked in parallel to generate the required DAG node data; each DAG node data generating unit can compute multiple DAG node data items in parallel. The DAG node data is provided to the hash collision unit to update the MIX values. After the final MIX value is generated, data compression, splicing, FNV hash operations and other operations are applied to it to produce the final computation result, and the proof of work is performed according to that result.
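The interaction just described (one hash collision unit issuing DAG node index requests that many generating units serve in time slices) can be modeled in software as a simple round-robin dispatcher. The sketch below is a behavioural illustration only; the unit count of 64, the request queue and the round-robin policy are assumptions, and real hardware would overlap these requests in a pipeline rather than serialize them.

```python
from collections import deque

class DagNodeDataGeneratingUnit:
    """Stands in for one hardware DAG node data generating unit."""
    def __init__(self, uid: int):
        self.uid = uid

    def generate(self, dag_index: int) -> str:
        # Placeholder for the real-time DAG node derivation sketched earlier
        return f"dag_node[{dag_index}] generated by unit {self.uid}"

class HashCollisionUnit:
    """Issues DAG node index requests and consumes node data to update MIX values."""
    def __init__(self, n_units: int = 64):
        self.units = [DagNodeDataGeneratingUnit(i) for i in range(n_units)]
        self.slot = 0                      # round-robin pointer over the shared units

    def request(self, dag_index: int) -> str:
        unit = self.units[self.slot]       # next time slice goes to the next unit
        self.slot = (self.slot + 1) % len(self.units)
        return unit.generate(dag_index)

collision_unit = HashCollisionUnit()
needed = deque([17, 42, 99, 3, 256, 1023])  # DAG node indexes needed to update the MIXes
while needed:
    print(collision_unit.request(needed.popleft()))
```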
In embodiments based on the above system of the present invention, the system may be implemented as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA), but the implementation is not limited to these types.

The contents described above may be implemented individually or combined in various ways, and such variations all fall within the scope of protection of the present invention.

The above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention, which has been described in detail with reference to the preferred embodiments only. A person of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently substituted without departing from its spirit and scope, and all such modifications fall within the scope of the claims of the present invention.

Claims (15)

  1. An optimization system for a core computing component of a proof-of-work operation chip, characterized by comprising:
    a core computing component comprising a plurality of basic components;
    wherein each basic component comprises: one hash collision unit and a plurality of DAG node data generating units each connected to the hash collision unit;
    wherein the hash collision unit comprises: one or more SHA3 hash operation modules, one or more storage modules, one or more FNV hash operation modules, and one or more DAG node index generation modules; wherein the SHA3 hash operation module, the FNV hash operation module and the DAG node index generation module are each connected to the storage module;
    wherein the DAG node data generating unit comprises: one or more DAG node data loading modules, one or more SHA3 hash operation modules, one or more DAG node data computing modules, and one or more CACHE node index generation modules;
    wherein the DAG node data loading module and the DAG node data computing module are each connected to the SHA3 hash operation module, and the CACHE node index generation module is connected to the DAG node data computing module;
    wherein the hash collision unit and the DAG node data generating unit both adopt a time-division multiplexing structure; or the hash collision unit and the DAG node data generating unit both adopt a pipeline structure.
  2. The system according to claim 1, characterized in that the number of DAG node data generating units is not less than 128.
  3. The system according to claim 2, characterized in that the number of DAG node data generating units is 1024.
  4. The system according to claim 1, characterized in that a plurality of the DAG node data generating units are time-division multiplexed onto one hash collision unit.
  5. The system according to claim 4, characterized in that the number of DAG node data generating units time-division multiplexed onto one hash collision unit is 64.
  6. The system according to claim 1, characterized in that the DAG node data generating unit comprises one or more temporary DAG sub-computation modules, wherein the plurality of temporary DAG sub-computation modules are time-division multiplexed within one DAG node data generating unit.
  7. The system according to claim 1, characterized in that the pipeline of the DAG node data generating unit has not less than 8 stages.
  8. The system according to claim 1, characterized in that the storage module is a static random access memory.
  9. An optimization method for a core computing component of a proof-of-work operation chip, based on the system according to any one of claims 1 to 8, characterized by comprising the steps of:
    A. hashing the data sent from the host computer to obtain DAG node indexes, and storing them;
    B. generating DAG node data from the DAG node indexes by hash operations;
    C. performing hash operations on the DAG node data, and performing the proof of work according to the result.
  10. The method according to claim 9, characterized in that step A comprises:
    hashing the data sent from the host computer to generate one or more DAG node indexes simultaneously, and storing them.
  11. The method according to claim 10, characterized in that the number of DAG node indexes is 64.
  12. The method according to claim 9, characterized in that the DAG node data in step B is the node data of one or more DAG nodes generated simultaneously.
  13. The method according to claim 12, characterized in that the number of DAG nodes in step B is 256.
  14. The method according to claim 9, characterized in that step C comprises:
    performing hash operations on the DAG node data, generating one or more hash results for the proof of work simultaneously, and performing the proof of work according to the hash results.
  15. The method according to claim 14, characterized in that the number of hash results is 256.
PCT/CN2019/074499 2018-05-18 2019-02-01 System and method for optimizing core computing components of proof of work operation chip WO2019218730A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810479099.6 2018-05-18
CN201810479099.6A CN108777612B (en) 2018-05-18 2018-05-18 Optimization method and circuit for workload certification operation chip core calculation component

Publications (1)

Publication Number Publication Date
WO2019218730A1 true WO2019218730A1 (en) 2019-11-21

Family

ID=64027184

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/074499 WO2019218730A1 (en) 2018-05-18 2019-02-01 System and method for optimizing core computing components of proof of work operation chip

Country Status (2)

Country Link
CN (1) CN108777612B (en)
WO (1) WO2019218730A1 (en)


Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108777612B (en) * 2018-05-18 2020-03-20 中科声龙科技发展(北京)有限公司 Optimization method and circuit for workload certification operation chip core calculation component
TWI676134B (en) * 2018-12-03 2019-11-01 資富電子股份有限公司 Accelerated calculating architecture for proof of work in blockchain based on lookup table method
CN114528246A (en) * 2020-11-23 2022-05-24 深圳比特微电子科技有限公司 Operation core, calculation chip and encrypted currency mining machine
CN113296705B (en) * 2021-05-27 2022-09-27 浙江萤火虫区块链科技有限公司 Framework system for parallel computing Poseido Hash in Filecin
CN113282802B (en) * 2021-06-17 2022-06-24 浙江毫微米科技有限公司 Workload certification algorithm optimization method and device, computer equipment and storage medium
CN114006900B (en) * 2021-12-30 2022-04-08 中科声龙科技发展(北京)有限公司 System for realizing directed acyclic graph processing and relay management device
CN114003552B (en) * 2021-12-30 2022-03-29 中科声龙科技发展(北京)有限公司 Workload proving operation method, workload proving chip and upper computer


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106571925B (en) * 2016-10-24 2020-07-10 北京云图科瑞科技有限公司 Method and device for carrying out workload certification on blocks in block chain system
CN106897351B (en) * 2016-12-29 2020-11-10 北京瑞卓喜投科技发展有限公司 Generation method and system of directed acyclic graph block chain
CN107579814A (en) * 2017-08-03 2018-01-12 北京比特大陆科技有限公司 Device, computing chip, the digging ore deposit machine of the computational methods of proof of work
CN107977340A (en) * 2017-12-27 2018-05-01 邵美 A kind of importance ranking method of block chain trade network node

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462147A (en) * 2013-09-25 2015-03-25 天津书生投资有限公司 Storage method for document
CN106293892A (en) * 2015-06-26 2017-01-04 阿里巴巴集团控股有限公司 Distributed stream calculates system, method and apparatus
CN105245327A (en) * 2015-08-21 2016-01-13 北京比特大陆科技有限公司 Optimizing method, device and circuit for Hash computing chip of bitcoin proof of work
CN107729471A (en) * 2017-10-13 2018-02-23 上海策赢网络科技有限公司 A kind of block chain and its generation method and equipment
CN108777612A (en) * 2018-05-18 2018-11-09 中科声龙科技发展(北京)有限公司 A kind of optimization method and circuit of proof of work operation chip core calculating unit

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114115789A (en) * 2021-10-20 2022-03-01 北京百度网讯科技有限公司 Chip plug-in implementation method and device, electronic equipment and storage medium
CN114115789B (en) * 2021-10-20 2022-08-23 北京百度网讯科技有限公司 Chip plug-in implementation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN108777612A (en) 2018-11-09
CN108777612B (en) 2020-03-20

Similar Documents

Publication Publication Date Title
WO2019218730A1 (en) System and method for optimizing core computing components of proof of work operation chip
JP6955026B2 (en) Systems and methods for parallel processing blockchain transactions
TWI820124B (en) Computer-implemented method, computing system and non-transitory computer-readable storage medium
CN110268691B (en) Federated blockchain networks with verified blockchain and consensus protocols
TWI728418B (en) Method and system for executing multi-party transaction using smart contract
WO2020143852A2 (en) Distributed blockchain data storage under account model
WO2019179539A2 (en) Shared blockchain data storage
WO2020143853A2 (en) Distributed blockchain data storage under account model
WO2019179538A2 (en) Shared blockchain data storage
JP7165148B2 (en) Method and system for consistent distributed memory pool in blockchain network
US20130326494A1 (en) System and method for distributed patch management
US11182365B2 (en) Systems and methods for distributed storage of data across multiple hash tables
WO2019179258A1 (en) Method and system for workload proof computing chip optimization
US20230401331A1 (en) Secure and scalable private set intersection for large datasets
CN110622488A (en) System and method for managing user interaction with blockchains
CN103365625B (en) The method and system that random value is produced
WO2020098818A2 (en) Taking snapshots of blockchain data
CN109146677A (en) Method, computer system and the readable storage medium storing program for executing of parallel building block chain view
US9607086B2 (en) Providing prevalence information using query data
JP2020522767A (en) Method and device for establishing communication between nodes of a blockchain system
US11418342B2 (en) System and methods for data exchange using a distributed ledger
Takeshita et al. Secure single-server nearly-identical image deduplication
EP3769219A2 (en) Taking snapshots of blockchain data
US20230306506A1 (en) Method and system for providing high-speed storage and retrieval of information
CN116346388A (en) Improved consensus data filtering in spatially demonstrated blockchain systems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19803692

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19/03/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19803692

Country of ref document: EP

Kind code of ref document: A1