CN109062856B - Computing processing device and method, and electronic device - Google Patents

Computing processing device and method, and electronic device

Publication number
CN109062856B
Authority
CN
China
Prior art keywords
components
bus
data
read
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810786533.5A
Other languages
Chinese (zh)
Other versions
CN109062856A (en)
Inventor
王逵
杨存永
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bitmain Technologies Inc
Original Assignee
Bitmain Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bitmain Technologies Inc
Priority to CN201810786533.5A (patent CN109062856B)
Priority to PCT/CN2018/114101 (patent WO2020015252A1)
Publication of CN109062856A
Application granted
Publication of CN109062856B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/58 Random or pseudo-random number generators
    • G06F7/582 Pseudo-random number generators

Abstract

Embodiments of the invention disclose a computing processing device and method, and an electronic device. The device comprises a plurality of components interconnected through a bus. Each component comprises a bus interface and stores the data segments occupying the same corresponding position in every element of a constant pool participating in iterative computation, together with part of the data of an initial data block on which the iterative computation is to be performed. One of the components obtains the serial number of the element to be read in the current iteration and broadcasts it to the other components through the bus. Each component reads the data segment that it stores for the element with that serial number, performs a hash calculation, and uses the calculation result to update the corresponding data in the part of the initial data block that it stores. These operations are executed iteratively until the number of iterations reaches a preset number, yielding a result data block corresponding to the initial data block. Embodiments of the invention achieve iterative computation of data blocks with low power consumption and high throughput.

Description

Computing processing device and method, and electronic device
Technical Field
The present invention relates to data processing technologies, and in particular, to a computing processing apparatus and method, and an electronic device.
Background
Hash operations are widely used in data processing. One class of hash operations requires iterative computation on data blocks. For example, in the Hashimoto algorithm used in Ethereum (ETH) mining, for a given initial data block, an address is computed from the latest value of a certain element of the data block, data is fetched at that address from a constant pool in memory to update the data block, and a result data block is obtained after multiple iterations.
In the course of implementing the present invention, the inventors found that, in the prior art, the constant pool resides in memory, i.e., Dynamic Random Access Memory (DRAM), while the initial data block resides in Static Random Access Memory (SRAM) inside a main control chip such as a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU), and the iterative computation is performed on a computing device built around the main control chip. The memory and the main control chip communicate over an inter-chip interconnection bus that follows a JEDEC standard, such as DDR4, GDDR5, GDDR6, or HBM. The main control chip sends an address to the memory, and the memory fetches the data from the constant pool and returns it to the main control chip. Because the inter-chip interconnection bus has high power consumption and limited bandwidth, transferring this large volume of data makes the iterative computation of data blocks inefficient.
Disclosure of Invention
The technical problem addressed by embodiments of the present invention is to provide a computing processing device and method, and an electronic device.
According to one aspect of the embodiments of the present invention, there is provided a computing processing device including a plurality of components that are interconnected through a bus;
each component includes a bus interface for connecting to the bus, and stores: the data segments occupying the same corresponding position in every element of a constant pool participating in iterative computation, and part of the data of a data block on which the iterative computation is to be performed, wherein the constant pool includes a plurality of elements, and the constant pool and the data block are distributed among the plurality of components; one of the plurality of components obtains the serial number of the element to be read in the current iteration and broadcasts that element serial number to the other components through the bus;
each of the plurality of components reads the data segment that it stores for the element with that serial number, performs a hash calculation, and uses the calculation result to update the corresponding data in the part of the data block that it stores;
and the operation in which one of the plurality of components obtains the serial number of the element to be read is executed iteratively until the number of hash-calculation iterations reaches a preset number, yielding a result data block corresponding to the data block.
Optionally, in each of the above device embodiments, the plurality of components form a one-dimensional connection, a two-dimensional connection, a ring connection, or a star connection.
Optionally, in each of the above device embodiments, there are one or more data blocks.
Optionally, in each of the above device embodiments, each component further includes a control unit, a first memory, a second memory, and a calculation unit, wherein:
the control unit is configured to obtain the serial number of the element to be read in the current iteration and to broadcast it to the other components through the bus;
the calculation unit is configured to read the data segment that the component stores for the element with that serial number, perform a hash calculation, and use the calculation result to update the corresponding data in the part of the data block that the component stores;
the first memory is configured to store the data segments occupying the same corresponding position in every element of the constant pool participating in iterative computation;
and the second memory is configured to store part of the data of the data block on which the iterative computation is to be performed.
Optionally, in each of the above device embodiments, the first memory includes a static random access memory (SRAM), a dynamic random access memory (DRAM), a magnetoresistive random access memory (MRAM), or a resistive random access memory (RRAM, a memristor-based memory); and/or the second memory includes an SRAM, a DRAM, an MRAM, or an RRAM.
Optionally, in each of the above device embodiments, the bus interface includes a serializer/deserializer (SERDES) interface, a Peripheral Component Interconnect Express (PCIE) interface, a JEDEC Solid State Technology Association (JEDEC) interface, an Advanced eXtensible Interface (AXI) bus interface, or a Wishbone bus interface;
correspondingly, the bus includes a SERDES bus, a PCIE bus, a JEDEC standard bus, an AXI bus, or a Wishbone bus.
Optionally, in each of the above device embodiments, the component is a chip; or, the component is a board card.
Optionally, in each of the above device embodiments, the plurality of components are processing units integrated on the same chip.
Optionally, in each of the above device embodiments, each component further includes a control unit, a memory interface, a second memory, and a calculation unit, wherein:
the control unit is configured to obtain the serial number of the element to be read in the current iteration and to broadcast it to the other components through the bus;
the calculation unit is configured to read, from at least one external chip connected to the component, the data segment of the element with that serial number, perform a hash calculation, and use the calculation result to update the corresponding data in the part of the data block that the component stores;
the memory interface is configured to connect the at least one external chip, and a first memory on the at least one external chip stores the data segments occupying the same corresponding position in every element of the constant pool participating in iterative computation;
and the second memory stores part of the data of the data block on which the iterative computation is to be performed.
Optionally, in each of the above device embodiments, the preset number of times is 64, there are 64 calculation units, and each calculation unit performs one of the hash calculations in the iterative computation.
Optionally, in each of the above device embodiments, the plurality of components includes 2n components, where n is 1, 2, 4, or 8 (that is, 2, 4, 8, or 16 components).
Optionally, in each of the above device embodiments, each element includes a data segment 16 words long, which is evenly distributed among the 2n components, word by word and in its order within the element; and/or the data block includes 32 elements, which are evenly distributed among the 2n components, element by element and in their order within the data block.
According to another aspect of the embodiments of the present invention, there is provided a computing processing method based on the computing processing device according to any embodiment of the present invention, where the computing processing device includes a plurality of components interconnected through a bus; each component includes a bus interface for connecting to the bus, and stores: the data segments occupying the same corresponding position in every element of the constant pool participating in iterative computation, and part of the data of a data block on which the iterative computation is to be performed; the constant pool includes a plurality of elements; the constant pool and the data block are distributed among the plurality of components; the method includes the following steps:
one of the plurality of components obtains the serial number of the element to be read in the current iteration and broadcasts that element serial number to the other components through the bus;
each of the plurality of components reads the data segment that it stores for the element with that serial number, performs a hash calculation, and uses the calculation result to update the corresponding data in the part of the data block that it stores;
and the operation in which one of the plurality of components obtains the serial number of the element to be read is executed iteratively until the number of hash-calculation iterations reaches a preset number, yielding a result data block corresponding to the data block.
Optionally, in each of the above method embodiments, the plurality of components form a one-dimensional connection, a two-dimensional connection, a ring connection, or a star connection.
Optionally, in each of the above method embodiments, there are one or more data blocks.
Optionally, in each of the above method embodiments, the method specifically includes:
the control unit in one of the components obtains the serial number of the element to be read in the current iteration and broadcasts it through the bus to the control units in the other components;
the calculation unit in each component reads, from the first memory of that component, the data segment stored there for the element with that serial number, performs a hash calculation, and uses the calculation result to update the corresponding data in the data block stored in the second memory of that component;
the control unit determines whether the number of hash-calculation iterations has reached the preset number;
if the number of hash-calculation iterations has reached the preset number, a result data block corresponding to the data block is obtained;
otherwise, if the number of hash-calculation iterations has not reached the preset number, the method returns to the operation in which the control unit in one of the components obtains the serial number of the element to be read.
Optionally, in each of the above method embodiments, the first memory includes a static random access memory (SRAM), a dynamic random access memory (DRAM), a magnetoresistive random access memory (MRAM), or a resistive random access memory (RRAM, a memristor-based memory); and/or the second memory includes an SRAM, a DRAM, an MRAM, or an RRAM.
Optionally, in each of the above method embodiments, the bus interface includes a serializer/deserializer (SERDES) interface, a Peripheral Component Interconnect Express (PCIE) interface, a JEDEC Solid State Technology Association (JEDEC) interface, an Advanced eXtensible Interface (AXI) bus interface, or a Wishbone bus interface;
correspondingly, the bus includes a SERDES bus, a PCIE bus, a JEDEC standard bus, an AXI bus, or a Wishbone bus.
Optionally, in each of the above method embodiments, the component is a chip; or, the component is a board card.
Optionally, in each of the above method embodiments, the method specifically includes:
the control unit in one of the components obtains the serial number of the element to be read in the current iteration and broadcasts it through the bus to the control units in the other components;
the calculation unit in each component reads, from at least one external chip connected to the memory interface of that component, the data segment stored there for the element with that serial number, performs a hash calculation, and uses the calculation result to update the corresponding data in the data block stored in the second memory of that component;
the control unit determines whether the number of hash-calculation iterations has reached the preset number;
if the number of hash-calculation iterations has reached the preset number, a result data block corresponding to the data block is obtained;
otherwise, if the number of hash-calculation iterations has not reached the preset number, the method returns to the operation in which the control unit in one of the components obtains the serial number of the element to be read.
Optionally, in each of the above method embodiments, the preset number of times is 64, there are 64 calculation units, and each calculation unit performs one of the hash calculations in the iterative computation.
Optionally, in each of the above method embodiments, the plurality of components includes 2n components, where n is 1, 2, 4, or 8 (that is, 2, 4, 8, or 16 components).
Optionally, in each of the above method embodiments, each element includes a data segment 16 words long, which is evenly distributed among the 2n components, word by word and in its order within the element; and/or the data block includes 32 elements, which are evenly distributed among the 2n components, element by element and in their order within the data block.
According to another aspect of the embodiments of the present invention, there is provided an electronic device including the computing processing apparatus according to any embodiment of the present invention.
Based on the computing processing device and method and the electronic device provided by the embodiments of the present invention, the bus broadcasts only the element serial number and carries no data, so its power consumption is low and the bandwidth it requires is small; this avoids the latency caused by data transfer and improves the efficiency of iterative computation. In addition, by merging the functions of the main control chip and the memory, the embodiments implement both in a single computing processing device, which solves the technical problem of inefficient iterative computation of data blocks caused by the high power consumption and limited bandwidth of an inter-chip interconnection bus, and achieves iterative computation of data blocks with low power consumption and high throughput.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
The invention will be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
FIG. 1 is a schematic structural diagram of a computing device according to an embodiment of the present invention.
FIG. 2 is a schematic structural diagram of another embodiment of a computing device according to the present invention.
FIG. 3 is a schematic structural diagram of a computing device according to another embodiment of the present invention.
FIG. 4 is a flow chart of an embodiment of a computing process of the present invention.
FIG. 5 is a flow chart of another embodiment of a computing process of the present invention.
FIG. 6 is a flow chart of a computing process according to yet another embodiment of the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Embodiments of the invention are operational with numerous other general purpose or special purpose computing system environments or configurations, and with numerous other electronic devices, such as terminal devices, computer systems, servers, etc. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, servers, and the like, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
FIG. 1 is a schematic structural diagram of an embodiment of the computing processing device of the present invention. As shown in FIG. 1, the computing processing device of this embodiment includes a plurality of components interconnected by a bus (also referred to as an address broadcast bus). In the embodiments of the present invention, the components may be interconnected in any manner, such as a one-dimensional connection, a two-dimensional connection, an end-to-end ring connection, or a one-to-many star connection. FIG. 1 shows only one exemplary structure, in which each pair of adjacent components is interconnected through the bus so that the components form a one-dimensional connection.
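The interconnection options named above can be sketched as adjacency lists. The following is a minimal illustration only: the helper names `build_topology` and `broadcast_hops`, the choice of component 0 as the hub of the star, and the hop-count metric are assumptions introduced here and do not appear in the specification. The two-dimensional connection is omitted for brevity.

```python
from collections import deque


def build_topology(kind: str, count: int) -> dict[int, list[int]]:
    """Return an adjacency list for `count` components (hypothetical helper)."""
    links: dict[int, set[int]] = {i: set() for i in range(count)}

    def connect(a: int, b: int) -> None:
        links[a].add(b)
        links[b].add(a)

    if kind == "chain":  # one-dimensional connection, as in FIG. 1
        for i in range(count - 1):
            connect(i, i + 1)
    elif kind == "ring":  # a chain whose two ends are also joined
        for i in range(count):
            connect(i, (i + 1) % count)
    elif kind == "star":  # assumption: component 0 acts as the hub
        for i in range(1, count):
            connect(0, i)
    else:
        raise ValueError(f"unknown topology: {kind}")
    return {i: sorted(peers) for i, peers in links.items()}


def broadcast_hops(topology: dict[int, list[int]], source: int) -> int:
    """Number of link hops before a broadcast from `source` reaches everyone."""
    distance = {source: 0}
    queue = deque([source])
    while queue:  # plain breadth-first search over the links
        node = queue.popleft()
        for peer in topology[node]:
            if peer not in distance:
                distance[peer] = distance[node] + 1
                queue.append(peer)
    return max(distance.values())


if __name__ == "__main__":
    for kind in ("chain", "ring", "star"):
        topology = build_topology(kind, 4)
        print(kind, topology, broadcast_hops(topology, 0))
```

Under these assumptions the topology affects only how many hops an element serial number travels before every component has it; the per-component work is the same in each case.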
Each component includes a bus interface for connecting to the bus, and stores the data segments occupying the same corresponding position in every element of the constant pool (dataset) participating in iterative computation, together with part of the data of the data block on which the iterative computation is to be performed. The constant pool includes a plurality of elements. The constant pool and the data block (mix) are distributed among the plurality of components, and each component stores one portion of the data segments of the constant pool elements. Before it takes part in the first iteration, the data block may be referred to as the initial data block.
One of the plurality of components obtains the serial number of the constant pool element to be read in the current iteration and broadcasts it over the bus to the other components in turn.
Each of the plurality of components reads the data segment that it stores for the element with that serial number and performs a hash calculation on it, for example a Hashimoto algorithm operation, and then updates the corresponding data in the part of the data block that it stores according to the calculation result.
The operation in which one of the plurality of components obtains the serial number of the element to be read is executed iteratively until the number of hash-calculation iterations reaches a preset number, for example 64, at which point a result data block corresponding to the data block is obtained. Whenever a component computes the next element serial number, it broadcasts that number to all other components, so that all components remain in synchronous operation.
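The broadcast-and-update loop described above can be simulated in software. The sketch below rests on several assumptions that the specification does not state: an FNV-style word mix stands in for the hash calculation, the data block is simplified to 16 words so that each component's block part lines up with its pool segment, the component that derives the next serial number rotates with the iteration index, and the pool has only 256 elements. The names `Component`, `split`, and `run` are hypothetical.

```python
import random

FNV_PRIME = 0x01000193  # assumption: FNV-style mix as a stand-in hash
MASK32 = 0xFFFFFFFF
WORDS = 16              # words per constant pool element


def mix_word(a: int, b: int) -> int:
    """Stand-in 32-bit hash step; not claimed to be the patented calculation."""
    return ((a * FNV_PRIME) ^ b) & MASK32


class Component:
    def __init__(self, pool_slice, block_part):
        self.pool_slice = pool_slice   # pool_slice[e]: this component's words of element e
        self.block_part = block_part   # this component's words of the data block

    def update(self, serial: int) -> None:
        """Read the local segment of element `serial` and update local block data."""
        segment = self.pool_slice[serial]
        self.block_part = [mix_word(m, s) for m, s in zip(self.block_part, segment)]


def split(pool, block, count):
    """Give each of `count` components a contiguous run of word positions."""
    per = WORDS // count
    return [
        Component([e[c * per:(c + 1) * per] for e in pool], block[c * per:(c + 1) * per])
        for c in range(count)
    ]


def run(components, pool_size, iterations=64):
    per = WORDS // len(components)
    bus_log = []                       # everything that crosses the bus
    for i in range(iterations):
        k = i % WORDS                  # assumption: word position that drives this step
        owner = components[k // per]   # the one component that holds that word
        serial = mix_word(i, owner.block_part[k % per]) % pool_size
        bus_log.append(serial)         # broadcast: a serial number, never data
        for comp in components:
            comp.update(serial)        # every component works on local data only
    return [w for comp in components for w in comp.block_part], bus_log


rng = random.Random(0)
pool = [[rng.getrandbits(32) for _ in range(WORDS)] for _ in range(256)]
block = [rng.getrandbits(32) for _ in range(WORDS)]
single, _ = run(split(pool, block, 1), len(pool))

if __name__ == "__main__":
    for count in (2, 4, 8, 16):
        result, bus_log = run(split(pool, block, count), len(pool))
        print(count, "components match single-component run:", result == single)
```

In this model the result does not depend on how many components share the work, because each word of the block is updated only from the word at the same position in the selected element; the bus log contains one serial number per iteration and nothing else.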
In each embodiment of the present invention, the number of the data blocks may be specifically one, or may be multiple. When there are a plurality of data blocks, the plurality of data blocks may be processed in parallel or concurrently.
Based on the computing processing device provided by the above embodiment of the present invention, the address broadcast bus broadcasts only the element serial number and carries no data, so its power consumption is low and the bandwidth it requires is small; this avoids the latency caused by data transfer and improves the efficiency of iterative computation. In addition, by merging the functions of the main control chip and the memory, the embodiment implements both in a single computing processing device, which solves the technical problem of inefficient iterative computation of data blocks caused by the high power consumption and limited bandwidth of an inter-chip interconnection bus, and achieves iterative computation of data blocks with low power consumption and high throughput. Because silicon die area is limited, the whole constant pool is distributed across a plurality of silicon chip components for storage, and the components are interconnected through the address broadcast bus. In an optional example of the computing processing device according to embodiments of the present invention, the plurality of components may include 2n components, where n is 1, 2, 4, or 8.
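A rough comparison of the payload that crosses the bus per data block illustrates the bandwidth claim. This is a back-of-the-envelope sketch: the 64-byte element size and the 64 iterations come from the description, while the 4-byte width of an address or serial number is an assumption, and protocol overhead and multi-hop relaying are ignored.

```python
ELEMENT_BYTES = 64   # one constant pool element, per the description
ITERATIONS = 64      # example preset number of iterations
SERIAL_BYTES = 4     # assumption: a serial number or address fits in 32 bits


def prior_art_bytes(iterations: int = ITERATIONS) -> int:
    """Prior art: an address goes out and a full element comes back each time."""
    return iterations * (SERIAL_BYTES + ELEMENT_BYTES)


def broadcast_bytes(iterations: int = ITERATIONS) -> int:
    """Described device: only the element serial number is broadcast."""
    return iterations * SERIAL_BYTES


if __name__ == "__main__":
    before, after = prior_art_bytes(), broadcast_bytes()
    print(f"prior art: {before} bytes, broadcast only: {after} bytes, "
          f"ratio {before / after:.0f}x")
```

Under these assumptions the bus payload per data block falls from 4352 bytes to 256 bytes; the actual saving depends on the real serial-number width and bus protocol.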
Each element in the constant pool includes a data segment 16 words long. Accordingly, 16 components may be used, each storing 1 word of every element; or 8 components, each storing 2 words of every element; or 4 components, each storing 4 words of every element; or 2 components, each storing 8 words of every element. Over the 64 iterations, each iteration determines from one of the 16 data segments which constant pool element is fetched next; that is, in each iteration one of the components determines the position in the constant pool (the element serial number) of the element that all components read next, and then broadcasts that element serial number to all other components.
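The four partitioning choices listed above can be expressed as a mapping between word positions and components. The sketch assumes that "in order" means each component receives a contiguous run of word positions; the function names `words_held` and `owner_of` are hypothetical.

```python
WORDS_PER_ELEMENT = 16
ALLOWED_COUNTS = (2, 4, 8, 16)  # 2n components for n = 1, 2, 4, 8


def words_held(component: int, count: int) -> range:
    """Word positions of every element stored by `component` (0-based)."""
    if count not in ALLOWED_COUNTS:
        raise ValueError("count must be 2, 4, 8, or 16")
    per = WORDS_PER_ELEMENT // count
    return range(component * per, (component + 1) * per)


def owner_of(word: int, count: int) -> int:
    """Index of the component that stores word position `word`."""
    if count not in ALLOWED_COUNTS:
        raise ValueError("count must be 2, 4, 8, or 16")
    return word // (WORDS_PER_ELEMENT // count)


if __name__ == "__main__":
    for count in ALLOWED_COUNTS:
        layout = [list(words_held(c, count)) for c in range(count)]
        print(f"{count} components:", layout)
```

With this mapping, the component that must derive the next element serial number in a given iteration is simply the owner of the word position that drives that iteration.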
In another optional example of the computing processing device according to embodiments of the present invention, the constant pool includes tens of millions of elements, each 64 bytes long, i.e., a data segment of 16 words (WORD). The 16-word data segment is evenly distributed among the 2n components, word by word and in its order within the element. In an optional hash calculation scheme, the 32-bit words in two data segments of length 8 that occupy the same positions within two 16-word data segments can be taken out one by one, and hash calculation is performed on them pair by pair to form a calculation result 16 words long.
In addition, the data block includes 32 elements, which are evenly distributed among the 2n components, element by element and in their order within the data block.
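The storage each component needs follows directly from these figures. The sketch below is illustrative arithmetic only: the pool size of 2**24 elements used in the demonstration is an assumed round number standing in for "tens of millions", and the function name `per_component_storage` is hypothetical.

```python
ELEMENT_BYTES = 64    # 16 words of 4 bytes each
BLOCK_ELEMENTS = 32   # elements in one data block
ALLOWED_COUNTS = (2, 4, 8, 16)


def per_component_storage(pool_elements: int, count: int) -> dict[str, int]:
    """Share of the constant pool and of one data block held by each component."""
    if count not in ALLOWED_COUNTS:
        raise ValueError("count must be 2, 4, 8, or 16")
    return {
        "segment_bytes": ELEMENT_BYTES // count,             # per pool element
        "pool_bytes": pool_elements * ELEMENT_BYTES // count,
        "block_elements": BLOCK_ELEMENTS // count,
    }


if __name__ == "__main__":
    pool_elements = 2 ** 24  # assumption: illustrative pool size only
    for count in ALLOWED_COUNTS:
        share = per_component_storage(pool_elements, count)
        print(count, "components:", share["pool_bytes"] // 2 ** 20, "MiB of pool,",
              share["block_elements"], "block elements each")
```

For the assumed pool size, 16 components would each hold 64 MiB of the constant pool and 2 elements of the data block.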
In some implementations of the computing processing device of various embodiments of the present invention, the component may be a chip.
In other embodiments of the computing processing apparatus according to the embodiments of the present invention, the component may be a board.
In another optional example of the computing processing device according to embodiments of the present invention, the bus interface includes a serializer/deserializer (SERDES) interface, a Peripheral Component Interconnect Express (PCIE) interface, a JEDEC Solid State Technology Association (JEDEC) interface, an Advanced eXtensible Interface (AXI) bus interface, a Wishbone bus interface, or the like. Correspondingly, the bus may include a SERDES bus, a PCIE bus, a JEDEC standard bus, an AXI bus, a Wishbone bus, or the like. The JEDEC interface is compatible with third-generation double data rate memory (DDR3), DDR4, DDR5, third-generation low-power double data rate memory (LPDDR3), LPDDR4, LPDDR5, fifth-generation graphics double data rate memory (GDDR5), GDDR5X, GDDR6, and the like. The Wishbone bus was first proposed by Silicore Corporation and is now maintained by the OpenCores organization; it interconnects IP cores by establishing a common interface between them, and can be used to interconnect soft cores, firm cores, and hard cores.
FIG. 2 is a schematic structural diagram of another embodiment of the computing processing device of the present invention. This embodiment is described taking the case in which the components are chips as an example; those skilled in the art will understand that the same structure can be used when the components are board cards or take other possible forms. As shown in FIG. 2, compared with the embodiment shown in FIG. 1, each component in this embodiment further includes a control unit, a first memory, a second memory, and a calculation unit, wherein:
The control unit is configured to obtain the serial number of the element to be read in the current iteration and to broadcast it to the other components through the bus.
The calculation unit is configured to read the data segment that the component stores for the element with that serial number, perform a hash calculation, and use the calculation result to update the corresponding data in the part of the data block that the component stores.
The first memory is configured to store the data segments occupying the same corresponding position in every element of the constant pool participating in iterative computation. In an optional example of the computing processing device according to embodiments of the present invention, the first memory is an on-chip memory, which may be, for example, an SRAM, a Dynamic Random Access Memory (DRAM), a magnetoresistive random access memory (MRAM), a resistive random access memory (RRAM, a memristor-based memory), or the like.
The second memory is configured to store part of the data of the data block on which the iterative computation is to be performed. In an optional example of the computing processing device according to embodiments of the present invention, the second memory is an on-chip memory, which may be, for example, an SRAM, a DRAM, an MRAM, an RRAM, or the like.
In still other embodiments of the computing processing device according to the embodiments of the present invention, the plurality of components may be processing units integrated on the same chip, that is, a plurality of components are integrated on the same chip, and each component is a processing unit on the chip.
FIG. 3 is a schematic structural diagram of a computing device according to another embodiment of the present invention. As shown in FIG. 3, compared with the embodiment shown in FIG. 1, each component in this embodiment further includes a control unit, a memory interface, a second memory, and a calculation unit, wherein:
The control unit is used to obtain the serial number of the element to be read this time and to broadcast that serial number to the other components of the plurality of components through the bus.
The calculation unit is used to read the data segment of the element whose serial number is to be read this time from the first memory on at least one external chip connected to the component, and to perform a hash calculation, using the obtained result to update the corresponding data in the data block stored in that component.
The memory interface is used to connect to at least one external chip outside the chip, for example by means of a bus; the first memory on the at least one external chip is used to store the data segments at the same corresponding position in all elements of the constant pool participating in the iterative computation. FIG. 3 shows, only by way of example, the memory interface connected to one external chip; from the description of this embodiment, those skilled in the art will understand how each external chip is connected when the memory interface is connected to a plurality of external chips.
The second memory stores part of the data of the data block on which the iterative computation is to be performed. In an optional example of the computing processing device according to embodiments of the present invention, the second memory is an on-chip memory, which may be, for example, an SRAM, a DRAM, an MRAM, an RRAM, or the like.
In an optional example of the embodiments shown in FIG. 2 and FIG. 3, the preset number of times is 64 and there are 64 calculation units, each of which is used to perform one hash calculation.
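The component structure of FIG. 2 can be pictured with a minimal Python sketch. Everything named here is an assumption made for illustration: the class and field names are hypothetical, and SHA-256 merely stands in for the hash calculation, which the text does not specify. The sketch also assumes the stored block part is at most 32 bytes so that a truncated digest can replace it.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable

HashStep = Callable[[bytes, bytes], bytes]


def placeholder_hash(block_part: bytes, segment: bytes) -> bytes:
    # Stand-in only: the text does not name the hash function.
    # Truncate so the block part keeps its length (assumes <= 32 bytes).
    return hashlib.sha256(block_part + segment).digest()[: len(block_part)]


@dataclass
class Component:
    # First memory: this component's segment of every constant-pool element.
    first_memory: list[bytes]
    # Second memory: this component's part of the data block.
    second_memory: bytes
    # 64 calculation units, one per iteration (the example value in the text).
    compute_units: list[HashStep] = field(
        default_factory=lambda: [placeholder_hash] * 64
    )

    def iterate(self, k: int, element_index: int) -> None:
        """Run iteration k (1..64) on calculation unit k."""
        unit = self.compute_units[k - 1]
        segment = self.first_memory[element_index]
        self.second_memory = unit(self.second_memory, segment)
```

For the FIG. 3 variant, the `first_memory` list would be replaced by reads through a memory interface to an external chip; the rest of the structure is unchanged.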
FIG. 4 is a flow chart of an embodiment of the calculation processing method of the present invention. The calculation processing method of each embodiment of the present invention can be implemented on the computing processing device of any of the above embodiments. For example, a computing processing device for implementing the method includes a plurality of components interconnected by a bus; the components may be interconnected in any manner, such as a one-dimensional connection, a two-dimensional connection, an end-to-end ring connection, or a one-to-many star connection. Each component includes a bus interface for connecting to the bus, and stores the data segments at the same corresponding position in all elements of the constant pool participating in the iterative computation, together with part of the data of the data block on which the iterative computation is to be performed. The constant pool participating in the iterative computation includes a plurality of elements, and the constant pool and the data block are distributed among the plurality of components. In one optional example, the bus interface includes a SERDES interface, a PCIE interface, a JEDEC interface, an AXI bus interface, or a Wishbone bus interface, among others. Correspondingly, the bus may include a SERDES bus, a PCIE bus, a JEDEC standard bus, an AXI bus, or a Wishbone bus, and so on.
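The interconnection options named above can be illustrated as adjacency lists. The following Python sketch is only an illustration under assumptions: the function name and the topology labels are hypothetical, the hub of the star is arbitrarily taken to be component 0, and the two-dimensional case is omitted for brevity.

```python
def build_topology(kind: str, count: int) -> dict[int, list[int]]:
    """Return an adjacency list for components numbered 0..count-1."""
    if count < 2:
        raise ValueError("need at least two components")
    links: dict[int, set[int]] = {i: set() for i in range(count)}

    def connect(a: int, b: int) -> None:
        # Bus links are treated as bidirectional in this sketch.
        links[a].add(b)
        links[b].add(a)

    if kind == "line":  # one-dimensional connection
        for i in range(count - 1):
            connect(i, i + 1)
    elif kind == "ring":  # one-dimensional chain with its two ends joined
        for i in range(count):
            connect(i, (i + 1) % count)
    elif kind == "star":  # one-to-many: component 0 is the hub (assumption)
        for i in range(1, count):
            connect(0, i)
    else:
        raise ValueError(f"unknown topology: {kind}")
    return {i: sorted(neighbors) for i, neighbors in links.items()}
```

The resulting adjacency list is what a hop-by-hop broadcast of the element serial number would walk over.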
As shown in fig. 4, the calculation processing method of this embodiment includes:
102, one of the plurality of components obtains the serial number of the constant-pool element to be read this time, and the serial number is broadcast in sequence through the bus to the other components of the plurality of components.
104, each of the plurality of components reads the data segment, stored in that component, of the element whose serial number is to be read this time, and performs a hash calculation, using the obtained result to update the corresponding data in the data block stored in that component.
Operations 102 and 104 are executed iteratively until the number of iterations of the hash calculation reaches a preset number, for example 64, at which point the result data block corresponding to the data block is obtained.
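Operations 102 and 104 and the iteration around them can be sketched as a small software simulation. This is a hedged sketch, not the patented hardware: the class and function names are hypothetical, SHA-256 is a placeholder for the unspecified hash, the rule for deriving the element serial number from local data is invented for illustration, and the rotating choice of the initiating component is only one possible preset rule. Block parts are assumed to be at most 32 bytes.

```python
import hashlib

PRESET_ITERATIONS = 64  # example value given in the text


class Component:
    def __init__(self, pool_segments: list[bytes], block_part: bytes) -> None:
        self.pool_segments = pool_segments  # one segment per constant-pool element
        self.block_part = block_part        # this component's part of the data block

    def pick_element_index(self) -> int:
        # Hypothetical rule: derive the serial number from local block data.
        value = int.from_bytes(self.block_part[:4], "little")
        return value % len(self.pool_segments)

    def update(self, index: int) -> None:
        # Operation 104: read the local segment, hash, update local data.
        segment = self.pool_segments[index]
        digest = hashlib.sha256(self.block_part + segment).digest()
        self.block_part = digest[: len(self.block_part)]


def run(components: list[Component], iterations: int = PRESET_ITERATIONS) -> bytes:
    for step in range(iterations):
        # Operation 102: one component obtains the serial number ...
        leader = components[step % len(components)]
        index = leader.pick_element_index()
        # ... and it is broadcast; here every component simply receives it.
        for component in components:
            component.update(index)
    # The result data block is the concatenation of all local parts.
    return b"".join(component.block_part for component in components)
```

Note that only the small serial number crosses the bus in each iteration; the constant-pool segments and block data stay local to each component.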
The data block in the embodiments of the present invention may be referred to as an initial data block before participating in the first iteration calculation.
In each embodiment of the present invention, there may be one data block or a plurality of data blocks. When there are a plurality of data blocks, they may be processed in parallel or concurrently.
The calculation processing method provided by the embodiments of the present invention is based on a technical scheme in which the main control chip and the memory function are merged into one, with both implemented by a single computing processing device. This solves the technical problem that iterative computation on data blocks is inefficient because of the high power consumption and limited bandwidth of inter-chip interconnect buses, and achieves low-power, high-throughput iterative computation on data blocks.
In an optional example of the calculation processing method according to each embodiment of the present invention, the plurality of components may specifically include 2n components, where n is 1, 2, 4, or 8.
In another optional example of the calculation processing method according to the embodiments of the present invention, each element includes a data segment 16 words long; taking one word as the unit, the words are evenly distributed among the 2n components in their order within the element. In addition, the data block includes 32 elements, which are evenly distributed, element by element, among the 2n components in their order within the data block.
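The even distribution described above can be checked with a short Python sketch. The helper names are hypothetical, and the sketch assumes that "in order" means contiguous runs (component 0 receives the first words or elements, component 1 the next, and so on); an interleaved layout would satisfy the text equally well.

```python
WORDS_PER_ELEMENT = 16   # from the text
ELEMENTS_PER_BLOCK = 32  # from the text


def split_evenly(items: list[int], parts: int) -> list[list[int]]:
    """Split items into `parts` contiguous, equally sized runs."""
    if len(items) % parts != 0:
        raise ValueError("items cannot be divided evenly")
    size = len(items) // parts
    return [items[i * size:(i + 1) * size] for i in range(parts)]


def shares_for(n: int) -> tuple[list[list[int]], list[list[int]]]:
    """Word indices and element indices held by each of the 2n components."""
    parts = 2 * n
    words = split_evenly(list(range(WORDS_PER_ELEMENT)), parts)
    elements = split_evenly(list(range(ELEMENTS_PER_BLOCK)), parts)
    return words, elements
```

With n = 4 there are 8 components, each holding 2 of the 16 words of every constant-pool element and 4 of the 32 data-block elements. For every permitted n (1, 2, 4, 8) both 16 and 32 are divisible by 2n, so the split is always exact.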
In some embodiments of the computing processing apparatus according to the embodiments of the present invention, the component may be a chip or a board card.
FIG. 5 is a flow chart of another embodiment of the calculation processing method of the present invention. In this embodiment, the components are chips or board cards, and each component further includes a control unit, a first memory, a second memory, and one or more calculation units. The first memory or the second memory is an on-chip memory, which may be, for example, an SRAM, a DRAM, an MRAM, an RRAM, or the like. In this embodiment, the technical solution of the embodiments of the present invention is described taking as an example the case where the preset number of times is 64, there are 64 calculation units, each calculation unit performs one hash calculation, and the data block includes 32 elements. Based on this description, those skilled in the art will understand how the technical solution can be implemented in other cases. As shown in FIG. 5, the calculation processing method of this embodiment includes:
200, the control unit in each of the plurality of components initializes the several elements of the data block held in the second memory of its own component.
In an optional example of embodiments of the present invention, the data block in a component may be initialized as follows: each value of a pre-generated or received random number sequence is written, in order, to the corresponding storage location in the component that holds the several elements of the data block. For example, in a mining service, the random number sequence may be generated by the mine and sent to the miner.
202, the control unit in one of the components acquires the element serial number in the constant pool which needs to be read this time.
In an alternative of the embodiments of the present invention, one of the components may be determined according to a preset rule, for example, which component of a plurality of components starts to perform the operation 202 for the first time may be preset, or the first component starts to perform the operation 202 for the first time, and then the subsequent components perform the operation 202 in sequence according to a preset sequence.
For example, with a preset number of 64 and 2n components, 64 is an integer multiple of 2n, so operation 202 is first performed by component 0 and is performed by the next component on each subsequent iteration. With 8 components, for instance, operation 202 is performed by components 0 through 7 in turn, then again by components 0 through 7, and so on.
204, the control unit in that component broadcasts the serial number of the constant-pool element to be read this time to the control units of the components connected to it through the bus; each of those control units in turn broadcasts the serial number to the control units of the other components connected to it through the bus; and so on in sequence, until the serial number has been broadcast to all components.
206, the k-th calculation unit in each component reads, from the first memory of that component, the data segment of the element whose serial number is to be read this time, and performs a hash calculation, using the obtained result to update the data in the several elements of the data block stored in the second memory of that component, thereby completing the calculation for the 32 elements of the data block.
Here k denotes the iteration count of the hash calculation in the calculation processing method, and takes any integer value from 1 to 64.
208, the control unit in the component determines whether the number of iterations of the hash calculation has reached 64.
If the number of iterations has reached 64, the result data block corresponding to the data block is obtained. Otherwise, the method returns to operation 202 and repeats until the number of iterations reaches 64 and the result data block is obtained.
In an alternative of the embodiments of the present invention, the components in the operation 208 may be determined according to a preset rule, for example, which component of the plurality of components starts to execute the operation 208 for the first time may be preset, or the first component starts to execute the operation 208 for the first time, and then the subsequent components execute the operation 208 in sequence according to a preset sequence.
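Two parts of the FIG. 5 flow lend themselves to a short sketch: the rotation of the component that performs operation 202, and the hop-by-hop broadcast of operation 204. The Python below is an illustration under assumptions: the function names are hypothetical, the rotation is the simple modulo rule from the 8-component example, and the broadcast is modelled as a breadth-first walk over an adjacency list, which is one plausible reading of "broadcasts in sequence".

```python
from collections import deque


def initiator(iteration: int, component_count: int) -> int:
    """Component that performs operation 202 in a given iteration (0-based)."""
    return iteration % component_count


def broadcast(
    adjacency: dict[int, list[int]], source: int, element_index: int
) -> tuple[dict[int, int], list[tuple[int, int]]]:
    """Forward element_index from source until every component has it.

    Returns the value received by each component and the (sender, receiver)
    hops in the order they occur.
    """
    received = {source: element_index}
    hops: list[tuple[int, int]] = []
    queue = deque([source])
    while queue:
        sender = queue.popleft()
        for neighbor in adjacency[sender]:
            if neighbor not in received:
                received[neighbor] = element_index
                hops.append((sender, neighbor))
                queue.append(neighbor)
    return received, hops


# With 64 iterations and 8 components, each component initiates 8 times.
schedule = [initiator(k, 8) for k in range(64)]
```

On a one-dimensional chain 0-1-2-3 with component 0 as the source, the serial number takes three hops to reach every component, whereas on a star with the source at the hub it takes a single round of hops.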
In addition, when the above-mentioned components are processing units integrated on the same chip, as shown in FIG. 6, another embodiment of the calculation processing method of the present invention includes:
300, the control unit in each of the plurality of components initializes the several elements of the data block held in the second memory of its own component.
In an optional example of embodiments of the present invention, the data block in a component may be initialized as follows: each value of a pre-generated or received random number sequence is written, in order, to the corresponding storage location in the component that holds the several elements of the data block. For example, in a mining service, the random number sequence may be generated by the mine and sent to the miner.
302, the control unit in one of the components acquires the element serial number in the constant pool which needs to be read this time.
In an alternative of the embodiments of the present invention, one of the components may be determined according to a preset rule, for example, which component of the plurality of components starts to perform the operation 302 for the first time may be preset, or the first component starts to perform the operation 302 for the first time, and then the subsequent components perform the operation 302 in sequence according to a preset sequence.
For example, with a preset number of 64 and 2n components, 64 is an integer multiple of 2n, so operation 302 is first performed by component 0 and is performed by the next component on each subsequent iteration. With 8 components, for instance, operation 302 is performed by components 0 through 7 in turn, then again by components 0 through 7, and so on.
304, the control unit in that component broadcasts the serial number of the constant-pool element to be read this time to the control units of the components connected to it through the bus; each of those control units in turn broadcasts the serial number to the control units of the other components connected to it through the bus; and so on in sequence, until the serial number has been broadcast to all components.
306, the k-th calculation unit in each component reads the data segment of the element whose serial number is to be read this time from at least one external chip connected to the memory interface of that component, and performs a hash calculation, using the obtained result to update the data in the several elements of the data block stored in the second memory of that component, thereby completing the calculation for the 32 elements of the data block.
Here k denotes the iteration count of the hash calculation in the calculation processing method, and takes any integer value from 1 to 64.
308, the control unit in the component determines whether the number of iterations of the hash calculation has reached 64.
If the number of iterations has reached 64, the result data block corresponding to the data block is obtained. Otherwise, the method returns to operation 302 and repeats until the number of iterations reaches 64 and the result data block is obtained.
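The difference between this flow and the FIG. 5 flow is where the constant-pool segment is read from. A minimal Python sketch of that read path follows; all class names are hypothetical, SHA-256 again stands in for the unspecified hash, and concatenating the reads when several external chips are attached is an assumption, since the text does not say how multiple chips share the data. The block part is assumed to be at most 32 bytes.

```python
import hashlib


class ExternalChip:
    """External chip whose first memory holds constant-pool segments."""

    def __init__(self, segments: list[bytes]) -> None:
        self._segments = list(segments)

    def read(self, index: int) -> bytes:
        return self._segments[index]


class MemoryInterface:
    """Connects a processing unit to one or more external chips."""

    def __init__(self, chips: list[ExternalChip]) -> None:
        self.chips = list(chips)

    def read_segment(self, index: int) -> bytes:
        # Assumption: with several chips, their reads are concatenated.
        return b"".join(chip.read(index) for chip in self.chips)


class ProcessingUnit:
    def __init__(self, memory_interface: MemoryInterface, block_part: bytes) -> None:
        self.memory_interface = memory_interface
        self.block_part = block_part  # held in the on-chip second memory

    def step(self, element_index: int) -> None:
        # Operation 306: read through the memory interface, hash, update.
        segment = self.memory_interface.read_segment(element_index)
        digest = hashlib.sha256(self.block_part + segment).digest()
        self.block_part = digest[: len(self.block_part)]
```

Only the source of the segment changes; the data block part still lives in the on-chip second memory and is updated in place.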
In an alternative of the embodiments of the present invention, the component in operation 308 may be determined according to a preset rule. For example, which of the plurality of components performs operation 308 the first time may be preset, or the first component may perform operation 308 the first time, with the subsequent components then performing operation 308 in turn according to a preset sequence.
In addition, an embodiment of the present invention further provides an electronic device, including the computing processing apparatus according to any of the above embodiments of the present invention.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The method and apparatus of the present invention may be implemented in a number of ways. For example, the methods and apparatus of the present invention may be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustrative purposes only, and the steps of the method of the present invention are not limited to the order specifically described above unless specifically indicated otherwise. Furthermore, in some embodiments, the present invention may also be embodied as a program recorded in a recording medium, the program including machine-readable instructions for implementing a method according to the present invention. Thus, the present invention also covers a recording medium storing a program for executing the method according to the present invention.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (24)

1. A computing processing apparatus comprising a plurality of components, components of the plurality of components being interconnected by a bus;
each component includes a bus interface for connection to said bus, and stores: the data segments at the same corresponding position in all elements of a constant pool participating in iterative computation, and part of the data of a data block on which the iterative computation is to be performed, wherein the constant pool comprises a plurality of elements; the constant pool and the data block are distributed among the plurality of components; one of the plurality of components acquires the serial number of the element to be read this time, and broadcasts that serial number to the other components of the plurality of components through the bus;
each part of the plurality of parts respectively reads the data segment in the element serial number which is stored by the part and needs to be read this time, and performs hash calculation to obtain a calculation result to update corresponding data in the data block stored by the part;
and iteratively executing the operation of obtaining the element serial number required to be read at this time by one of the multiple components until the iteration frequency of the hash calculation reaches a preset frequency, and obtaining a result data block corresponding to the data block.
2. The device of claim 1, wherein the plurality of components form a one-dimensional connection, a two-dimensional connection, a ring connection, or a star connection.
3. The apparatus according to claim 1 or 2, wherein the data blocks are embodied as one or more data blocks.
4. The apparatus of claim 1 or 2, wherein each component further comprises: a control unit, a first memory, a second memory and a calculation unit; wherein:
the control unit is used for acquiring the element serial numbers which need to be read at this time and broadcasting the element serial numbers which need to be read at this time to other components in the multiple components through the bus;
the computing unit is used for respectively reading the data segments in the element serial numbers which are stored by the computing unit and need to be read this time, and performing hash computation to obtain a computation result to update corresponding data in the data blocks stored by the computing unit;
the first memory is used for storing data segments which participate in iterative computation and have the same corresponding sequence in all elements in the constant pool;
and the second memory is used for storing partial data in the data block which needs to be subjected to iterative computation.
5. The apparatus of claim 4, wherein the first memory comprises a Static Random Access Memory (SRAM), or a Dynamic Random Access Memory (DRAM), or a magnetoresistive memory (MRAM), or a memristor (RRAM); and/or the second memory comprises an SRAM, or a dynamic random access memory DRAM, or a magnetoresistive memory MRAM, or a memristor RRAM.
6. The apparatus of claim 1 or 2, wherein the bus interface comprises: a serializer/deserializer SERDES interface, or a Peripheral Component Interconnect Express PCIE interface, or a JEDEC Solid State Technology Association JEDEC interface, or an Advanced eXtensible Interface AXI bus interface, or a Wishbone bus interface;
correspondingly, the bus comprises a SERDES bus, a PCIE bus, a JEDEC standard bus, an AXI bus or a Wishbone bus.
7. The device of claim 1 or 2, wherein the component is a chip; or, the component is a board card.
8. The apparatus of claim 1 or 2, wherein the plurality of components are processing units integrated on a same chip.
9. The apparatus of claim 8, wherein each component further comprises: the device comprises a control unit, a memory interface, a second memory and a calculation unit; wherein:
the control unit is used for acquiring the element serial numbers which need to be read at this time and broadcasting the element serial numbers which need to be read at this time to other components in the multiple components through the bus;
the computing unit is used for reading the data segments in the element serial numbers needing to be read at this time from at least one external chip connected with the component and performing hash computation to obtain a computation result to update corresponding data in the data blocks stored in the computing unit;
the memory interface is used for connecting the at least one external chip, and a first memory on the at least one external chip is used for storing data segments which participate in iterative computation and have the same corresponding sequence in all elements in the constant pool;
and the second memory stores part of data in the data block which needs to be subjected to iterative computation.
10. The apparatus according to claim 4, wherein the preset number of times is 64, the number of the computing units is 64, and each computing unit is configured to perform one hash computation in the iterative computation.
11. The apparatus of claim 1 or 2, wherein the plurality of components comprises 2n components, wherein n has a value of 1, 2, 4, or 8.
12. The apparatus of claim 11, wherein each element comprises a data segment of 16 word lengths, with each word length as a unit, evenly distributed among the 2n parts in order among the elements; and/or the data block comprises 32 elements, the 32 elements are evenly distributed in the 2n parts in the order of the elements in the data block.
13. A computing processing method based on the computing processing device of any one of claims 1-12, wherein the computing processing device comprises a plurality of components, and the components in the plurality of components are interconnected through a bus; each component includes: a bus interface for said bus connection, and storing: data segments in the same corresponding sequence in all elements in the constant pool participating in iterative computation and partial data in a data block needing the iterative computation; the constant pool comprises a plurality of elements; the constant pool and the data blocks are distributed among the plurality of components; the method comprises the following steps:
one of the components acquires the element serial number required to be read at this time, and broadcasts the element serial number required to be read at this time to other components in the components through the bus;
each part of the plurality of parts respectively reads the data segment in the element serial number which is stored by the part and needs to be read this time, and performs hash calculation to obtain a calculation result to update corresponding data in the data block stored by the part;
and iteratively executing the operation of obtaining the element serial number required to be read at this time by one of the multiple components until the iteration frequency of the hash calculation reaches a preset frequency, and obtaining a result data block corresponding to the data block.
14. The method of claim 13, wherein the plurality of components form a one-dimensional connection, a two-dimensional connection, a ring connection, or a star connection.
15. The method according to claim 13 or 14, wherein the data blocks are embodied as one or more data blocks.
16. The method according to claim 13 or 14, characterized in that the method comprises in particular:
the control unit in one of the components acquires the element serial number required to be read at this time, and broadcasts the element serial number required to be read at this time to the control units in other components in the multiple components through the bus;
the computing unit in each component reads the data segment in the element serial number which is stored in the computing unit per se and needs to be read this time from the first memory in the component and performs hash computation on the data segment so as to obtain a computing result, and updates corresponding data in a data block stored in the second memory of the component;
the control unit identifies whether the number of iterations of the hash calculation has reached a preset number;
in response to the number of iterations of the hash calculation reaching the preset number, obtaining a result data block corresponding to the data block;
otherwise, in response to the number of iterations of the hash calculation not reaching the preset number, returning to the operation in which the control unit in one of the components acquires the serial number of the element to be read this time.
17. The method of claim 16, wherein the first memory comprises a Static Random Access Memory (SRAM), or a Dynamic Random Access Memory (DRAM), or a magnetoresistive memory (MRAM), or a memristor (RRAM); and/or the second memory comprises an SRAM, or a dynamic random access memory DRAM, or a magnetoresistive memory MRAM, or a memristor RRAM.
18. The method of claim 13 or 14, wherein the bus interface comprises: a serializer/deserializer SERDES interface, or a Peripheral Component Interconnect Express PCIE interface, or a JEDEC Solid State Technology Association JEDEC interface, or an Advanced eXtensible Interface AXI bus interface, or a Wishbone bus interface;
correspondingly, the bus comprises a SERDES bus, a PCIE bus, a JEDEC standard bus, an AXI bus or a Wishbone bus.
19. The method of claim 13 or 14, wherein the component is a chip; or, the component is a board card.
20. The method according to claim 13 or 14, characterized in that the method comprises in particular:
the control unit in one of the components acquires the element serial number required to be read at this time, and broadcasts the element serial number required to be read at this time to the control units in other components in the multiple components through the bus;
the computing unit in each component respectively reads the data segment in the element serial number required to be read at this time from at least one external chip connected with the memory interface in the component and performs hash computation to obtain a computation result to update corresponding data in a data block stored in a second memory of the component;
the control unit identifies whether the number of iterations of the hash calculation has reached a preset number;
in response to the number of iterations of the hash calculation reaching the preset number, obtaining a result data block corresponding to the data block;
otherwise, in response to the number of iterations of the hash calculation not reaching the preset number, returning to the operation in which the control unit in one of the components acquires the serial number of the element to be read this time.
21. The method according to claim 16, wherein the preset number of times is 64, the number of the computing units is 64, and each computing unit is respectively used for performing one hash computation in the iterative computation.
22. The method of claim 13 or 14, wherein the plurality of components comprises 2n components, wherein n has a value of 1, 2, 4, or 8.
23. The method of claim 22, wherein each element comprises a data segment of 16 word lengths, with each word length as a unit, evenly distributed among the 2n parts in order among the elements; and/or the data block comprises 32 elements, the 32 elements are evenly distributed in the 2n parts in the order of the elements in the data block.
24. An electronic device, characterized in that it comprises a computing processing means according to any one of claims 1 to 12.
CN201810786533.5A 2018-07-17 2018-07-17 Computing processing device and method, and electronic device Active CN109062856B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810786533.5A CN109062856B (en) 2018-07-17 2018-07-17 Computing processing device and method, and electronic device
PCT/CN2018/114101 WO2020015252A1 (en) 2018-07-17 2018-11-06 Computation processing apparatus and method, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810786533.5A CN109062856B (en) 2018-07-17 2018-07-17 Computing processing device and method, and electronic device

Publications (2)

Publication Number Publication Date
CN109062856A CN109062856A (en) 2018-12-21
CN109062856B true CN109062856B (en) 2021-09-21

Family

ID=64816982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810786533.5A Active CN109062856B (en) 2018-07-17 2018-07-17 Computing processing device and method, and electronic device

Country Status (2)

Country Link
CN (1) CN109062856B (en)
WO (1) WO2020015252A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114861594B (en) * 2022-07-08 2022-09-20 英诺达(成都)电子科技有限公司 Low-power-consumption verification method, device, equipment and storage medium of chip

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7684563B1 (en) * 2003-12-12 2010-03-23 Sun Microsystems, Inc. Apparatus and method for implementing a unified hash algorithm pipeline
CN101739358A (en) * 2009-12-21 2010-06-16 东南大学 Method for dynamically allocating on-chip heterogeneous memory resources by utilizing virtual memory mechanism
CN104714782A (en) * 2012-12-05 2015-06-17 北京奇虎科技有限公司 Matrix data element identification serialization method and system
CN106776461A (en) * 2017-01-13 2017-05-31 算丰科技(北京)有限公司 Data processing equipment and server

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100332942A1 (en) * 2008-09-10 2010-12-30 Arm Limited Memory controller for NAND memory using forward error correction
US20160019339A1 (en) * 2014-07-06 2016-01-21 Mercator BioLogic Incorporated Bioinformatics tools, systems and methods for sequence assembly
CN105760324B (en) * 2016-05-11 2019-11-15 北京比特大陆科技有限公司 Data processing equipment and server

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
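An optimized FPGA implementation method for the AES encryption algorithm; Liu Zhenzhen; Modern Electronics Technique; 20071201; pp. 103-106 *
Optimization of high resolution PET iterative reconstruction with resolution modeling for image derived input function; Joseph Lewis et al.; 2012 IEEE Nuclear Science Symposium and Medical Imaging Conference Record (NSS/MIC); 20130708; pp. 3999-4004 *
Intercell scheduling method based on shuffled frog leaping and genetic programming; Jia Lingyun et al.; Acta Automatica Sinica; 20150413; Vol. 42, No. 5; pp. 936-948 *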

Also Published As

Publication number Publication date
CN109062856A (en) 2018-12-21
WO2020015252A1 (en) 2020-01-23

Similar Documents

Publication Publication Date Title
US20190042611A1 (en) Technologies for structured database query for finding unique element values
CN111095223A (en) Method and system for implementing active persistent memory via memory bus
US20190068501A1 (en) Throttling for bandwidth imbalanced data transfers
CN102654827A (en) First-in first-out buffer and data caching method
CN107919943A (en) Coding, coding/decoding method and the device of binary data
US7609574B2 (en) Method, apparatus and system for global shared memory using serial optical memory
CN109062856B (en) Computing processing device and method, and electronic device
US11740791B2 (en) Data compression system using base values and methods thereof
CN107451075B (en) Data processing chip and system, data storage forwarding and reading processing method
US7609575B2 (en) Method, apparatus and system for N-dimensional sparse memory using serial optical memory
US20180048732A1 (en) Techniques for storing or accessing a key-value item
CN115190102B (en) Information broadcasting method, information broadcasting device, electronic unit, SOC (system on chip) and electronic equipment
US10305509B2 (en) Compression of frequent data values across narrow links
CN113420860A (en) Memory smart card, device, network, method and computer storage medium
CN107577625B (en) Data processing chip and system, and data storing and forwarding processing method
US9785592B2 (en) High density mapping for multiple converter samples in multiple lane interface
CN107643991B (en) Data processing chip and system, and data storing and forwarding processing method
TW201710885A (en) Hardware loading adjusting method and related electronic device
US11947512B2 (en) Feedback-based inverted index compression
CN113986134B (en) Method for storing data, method and device for reading data
US11704271B2 (en) Scalable system-in-package architectures
TWI764311B (en) Memory access method and intelligent processing apparatus
US20230315654A1 (en) Method of ring allreduce processing
US11899953B1 (en) Method of efficiently identifying rollback requests
US9442661B2 (en) Multidimensional storage array and method utilizing an input shifter to allow an entire column or row to be accessed in a single clock cycle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant