CN113792009A - Self-adaptive reconfigurable processing array and master control interaction method and device - Google Patents

Self-adaptive reconfigurable processing array and master control interaction method and device

Info

Publication number
CN113792009A
Authority
CN
China
Prior art keywords
array
module
reconfigurable processing
processing array
register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110861861.9A
Other languages
Chinese (zh)
Inventor
尹首一
钟鸣
谷江源
韩慧明
刘雷波
魏少军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202110861861.9A priority Critical patent/CN113792009A/en
Publication of CN113792009A publication Critical patent/CN113792009A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7839Architectures of general purpose stored program computers comprising a single central processing unit with memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G06F15/7871Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention discloses a self-adaptive reconfigurable processing array and master control interaction method and device. The device comprises a control-type processing unit arranged on the reconfigurable processing array, which takes over from the master control the reading and writing of the global registers (GR) in the coprocessor interface and thereby carries out data transport and array execution. The invention greatly reduces the coupling between the array and the master control, shortens application execution time, improves computing capability and performance to meet application requirements, and is well suited to hardware acceleration designs for data-intensive and compute-intensive applications.

Description

Self-adaptive reconfigurable processing array and master control interaction method and device
Technical Field
The invention relates to the technical field of large-scale integrated circuits, in particular to a self-adaptive reconfigurable processing array and master control interaction method and device.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
The coprocessor interface module is the bridge through which the master control interacts with the reconfigurable processing array. The coprocessor interface can read and write the shared global registers on the reconfigurable processing array, and it also contains 10 global registers dedicated to controlling the functions of the reconfigurable processing array; these 10 registers are invisible to the processing units in the array. The coprocessor interface module decodes the values of these 10 registers to generate enable signals and to control tasks such as array computation, data transport, and configuration transport.
Before the control-type processing unit was added, the master control was responsible for writing the values of these control-oriented special global registers. Every write by the master control to a global register costs a large number of clock cycles, and initiating a single array start task or data transport task requires writing several special global registers. Because these tasks are repeated many times while an application executes, the resulting time overhead increases by an order of magnitude.
Disclosure of Invention
The embodiment of the invention provides a self-adaptive reconfigurable processing array and master control interaction method, which comprises the following steps:
a control-type processing unit is added to the reconfigurable processing array to take over from the master control the reading and writing of the global registers GR in the coprocessor interface, thereby carrying out data transport and array execution.
The embodiment of the invention also provides a self-adaptive reconfigurable processing array and master control interaction device, which comprises: a control-type processing unit arranged on the reconfigurable processing array and used to take over from the master control the reading and writing of the global registers GR in the coprocessor interface, thereby carrying out data transport and array execution.
In the prior art, tasks of the reconfigurable processing array such as computation, data transport, and configuration transport are all controlled by the master control. In the embodiment of the invention, by contrast, a control-type processing unit arranged on the reconfigurable processing array takes over from the master control the reading and writing of the global registers GR in the coprocessor interface, thereby carrying out data transport and array execution. This greatly reduces the coupling between the array and the master control, shortens application execution time, improves computing capability and performance to meet application requirements, and is well suited to hardware acceleration designs for data-intensive and compute-intensive applications.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts. In the drawings:
FIG. 1 is an architecture diagram of a reconfigurable processing unit (RPU);
FIG. 2 is a schematic diagram of the operations the master control must perform before a configuration transport task is executed;
FIG. 3 is a schematic diagram of the operations the master control must perform before data is transported to the shared memory;
FIG. 4 is an architecture diagram of the self-adaptive reconfigurable processing array and master control interaction device in an embodiment of the present invention, that is, an architecture diagram of the reconfigurable array after the control-type PE is added;
FIG. 5 is an architecture diagram of the control-type PE according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating the format of the control-type PE configuration information according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating the original 512-point FFT execution process, without the control-type PE;
FIG. 8 is a diagram illustrating the 512-point FFT execution process according to an embodiment of the present invention;
FIG. 9 is a diagram illustrating the configuration information executed by the control-type PE according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
FIG. 1 is an architecture diagram of a reconfigurable processing unit (RPU). An RPU has four processing element arrays (PEAs). The master control, a RISC-V processor, interacts with the PEA through the coprocessor interface. The data cache (Data Cache) stores data and sends it over the AHB bus to the shared memory module (share memory) on the PEA. The context cache (Context Cache) stores configuration information, which is distributed to each processing element on the PEA through the AHB (the bridge between the cache and the PEA) and the configuration control module (Config_Control) on the PEA. The abbreviations in the figure are: GR, global register; PE, processing element; CM, context memory, that is, the configuration information memory that stores the configuration information of a PE.
The global registers (GR) that control the PEA functions are located in the coprocessor interface (Coprocessor Interface). GR32 stores the start control signals; GR35 stores the base address of the data in the cache; GR36 stores the length of the data to be transported from the cache; GR33 stores the base address of the configuration information in the cache; GR34 stores the length of the configuration information to be transported from the cache; GR37 stores the base address for reading and writing the shared memory; and GR39 stores the finish signals indicating that data transport, configuration transport, and PEA computation have completed.
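The register assignments listed above can be summarised as a small lookup table. The following is only an illustrative sketch: the register numbers come from the description, while the symbolic names (`START_CTRL`, `CFG_BASE`, and so on) and the helper `describe` are hypothetical labels chosen here for readability.

```python
from enum import IntEnum


class GR(IntEnum):
    """Control-oriented global registers named in the description.

    Register numbers are taken from the text; the member names are
    hypothetical labels, not identifiers from the patent.
    """
    START_CTRL = 32  # start control signals
    CFG_BASE = 33    # base address of configuration information in the cache
    CFG_LEN = 34     # length of configuration information to transport
    DATA_BASE = 35   # base address of data in the cache
    DATA_LEN = 36    # length of data to transport from the cache
    SHM_BASE = 37    # base address for reading/writing the shared memory
    FINISH = 39      # finish signals: data transport, config transport, PEA compute


def describe(reg: GR) -> str:
    """Return a short human-readable label such as 'GR39: FINISH'."""
    return f"GR{reg.value}: {reg.name}"


if __name__ == "__main__":
    for reg in GR:
        print(describe(reg))
```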
FIG. 2 shows the operations the master control must perform before a configuration transport task is executed. Through the coprocessor interface, the master control writes the base address of the configuration information in the cache into GR33, writes the length of the configuration information to be transported into GR34, and writes the enable signal that starts configuration transport into GR32. Once these global registers have been written, transport of the configuration information begins.
FIG. 3 shows the operations the master control must perform before data is transported to the shared memory. Through the coprocessor interface, the master control writes the base address of the data in the cache into GR35, writes the length of the data to be transported into GR36, writes the base address for reading and writing the shared memory into GR37, and writes the enable signal that starts data transport into GR32. Once these global registers have been written, the data transport begins.
Each write by the master control to a GR takes thousands of clock cycles, and a single task requires writing several GRs. The time overhead of executing one task is therefore very large.
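The two master-driven sequences of FIG. 2 and FIG. 3 can be sketched as follows to make the number of GR writes per task explicit. This is a minimal model under stated assumptions: the class `CoprocessorInterface`, the function names, and the enable encodings written to GR32 are all hypothetical (the description does not give bit values), and `cycles_per_write` is a free parameter rather than a measured figure.

```python
from typing import Dict, List, Tuple

# Register numbers from the description.
GR_START, GR_CFG_BASE, GR_CFG_LEN = 32, 33, 34
GR_DATA_BASE, GR_DATA_LEN, GR_SHM_BASE = 35, 36, 37

# Hypothetical enable encodings for GR32 (not specified in the source).
EN_CFG_TRANSPORT = 0x1
EN_DATA_TRANSPORT = 0x2


class CoprocessorInterface:
    """Toy model that records every GR write issued by the master control."""

    def __init__(self) -> None:
        self.gr: Dict[int, int] = {}
        self.write_log: List[Tuple[int, int]] = []

    def write_gr(self, index: int, value: int) -> None:
        self.gr[index] = value
        self.write_log.append((index, value))


def start_config_transport(cpi: CoprocessorInterface, cache_base: int, length: int) -> None:
    """FIG. 2 sequence: GR33, GR34, then the enable in GR32 (3 writes)."""
    cpi.write_gr(GR_CFG_BASE, cache_base)
    cpi.write_gr(GR_CFG_LEN, length)
    cpi.write_gr(GR_START, EN_CFG_TRANSPORT)


def start_data_transport(cpi: CoprocessorInterface, cache_base: int,
                         length: int, shm_base: int) -> None:
    """FIG. 3 sequence: GR35, GR36, GR37, then the enable in GR32 (4 writes)."""
    cpi.write_gr(GR_DATA_BASE, cache_base)
    cpi.write_gr(GR_DATA_LEN, length)
    cpi.write_gr(GR_SHM_BASE, shm_base)
    cpi.write_gr(GR_START, EN_DATA_TRANSPORT)


def master_overhead_cycles(cpi: CoprocessorInterface, cycles_per_write: int) -> int:
    """Total cycles spent on GR writes, for an assumed per-write cost."""
    return len(cpi.write_log) * cycles_per_write
```

With one configuration transport followed by one data transport, the model records seven GR writes; multiplying by whatever per-write cost applies on a given system shows why repeating these sequences for every task iteration dominates the run time.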
In view of the above problems, and in order to reduce the coupling between the array and the master control, a control-type PE (Ctrl_PE) is added to the processing element array (PEA), as shown in FIG. 4. The control-type PE takes over from the master control the reading and writing of the GRs, automatically calculates information such as address offsets and transport lengths, and in turn controls the execution of the various tasks (Exec_PE).
FIG. 5 is an architecture diagram of the control-type PE. The configuration information storage module (Config_memory) stores at most 16 pieces of configuration information, and the local register file (Local_Regfile) can store 8 pieces of data. The control-type PE supports a four-stage pipeline of read configuration (control), decode (decode), execute (execute), and write back (writeback). The execute module provides common operations such as addition, subtraction, AND, OR, NOT, XOR, equality comparison, and shift, in order to support flexible address jump modes. It also provides a special "wait" operation for controlling the array: the "wait" operation repeatedly reads the signals in GR39 to detect whether tasks such as data transport and array computation have completed.
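The execute-stage operations and the "wait" operation described above can be sketched in software as follows. This is only an illustration under assumptions: the 32-bit word width, the bit positions of the finish flags within GR39, the `max_polls` safety limit, and all function and constant names are hypothetical, since the description does not specify them.

```python
from typing import Callable, Dict

GR_FINISH = 39  # register number from the description

# Hypothetical bit assignment of the finish flags inside GR39.
FINISH_DATA = 0x1
FINISH_CFG = 0x2
FINISH_COMPUTE = 0x4

WORD_MASK = 0xFFFFFFFF  # assumed 32-bit datapath

# Common operations named in the description, on assumed 32-bit words.
ALU_OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: (a + b) & WORD_MASK,
    "sub": lambda a, b: (a - b) & WORD_MASK,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "not": lambda a, b: ~a & WORD_MASK,  # unary: second operand ignored
    "xor": lambda a, b: a ^ b,
    "eq": lambda a, b: int(a == b),
    "shl": lambda a, b: (a << b) & WORD_MASK,
}


def wait(read_gr: Callable[[int], int], mask: int, max_polls: int = 1_000_000) -> int:
    """Poll GR39 until every flag in `mask` is set; return the number of polls.

    `read_gr` is a caller-supplied function that returns the current value of
    a global register. Raises TimeoutError if the flags never appear.
    """
    for polls in range(1, max_polls + 1):
        if (read_gr(GR_FINISH) & mask) == mask:
            return polls
    raise TimeoutError("finish flags not observed within max_polls reads")
```

In hardware the polling would be performed by the control-type PE itself, so the master control is not involved between the start of a task and its completion.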
FIG. 6 shows the configuration information format of the control-type PE. It has two inputs, input1 and input2, each supporting multiple data sources. input1 can access the GRs of the coprocessor interface, the result of the previous operation, and the local registers. input2 can access the shared registers of the PEA, the result of the previous operation, and the local registers. When the immediate enable field (Imm) is set to 1, the input2 field is treated as an immediate value and takes part in the operation. The output out1 likewise supports several destinations, including the coprocessor interface registers, the PEA shared registers, and the local registers. The control-type PE supports iteration and pause in order to provide flexible and complex control functions.
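How one configuration entry selects its operands and its destination can be sketched as below. The field names, the enum members, the `State` container, and the two-entry operation table are hypothetical; the description names the sources and destinations but not the field widths or encodings, so no bit layout is modelled here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class In1Src(Enum):      # data sources available to input1
    GR = "gr"
    LAST_RESULT = "last"
    LOCAL_REG = "local"


class In2Src(Enum):      # data sources available to input2
    SHARED_REG = "shared"
    LAST_RESULT = "last"
    LOCAL_REG = "local"


class OutDst(Enum):      # destinations available to out1
    GR = "gr"
    SHARED_REG = "shared"
    LOCAL_REG = "local"


@dataclass
class CtrlConfig:
    op: str
    in1_src: In1Src
    in1_idx: int
    in2_src: In2Src
    in2_field: int       # register index, or the immediate itself when imm_en
    imm_en: bool
    out_dst: OutDst
    out_idx: int


@dataclass
class State:
    gr: Dict[int, int] = field(default_factory=dict)
    shared: Dict[int, int] = field(default_factory=dict)
    local: Dict[int, int] = field(default_factory=dict)
    last: int = 0        # result of the previous operation


OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b}


def execute(cfg: CtrlConfig, st: State) -> int:
    """Resolve both inputs, apply the operation, and write out1."""
    if cfg.in1_src is In1Src.GR:
        a = st.gr[cfg.in1_idx]
    elif cfg.in1_src is In1Src.LAST_RESULT:
        a = st.last
    else:
        a = st.local[cfg.in1_idx]

    if cfg.imm_en:                       # Imm = 1: input2 field is an immediate
        b = cfg.in2_field
    elif cfg.in2_src is In2Src.SHARED_REG:
        b = st.shared[cfg.in2_field]
    elif cfg.in2_src is In2Src.LAST_RESULT:
        b = st.last
    else:
        b = st.local[cfg.in2_field]

    result = OPS[cfg.op](a, b)
    target = {OutDst.GR: st.gr, OutDst.SHARED_REG: st.shared,
              OutDst.LOCAL_REG: st.local}[cfg.out_dst]
    target[cfg.out_idx] = result
    st.last = result
    return result
```

For example, an entry that reads GR35, adds an immediate offset, and writes the sum back to GR35 advances the data base address for the next iteration without any write from the master control.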
The invention adds a control-type PE to the original PEA to take over tasks such as data transport and array start from the master control. This greatly reduces the coupling between the array and the master control, shortens application execution time, improves computing capability and performance to meet application requirements, and is well suited to hardware acceleration designs for data-intensive and compute-intensive applications.
The self-adaptive reconfigurable processing array and master control interaction method and device provided by the invention are illustrated below with an example.
FIG. 7 shows the execution process of a 512-point FFT when no control-type PE is added. As the figure shows, every layer of the FFT requires the master control to direct data transport and array execution, which incurs a very large time overhead. Executing the 9-layer FFT takes 358,000 ns in total.
FIG. 8 shows the execution process of the 512-point FFT after the control-type PE is added. Compared with the original version, apart from pre-storing the data required for computation in the GRs of the coprocessor interface, the whole execution process is greatly simplified. After the configuration has been transported and the local registers have been written, the master control only needs to enable the control-type PE, which then completes the iterations of the 9-layer FFT. The master control no longer has to handle the data transport and the start of iterative computation for each layer as before, which reduces the coupling between the master control and the PEA. The control-type PE carries out the data transport and the start of iterative computation for each layer by means of the configuration information shown in FIG. 9. The running time of the 512-point FFT is 15,600 ns, which is 95.648% shorter than the previous version.
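As a quick consistency check on the two run times quoted above (35.8 × 10^4 ns without the control-type PE and 1.56 × 10^4 ns with it), the relative reduction and the corresponding speedup can be recomputed. The function names are hypothetical. Using the rounded figures as printed gives a reduction of roughly 95.64%, which agrees with the stated 95.648% to within rounding; the small difference presumably reflects unrounded measurements that are not given in the text.

```python
def relative_reduction(before_ns: float, after_ns: float) -> float:
    """Fraction of the original run time that was eliminated."""
    return 1.0 - after_ns / before_ns


def speedup(before_ns: float, after_ns: float) -> float:
    """How many times faster the new run is than the original."""
    return before_ns / after_ns


if __name__ == "__main__":
    baseline_ns = 35.8e4       # 9-layer FFT driven by the master control
    with_ctrl_pe_ns = 1.56e4   # same FFT driven by the control-type PE
    print(f"reduction: {relative_reduction(baseline_ns, with_ctrl_pe_ns):.3%}")
    print(f"speedup:   {speedup(baseline_ns, with_ctrl_pe_ns):.2f}x")
```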
The embodiment of the invention also provides a self-adaptive reconfigurable processing array and master control interaction method, which is described in the following embodiment. Because the method solves the problem on a principle similar to that of the self-adaptive reconfigurable processing array and master control interaction device, the implementation of the method can refer to the implementation of the device, and repeated parts are not described again.
The self-adaptive reconfigurable processing array and master control interaction method comprises the following steps:
a control-type processing unit is added to the reconfigurable processing array to take over from the master control the reading and writing of the global registers GR in the coprocessor interface, thereby carrying out data transport and array execution.
In the embodiment of the present invention, the control-type processing unit includes a configuration information storage module, local registers, a configuration module, a decoding module, an execution module, and a write-back module;
the configuration information storage module is used for storing configuration information;
the local registers are used for storing data;
the configuration module, the decoding module, the execution module, and the write-back module implement a four-stage pipeline of read configuration, decode, execute, and write back.
In the embodiment of the present invention, the execution module is further configured to implement a wait operation, which detects whether tasks such as data transport and array computation have completed by reading the finish signals in the global register GR39.
In the embodiment of the present invention, the configuration information format of the control-type processing unit is as follows:
the format comprises two inputs, input1 and input2, each supporting multiple data sources;
input1 accesses the global registers GR of the coprocessor interface, the result of the previous operation, and the local registers;
input2 accesses the shared registers on the reconfigurable processing array PEA, the result of the previous operation, and the local registers;
the format comprises an immediate enable field, and when the immediate enable field is set to 1, the input2 field is treated as an immediate value and takes part in the operation.
In the embodiment of the present invention, the configuration information format of the control-type processing unit further comprises:
an output out1 that supports multiple output destinations, including the global registers in the coprocessor interface, the shared registers on the reconfigurable processing array PEA, and the local registers.
In the prior art, tasks of the reconfigurable processing array such as computation, data transport, and configuration transport are all controlled by the master control. In the embodiment of the invention, by contrast, a control-type processing unit arranged on the reconfigurable processing array takes over from the master control the reading and writing of the global registers GR in the coprocessor interface, thereby carrying out data transport and array execution. This greatly reduces the coupling between the array and the master control, shortens application execution time, improves computing capability and performance to meet application requirements, and is well suited to hardware acceleration designs for data-intensive and compute-intensive applications.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A self-adaptive reconfigurable processing array and master control interaction method, characterized by comprising the following step:
a control-type processing unit is added to the reconfigurable processing array to take over from the master control the reading and writing of the global registers GR in the coprocessor interface, thereby carrying out data transport and array execution.
2. The self-adaptive reconfigurable processing array and master control interaction method of claim 1, wherein the control-type processing unit comprises a configuration information storage module, local registers, a configuration module, a decoding module, an execution module, and a write-back module;
the configuration information storage module is used for storing configuration information;
the local registers are used for storing data;
the configuration module, the decoding module, the execution module, and the write-back module implement a four-stage pipeline of read configuration, decode, execute, and write back.
3. The self-adaptive reconfigurable processing array and master control interaction method of claim 2, wherein the execution module is further configured to implement a wait operation, and the wait operation detects whether tasks such as data transport and array computation have completed by reading the finish signals in the global register GR39.
4. The self-adaptive reconfigurable processing array and master control interaction method of claim 2, wherein the configuration information format of the control-type processing unit is as follows:
the format comprises two inputs, input1 and input2, each supporting multiple data sources;
input1 accesses the global registers GR of the coprocessor interface, the result of the previous operation, and the local registers;
input2 accesses the shared registers on the reconfigurable processing array PEA, the result of the previous operation, and the local registers;
the format comprises an immediate enable field, and when the immediate enable field is set to 1, the input2 field is treated as an immediate value and takes part in the operation.
5. The self-adaptive reconfigurable processing array and master control interaction method of claim 4, wherein the configuration information format of the control-type processing unit further comprises:
an output out1 that supports multiple output destinations, including the global registers in the coprocessor interface, the shared registers on the reconfigurable processing array PEA, and the local registers.
6. A self-adaptive reconfigurable processing array and master control interaction device, comprising: a control-type processing unit arranged on the reconfigurable processing array and used to take over from the master control the reading and writing of the global registers GR in the coprocessor interface, thereby carrying out data transport and array execution.
7. The self-adaptive reconfigurable processing array and master control interaction device of claim 6, wherein the control-type processing unit comprises a configuration information storage module, local registers, a configuration module, a decoding module, an execution module, and a write-back module;
the configuration information storage module is used for storing configuration information;
the local registers are used for storing data;
the configuration module, the decoding module, the execution module, and the write-back module implement a four-stage pipeline of read configuration, decode, execute, and write back.
8. The self-adaptive reconfigurable processing array and master control interaction device of claim 7, wherein the execution module is further configured to implement a wait operation, and the wait operation detects whether tasks such as data transport and array computation have completed by reading the finish signals in the global register GR39.
9. The self-adaptive reconfigurable processing array and master control interaction device of claim 8, wherein the configuration information format of the control-type processing unit is as follows:
the format comprises two inputs, input1 and input2, each supporting multiple data sources;
input1 accesses the global registers GR of the coprocessor interface, the result of the previous operation, and the local registers;
input2 accesses the shared registers on the reconfigurable processing array PEA, the result of the previous operation, and the local registers;
the format comprises an immediate enable field, and when the immediate enable field is set to 1, the input2 field is treated as an immediate value and takes part in the operation.
10. The self-adaptive reconfigurable processing array and master control interaction device of claim 9, wherein the configuration information format of the control-type processing unit further comprises:
an output out1 that supports multiple output destinations, including the global registers in the coprocessor interface, the shared registers on the reconfigurable processing array PEA, and the local registers.
CN202110861861.9A 2021-07-29 2021-07-29 Self-adaptive reconfigurable processing array and master control interaction method and device Pending CN113792009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110861861.9A CN113792009A (en) 2021-07-29 2021-07-29 Self-adaptive reconfigurable processing array and master control interaction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110861861.9A CN113792009A (en) 2021-07-29 2021-07-29 Self-adaptive reconfigurable processing array and master control interaction method and device

Publications (1)

Publication Number Publication Date
CN113792009A true CN113792009A (en) 2021-12-14

Family

ID=78877063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110861861.9A Pending CN113792009A (en) 2021-07-29 2021-07-29 Self-adaptive reconfigurable processing array and master control interaction method and device

Country Status (1)

Country Link
CN (1) CN113792009A (en)

Similar Documents

Publication Publication Date Title
US9639371B2 (en) Solution to divergent branches in a SIMD core using hardware pointers
CN114391135A (en) Method for performing in-memory processing operations on contiguously allocated data, and related memory device and system
GB2474901A (en) Multiply-accumulate instruction which adds or subtracts based on a predicate value
US20100318764A1 (en) System and method for managing processor-in-memory (pim) operations
US10761851B2 (en) Memory apparatus and method for controlling the same
US8572355B2 (en) Support for non-local returns in parallel thread SIMD engine
CN111752530A (en) Machine learning architecture support for block sparsity
US9513923B2 (en) System and method for context migration across CPU threads
WO2021046568A1 (en) Methods for performing processing-in-memory operations, and related memory devices and systems
JP3573506B2 (en) Computer system and method for solving predicates and Boolean expressions
US10152328B2 (en) Systems and methods for voting among parallel threads
US8413151B1 (en) Selective thread spawning within a multi-threaded processing system
CN113792009A (en) Self-adaptive reconfigurable processing array and master control interaction method and device
US20200319885A1 (en) Vector add-with-carry instruction
US20220300326A1 (en) Techniques for balancing workloads when parallelizing multiply-accumulate computations
US20100318769A1 (en) Using vector atomic memory operation to handle data of different lengths
US20210150311A1 (en) Data layout conscious processing in memory architecture for executing neural network model
US11822541B2 (en) Techniques for storing sub-alignment data when accelerating Smith-Waterman sequence alignments
KR102644951B1 (en) Arithmetic Logic Unit Register Sequencing
US20230101085A1 (en) Techniques for accelerating smith-waterman sequence alignments
CN113254078B (en) Data stream processing method for efficiently executing matrix addition on GPDPU simulator
US11669489B2 (en) Sparse systolic array design
US11640302B2 (en) SMID processing unit performing concurrent load/store and ALU operations
WO2020246598A1 (en) Calculation device, calculation method, and calculation program
TW202203000A (en) Method for in-memory computing and system for computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination