CN112433773B - Configuration information recording method and device for reconfigurable processor - Google Patents


Info

Publication number
CN112433773B
Authority
CN
China
Prior art keywords
configuration information
field
operation type
recording
bit field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011465401.6A
Other languages
Chinese (zh)
Other versions
CN112433773A (en)
Inventor
尹首一
谢思敏
谷江源
钟鸣
罗列
张淞
韩慧明
刘雷波
魏少军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202011465401.6A
Publication of CN112433773A
Application granted
Publication of CN112433773B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/4401 Bootstrapping
    • G06F9/4403 Processor initialisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a method and a device for recording configuration information of a reconfigurable processor, wherein the method comprises the following steps: acquiring configuration information of a reconfigurable processor; recording the configuration information by adopting a predefined configuration information format; the predefined configuration information format is a configuration information format meeting the requirement of a preset length, and comprises an extension bit of the configuration information length and a plurality of reserved bits of configuration information codes. The invention can describe the configuration information format of the reconfigurable processor, and has strong expansibility and high flexibility.

Description

Configuration information recording method and device for reconfigurable processor
Technical Field
The invention relates to the technical field of computer hardware, in particular to a method and a device for recording configuration information of a reconfigurable processor.
Background
With the rise of big data, cloud computing and artificial intelligence, achieving high-performance computing has become a primary concern. However, the sales volume of special-purpose chips is far from sufficient to cover their research and development cost, and once process technology reaches a certain level, processor performance can no longer rely on further process scaling; new solutions must be sought in computing models and architectural innovation. A reconfigurable processor is a high-performance processor that sits between general-purpose and special-purpose processors: it emphasizes the reuse of resources and seeks performance and energy efficiency close to those of an ASIC, while meeting different task requirements by changing its functional configuration information, which gives it flexibility comparable to that of a general-purpose processor. Reconfigurable processing thus combines the advantages of the two.
In order to pursue performance and energy efficiency close to those of a special-purpose processor, and to handle operations with high data parallelism such as compute-intensive workloads, the configuration information of a reconfigurable processor needs to support resource reuse. Because similar instructions and data have similar decoding or access patterns, decoding and addressing them repeatedly may be redundant. By adjusting the execution order so that identical or regular decoding and addressing operations in an application are executed consecutively as iterations, decoding is performed once for many executions, which saves processing time or reduces power consumption. However, the fixed structure of an ASIC makes it less scalable and flexible, so it cannot meet ever-evolving application requirements.
Therefore, a highly extensible and flexible scheme for describing the configuration information format of a reconfigurable processor is currently needed.
Disclosure of Invention
The embodiment of the invention provides a configuration information recording method of a reconfigurable processor, which is used for describing the configuration information format of the reconfigurable processor, and has strong expansibility and high flexibility, and the method comprises the following steps:
acquiring configuration information of a reconfigurable processor;
recording the configuration information by adopting a predefined configuration information format;
the predefined configuration information format is a configuration information format meeting the requirement of preset length;
the configuration information comprises top layer type configuration information, ALU operation type configuration information and memory access operation type configuration information;
the predefined configuration information formats comprise a top layer type configuration information format for recording top layer type configuration information, an ALU operation type configuration information format for recording ALU operation type configuration information and a memory access operation type configuration information format for recording memory access operation type configuration information;
the predefined configuration information format comprises a plurality of fields, each field being in a different bit field; the preset length requirement is 64 bits;
the top-level configuration information format comprises an extension bit field, a reserved bit field and an actual coding field;
the ALU operation type configuration information format comprises an extension bit field and an actual encoding field;
the memory access operation type configuration information format comprises an extension bit field, a reserved bit field and an actual coding field.
The embodiment of the invention provides a configuration information recording device of a reconfigurable processor, which is used for describing the configuration information format of the reconfigurable processor, has strong expansibility and high flexibility, and comprises the following components:
the configuration information acquisition module is used for acquiring configuration information of the reconfigurable processor;
the configuration information recording module is used for recording the configuration information by adopting a predefined configuration information format;
the predefined configuration information format is a configuration information format meeting the requirement of preset length;
the configuration information comprises top layer type configuration information, ALU operation type configuration information and memory access operation type configuration information;
the predefined configuration information formats comprise a top layer type configuration information format for recording top layer type configuration information, an ALU operation type configuration information format for recording ALU operation type configuration information and a memory access operation type configuration information format for recording memory access operation type configuration information;
the predefined configuration information format comprises a plurality of fields, each field being in a different bit field; the preset length requirement is 64 bits;
the top-level configuration information format comprises an extension bit field, a reserved bit field and an actual coding field;
the ALU operation type configuration information format comprises an extension bit field and an actual encoding field;
the memory access operation type configuration information format comprises an extension bit field, a reserved bit field and an actual coding field.
The embodiment of the invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the configuration information recording method of the reconfigurable processor when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium, in which a computer program for executing the configuration information recording method of the reconfigurable processor is stored.
In the embodiment of the invention, the configuration information of the reconfigurable processor is acquired and recorded using a predefined configuration information format; the predefined configuration information format meets a preset length requirement and comprises an extension bit for the configuration information length and a plurality of reserved bits for configuration information encodings. A configuration information format that meets the preset length requirement can be used effectively in a coarse-grained reconfigurable processor, and the extension bit for the length together with the reserved encoding bits allows the configuration information to be extended, so the method is highly flexible and extensible.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts. In the drawings:
FIG. 1 is a flowchart of a method for recording configuration information of a reconfigurable processor according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating the classification of configuration information and configuration information formats according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a predefined configuration information format according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a configuration information recording apparatus of a reconfigurable processor according to an embodiment of the present invention;
FIG. 5 is a diagram of a computer device in an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
In the description of the present specification, the terms "comprising," "including," "having," "containing," and the like are used in an open-ended fashion, i.e., to mean including, but not limited to. Reference to the description of the terms "one embodiment," "a particular embodiment," "some embodiments," "for example," etc., means that a particular feature, structure, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. The sequence of steps involved in the embodiments is for illustrative purposes to illustrate the implementation of the present application, and the sequence of steps is not limited and can be adjusted as needed.
Fig. 1 is a flowchart of a configuration information recording method of a reconfigurable processor according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
step 101, acquiring configuration information of a reconfigurable processor;
step 102, recording the configuration information by adopting a predefined configuration information format;
the predefined configuration information format is a configuration information format meeting the requirement of a preset length, and comprises an extension bit of the configuration information length and a plurality of reserved bits of configuration information codes.
In the embodiment of the invention, a configuration information format that meets the preset length requirement can be used effectively in a coarse-grained reconfigurable processor; the extension bit for the configuration information length together with the reserved bits for configuration information encodings allows the configuration information to be extended, so the method is highly flexible and extensible.
In a specific implementation, a configuration controller can fetch the configuration information of the reconfigurable processor from the cache and send it to the reconfigurable processing unit array PEA of the reconfigurable processor; each PE in the PEA records the configuration information using the predefined configuration information format, which is the key point of the embodiment of the invention.
Fig. 2 is a diagram illustrating the classification of configuration information and configuration information formats according to an embodiment of the present invention. As shown in Fig. 2, in an embodiment, the configuration information includes top layer type configuration information, ALU operation type configuration information, and memory access operation type configuration information;
the predefined configuration information formats comprise a top layer type configuration information format for recording top layer type configuration information, an ALU operation type configuration information format for recording ALU operation type configuration information and a memory access operation type configuration information format for recording memory access operation type configuration information.
Therefore, the predefined configuration information formats correspond one-to-one to the three types of configuration information; representing the configuration information by class makes the three types clearer and easier to manage.
It should be noted that a configuration packet in the embodiment of the present invention refers to the configuration information required by the entire reconfigurable processing unit array PEA, and includes the configuration information to be allocated to each processing unit PE. The configuration information may describe an execution task; some PEs may, of course, execute no task. Within the execution task of a PE, the configuration information generally consists of one piece of top layer type configuration information, which is executed first, followed by several pieces of the other two types (both types or only one may be present, depending on the type of the execution task). That is, all PEs uniformly start executing their tasks from the top layer type configuration information.
Fig. 3 is a schematic structural diagram of a predefined configuration information format according to an embodiment of the present invention. As shown in Fig. 3, in an embodiment, the predefined configuration information format includes a plurality of fields, each of which is in a different bit field;
the type of the field comprises an extension bit field, a reserved bit field or an actual encoding field;
the preset length requirement is 64 bits.
In the above embodiment, an extension bit field is a field whose bits serve as extension bits for the configuration information length; a reserved bit field is a field whose bits are reserved for future configuration information encodings; and an actual encoding field is a field that records an encoding with actual meaning. The extension bit field and the reserved bit fields allow the configuration information to be extended in the future, so the method is highly flexible and extensible: through flexible dynamic configuration it can support operations of various types while making full use of hardware resources, and it suits reconfigurable arrays of different array sizes and storage sizes, thereby improving computing throughput, computing performance and energy efficiency. The preset length requirement of 64 bits allows the format to be used effectively in a coarse-grained reconfigurable processor.
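As an illustration of how a 64-bit configuration word divided into bit fields can be read and written, the following minimal Python sketch extracts and sets an inclusive bit range [hi:lo]. The helper names get_field and set_field are hypothetical and do not come from the patent; only the 64-bit length is taken from the text.

```python
WORD_BITS = 64  # preset length requirement stated in the text


def get_field(word: int, hi: int, lo: int) -> int:
    """Return bits [hi:lo] (inclusive) of a 64-bit configuration word."""
    if not (0 <= lo <= hi < WORD_BITS):
        raise ValueError("bit range outside the 64-bit word")
    width = hi - lo + 1
    return (word >> lo) & ((1 << width) - 1)


def set_field(word: int, hi: int, lo: int, value: int) -> int:
    """Return a copy of word with bits [hi:lo] replaced by value."""
    if not (0 <= lo <= hi < WORD_BITS):
        raise ValueError("bit range outside the 64-bit word")
    width = hi - lo + 1
    if not (0 <= value < (1 << width)):
        raise ValueError("value does not fit in the field")
    mask = ((1 << width) - 1) << lo
    # Clear the old field contents, then OR in the new value.
    return (word & ~mask) | (value << lo)


if __name__ == "__main__":
    # Write the 2-bit pattern 01 into bits [62:61] of an all-zero word.
    word = set_field(0, 62, 61, 0b01)
    print(hex(word), get_field(word, 62, 61))
```

Reserved and extension bits simply stay 0 in such a word until a later version of the format assigns them a meaning.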
The top-level type configuration information format, the ALU operation type configuration information format, and the memory access operation type configuration information format are described below.
1. Top layer type configuration information format (Head-Config)
Bit field [63]: the field configextended, 1 bit in total, is an extension bit field that indicates whether the configuration information length is extended. It is reserved for now and may always be set to 0. configextended = 0 indicates that the configuration information length is not extended; configextended = 1 indicates that the length may be extended. For example, the configuration information used in the embodiment of the present invention is 64 bits long; with configextended = 1, the length may be extended to 64+8, 64+16, 64+32 or 64+64 bits, and so on.
Bit field [62:61]: the field Func, 2 bits in total, is an actual encoding field that records the type of the configuration information stored by the PE. The value 00 indicates that the configuration information is top-level type configuration information.
Bit field [60:56]: the field Task_PackageNum, 5 bits in total, is an actual encoding field that records the total number of configuration packets the current PE needs in order to perform the task.
Bit field [55:53]: this bit field is a reserved bit field, i.e., its bits are reserved bits and may always be set to 0.
Bit field [52:48]: the field Package_Index, 5 bits in total, is an actual encoding field that gives the index of the configuration packet the current PE needs in order to execute the task. When its value equals the total number of configuration packets required by the PE to execute the task, the PE has completed the task.
Bit field [47:45]: the field Bit_width is a reserved bit field and may always be set to 0.
Bit field [44:37]: the field IndexPE, 8 bits in total, is an actual encoding field that gives the index, within the PEA, of the PE to which the current configuration packet belongs. The controller that drives the PEA assigns a configuration packet to each PE.
Bit field [36:30]: the field Iteration_PEA, 7 bits in total, is an actual encoding field that records the number of iterations of the configuration information of the reconfigurable processing unit array PEA. The PE_enable signal is required; on every iteration, all PEs of the PEA uniformly restart execution from the top-level type configuration information, and the memory address of each memory access operation restarts from the base address indicated by that operation.
Bit field [29:23]: the field Iteration_PE, 7 bits in total, is an actual encoding field that records the number of times the current configuration packet of each PE in the PEA is looped within the PE. The PE_enable signal is not required; each iteration re-executes from the configuration information recorded at the iteration line of the current configuration packet (bit field [16:15] gives the iteration line), and the memory accesses of LSU operations continue to accumulate from the address at which the previous iteration ended.
Bit field [22:17]: the field Initial_Idle, 6 bits in total, is an actual encoding field that gives the number of idle cycles the current PE must wait when executing the top-level type configuration information; the PE executes the next piece of configuration information only after these idle cycles have elapsed. The two bits [14:13] of the top-level type configuration information format have been extended to Initial_Idle as its two most significant bits, so Initial_Idle is represented with 8 bits in total.
Bit field [16:15]: the field Iteration_Line, 2 bits in total, is an actual encoding field. When the PE performs PE-internal loop iteration according to Iteration_PE, each iteration re-executes from the configuration information recorded at the iteration line given by Iteration_Line.
Bit field [14:12]: this bit field is an actual encoding field that has been taken over as an extension of other fields. Bits [14:13] are currently extended to Initial_Idle as its two most significant bits; bit [12] is extended to operation_PE as its most significant bit.
Bit field [11:6]: 6 bits in total; this is a reserved bit field and may always be set to 0.
Bit field [5:0]: the field Count, 6 bits in total, is an actual encoding field that records how many pieces of configuration information the configuration packets of the current PE contain. Currently one configuration packet contains at most 16 pieces of configuration information, and at most 8 pieces per configuration packet when the PE executes in ping-pong mode; the first piece of configuration information is generally top-level type configuration information.
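The Head-Config layout listed above can be sketched as a small decoder. This is a minimal illustration under stated assumptions, not the patented implementation: the function name decode_head_config and the derived key task_done are hypothetical, and only the bit positions, the Func value 00, and the 8-bit Initial_Idle composition are taken from the text.

```python
# Bit positions (hi, lo) of the Head-Config fields described in the text.
HEAD_FIELDS = {
    "configextended": (63, 63),
    "Func": (62, 61),
    "Task_PackageNum": (60, 56),
    "Package_Index": (52, 48),
    "IndexPE": (44, 37),
    "Iteration_PEA": (36, 30),
    "Iteration_PE": (29, 23),
    "Iteration_Line": (16, 15),
    "Count": (5, 0),
}


def bits(word: int, hi: int, lo: int) -> int:
    """Extract the inclusive bit range [hi:lo] from word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_head_config(word: int) -> dict:
    """Decode a 64-bit top-level type configuration word into named fields."""
    if not 0 <= word < (1 << 64):
        raise ValueError("configuration word must fit in 64 bits")
    fields = {name: bits(word, hi, lo) for name, (hi, lo) in HEAD_FIELDS.items()}
    if fields["Func"] != 0b00:
        raise ValueError("Func != 00: not top-level type configuration information")
    # Initial_Idle is 8 bits wide: bits [14:13] are its two most significant
    # bits and bits [22:17] are its six least significant bits.
    fields["Initial_Idle"] = (bits(word, 14, 13) << 6) | bits(word, 22, 17)
    # Hypothetical convenience flag: the text says the task is complete when
    # Package_Index equals Task_PackageNum.
    fields["task_done"] = fields["Package_Index"] == fields["Task_PackageNum"]
    return fields


if __name__ == "__main__":
    sample = (3 << 56) | (3 << 48) | (5 << 37) | (0b10 << 13) | (3 << 17) | 9
    print(decode_head_config(sample))
```

Reserved bit fields ([55:53], [47:45], [11:6]) are deliberately left out of the table, since the text says they may always be set to 0.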
2. ALU operation type configuration information format (ALU-Config)
In one embodiment, the ALU operation type configuration information includes MUL type operation type configuration information and MAC type operation type configuration information.
Bit field [63]: the field configextended, 1 bit in total, is an extension bit field that indicates whether the configuration information length is extended. It is reserved for now and may always be set to 0.
Bit field [62:61]: the field Func, 2 bits in total, is an actual encoding field that records the type of the configuration information stored by the PE. The value 01 indicates that the configuration information is ALU operation type configuration information.
Bit field [60:53]: the field In1, 8 bits in total, is an actual encoding field used to indicate the data source of the 32-bit operand 1, as follows:
(1) In1[7:5] = 000 indicates that the data comes from the local register file of this PE. In1[4:0] records the local register index; 12 local registers are currently supported, where indices 0-7 are data registers and indices 8-11 are iterator registers (In1[4:0] can address at most 32 local registers).
(2) In1[7:5] = 001 indicates that the data comes from the global register file of this PEA. In1[4:0] records the global register index; the current implementation supports at most 20 global registers, where indices 0-15 are data registers and indices 16-19 are iterator registers.
(3) In1[7:5] = 010 indicates that the data comes from the calculation result Out1 of another connected PE. In1[4] = 0 indicates that the data comes from the output register Out1 of the other PE; In1[4] = 1 indicates that the data comes from the output of the Execute unit of the other PE and has not yet been written to Out1. In both cases In1[3:0] gives the index of the PE interconnect Router, which supports at most 16 different PE data sources.
(4) In1[7:5] = 011 indicates that the data comes from the calculation result Out1 of this PE, or from Out2, the systolic pass-through output of operand 1 of this PE. In1[4:3] = 00 indicates that the data is the result of this PE's operation, read from the output register Out1 written back on the pipeline; In1[4:3] = 10 indicates that the data is the output result Out1 of this PE's operation that has not yet been written to the pipeline register Out1; In1[4:3] = 01 indicates the pass-through output Out2 of operand 1, read from the output register Out2 written back on the pipeline; In1[4:3] = 11 indicates the pass-through output Out2 of source operand 1 that has not yet been written to the pipeline register Out2.
(5) In1[7:5] = 100 indicates that the data comes from the operand result Out2 of another connected PE; this data source represents the systolic propagation of an operand across the array for systolic computations. In1[4] = 0 indicates that the data comes from operand 1 of the other PE, read from the output register Out2 on the pipeline; In1[4] = 1 indicates that the data comes from the pass-through output of source operand 1 of the other PE and has not yet been written to Out2. In both cases In1[3:0] gives the index of the PE interconnect Router, which supports at most 16 different PE data sources.
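The five In1 source selectors described above can be summarized in a short decoder. This is a sketch under stated assumptions: the function name decode_in1 and the dictionary keys (source, output, registered, router_index) are hypothetical labels chosen for illustration, and the selector values 101 to 111, which the text does not describe, are rejected.

```python
def decode_in1(in1: int) -> dict:
    """Classify the 8-bit In1 field by its selector bits In1[7:5]."""
    if not 0 <= in1 <= 0xFF:
        raise ValueError("In1 is an 8-bit field")
    selector = in1 >> 5
    if selector == 0b000:
        # Local register file of this PE, index in In1[4:0].
        return {"source": "local_register", "index": in1 & 0x1F}
    if selector == 0b001:
        # Global register file of this PEA, index in In1[4:0].
        return {"source": "global_register", "index": in1 & 0x1F}
    if selector in (0b010, 0b100):
        # Out1 (010) or Out2 (100) of another connected PE.
        return {
            "source": "other_pe",
            "output": "Out1" if selector == 0b010 else "Out2",
            # In1[4] = 0: read from the output register;
            # In1[4] = 1: value not yet written to that register.
            "registered": ((in1 >> 4) & 1) == 0,
            "router_index": in1 & 0xF,
        }
    if selector == 0b011:
        sub = (in1 >> 3) & 0b11  # In1[4:3]
        return {
            "source": "this_pe",
            # 00 and 10 select Out1; 01 and 11 select Out2.
            "output": "Out2" if sub & 0b01 else "Out1",
            # 00 and 01 read the written-back register; 10 and 11 do not.
            "registered": (sub & 0b10) == 0,
        }
    raise ValueError("In1[7:5] value is not described in the text")


if __name__ == "__main__":
    print(decode_in1(0b01010101))
```

The "registered" flag distinguishes a value read from a pipeline output register from one taken before it has been written back, which is how the text describes the two variants of each PE-to-PE source.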
Bit field [52:45]: the field In2, 8 bits in total, is an actual encoding field used to indicate the data source of the 32-bit operand 2. Its encoding and meaning are the same as those of the field In1, except that only In1 supports access to the global registers; In2 does not, in order to reduce the access latency and area overhead of the global registers. When the immediate flag described below is enabled, In2 is an immediate with a maximum value of 255, i.e., In2_Imm[31:0] = {{24{In2[7]}}, In2}.
Bit field [44:37]: the field In3, 8 bits in total, is an actual encoding field used to indicate the data source of the 32-bit operand 3. Its encoding and meaning are the same as those of the field In1, except that the hardware does not support access to the global registers (GR).
Bit field [36:31]: the field In4, 6 bits in total, is an actual encoding field used to indicate the data source of the 1-bit operand 4. In4[5] = 1 indicates that the data comes from the calculation result Out3 of another connected PE: In4[4] = 0 indicates that the data comes from the output register Out3 of the other PE, and In4[4] = 1 indicates that the data comes from the output of the ALU operation of the other PE and has not yet been written to Out3; in both cases In4[3:0] gives the index of the PE interconnect Router, which supports at most 16 different PE data sources. In4[5] = 0 indicates that the data comes from the calculation result Out3 of this PE: In4[4] = 0 indicates that the data comes from the output register Out3 of this PE, and In4[4] = 1 indicates that the data comes from the Out3 output of the Execute unit of this PE that has not yet been written to the pipeline register Out3.
Bit field [30]: the field Imm, 1 bit in total, is an actual encoding field that serves as the immediate flag and determines whether an immediate participates in the ALU operation. Imm = 1 indicates that operand 2 comes from an immediate; otherwise the operation is a register operation. The other operands do not participate in the immediate operation.
Bit field [29:23]: the field Out1, 7 bits in total, is an actual encoding field that indicates where the 32-bit output data 1, i.e., the calculation result, is written. Out1[6:5] = 00 indicates that output data 1 is written to the local register file of this PE; Out1[4:0] gives the local register index (currently only 8 local registers are supported; indices 0-7 are data registers and indices 8-11 are iterator registers), and the pipeline register Out1Reg is also written by default. Out1[6:5] = 01 indicates that output data 1 is written to the global register file of this PEA; Out1[4:0] gives the global register index (the current version supports 16 global registers; indices 0-15 are data registers and indices 16-19 are iterator registers), and the pipeline register Out1Reg is also written by default. Out1[6:5] = 10 indicates that output data 1 is written only to the output register Out1 of this PE, i.e., no local or global register is written.
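The Out1 destination encoding can be sketched in the same style. The function name decode_out1 and the returned labels are hypothetical; the three selector values are the ones listed in the text, and Out1[6:5] = 11, which the text does not describe, is rejected.

```python
def decode_out1(out1: int) -> tuple:
    """Return (destination, register_index) for the 7-bit Out1 field."""
    if not 0 <= out1 <= 0x7F:
        raise ValueError("Out1 is a 7-bit field")
    selector = (out1 >> 5) & 0b11  # Out1[6:5]
    index = out1 & 0x1F            # Out1[4:0]
    if selector == 0b00:
        # Local register file of this PE (Out1Reg is also written).
        return ("local_register", index)
    if selector == 0b01:
        # Global register file of this PEA (Out1Reg is also written).
        return ("global_register", index)
    if selector == 0b10:
        # Only the output register Out1 of this PE is written.
        return ("output_register_only", None)
    raise ValueError("Out1[6:5] = 11 is not described in the text")


if __name__ == "__main__":
    print(decode_out1(0b0100011))
```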
Bit field [22:16]: the field Out2, 7 bits in total, is an actual encoding field that indicates where the 32-bit output data 2 is written. Out2 is the direct pass-through output of operand 1 and is designed specifically to support systolic array operation. Its encoding is the same as that of Out1, except that the hardware does not support writing the global registers (GR) or the local registers (LR), mainly to reduce the access latency and area overhead of the global and local registers.
Bit field [15]: the field IItype, 1 bit in total, is an actual encoding field that determines the iteration mode of the configuration information of the PEA and also the Idle mode. IItype = 0 means that the PEA first completes all iterations and then performs Idle once, uniformly. This working mode suits the Spatial static operation mode in which the PEA has no memory access conflicts, and the more ideal temporal dynamic operation mode. It mainly supports the following cases: (1) a single piece of configuration information is iterated without stalling; (2) a single piece of configuration information is not iterated but stalls; (3) a single piece of configuration information is iterated first and then stalls once, uniformly. IItype = 1 means that each execution of the operation is followed by one Idle. This working mode suits the Spatial static operation mode in which the PEA has memory access conflicts, the Spatial static operation mode with data hazards, the temporal dynamic operation mode in which multi-cycle and single-cycle operations are mismatched, and so on. In this mode: (1) Idle must be greater than 0; (2) a single piece of configuration information is executed (issued) once per operation and then Idle is performed, repeating until all iterations are completed, including the Idle after the last one.
Bit field [14:5]: the field Iteration, 10 bits in total, is an actual encoding field that indicates the iteration count Iter_Num of the configuration information and the number of cycles Iter_II to wait after the iterations of the configuration information complete. (1) Iteration[9:8] = 00: the iteration count and the iteration interval are given as immediates. Iteration[7:3] is the number of times Iter_Num that this piece of configuration information is repeated, at most 31; Iteration[2:0] is the number of cycles Iter_II to wait after the iterations, at most 7 cycles. (2) Iteration[9:8] = 01, used when Iter_Num is greater than or equal to 32 or Iter_II is greater than or equal to 8: the iteration count and the iteration interval are fetched from a global register. Iteration[7:3] records the index into the global register file (20 registers at present, of which only the 4 special global registers 16 to 19 can be accessed), and the 32-bit value g of that register is read; g[31:10] is the iteration count Iter_Num and g[9:0] is the iteration interval Iter_II. (3) Iteration[9:8] = 10, also used when Iter_Num is greater than or equal to 32 or Iter_II is greater than or equal to 8: the iteration count and the iteration interval are fetched from the local register file. Iteration[7:3] records the index of the local register; g[31:10] of that register is the iteration count Iter_Num and g[9:0] is the iteration interval Iter_II.
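The three cases of the Iteration field can be sketched as follows. The function name `decode_iteration` and the two register-read callbacks are assumptions introduced for illustration; how the hardware actually reads the register files is not specified here.

```python
from typing import Callable, Optional, Tuple

RegisterReader = Callable[[int], int]  # hypothetical: register index -> 32-bit value


def decode_iteration(field: int,
                     read_global: Optional[RegisterReader] = None,
                     read_local: Optional[RegisterReader] = None) -> Tuple[int, int]:
    """Return (Iter_Num, Iter_II) for a 10-bit Iteration field."""
    if not 0 <= field < (1 << 10):
        raise ValueError("Iteration is a 10-bit field")
    mode = (field >> 8) & 0b11       # Iteration[9:8]
    mid = (field >> 3) & 0b11111     # Iteration[7:3]
    low = field & 0b111              # Iteration[2:0]
    if mode == 0b00:
        return mid, low              # immediates: Iter_Num <= 31, Iter_II <= 7
    if mode == 0b01:
        reader = read_global         # mid is a global register index
    elif mode == 0b10:
        reader = read_local          # mid is a local register index
    else:
        raise ValueError("Iteration[9:8] = 11 is not described in the text")
    if reader is None:
        raise ValueError("a register reader is required for register modes")
    g = reader(mid) & 0xFFFFFFFF
    return g >> 10, g & 0x3FF        # g[31:10] = Iter_Num, g[9:0] = Iter_II
```

For instance, the immediate value 0001010101 yields Iter_Num = 10 and Iter_II = 5.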
Bit field [4:0]: the field Opcode, 5 bits in total, is an actual encoding field that determines the operation type of the ALU. The field Opcode supports at most 32 operation types, including two-operand (binary) and three-operand (ternary) operations; the recording method provided in the embodiment of the present invention currently supports 20 types, and the remaining encodings are reserved for now.
Thirdly, memory access operation type configuration information format (LSU-Config)
Bit field [63]: the field configextended, 1 bit in total, is an extension bit field; it is not used for extension at present and can always be set to 0.
Bit field [62:61]: the field Func, 2 bits in total, is an actual encoding field that records the type of the configuration information stored by the PE; the value 10 indicates memory access operation type configuration information.
Bit field [60:53]: the field AddrMem, 8 bits in total, is an actual encoding field that gives the base address for Shared Memory accesses; access to 2 different Shared Memories is supported, a single one being 16 KB, i.e., 256 × 32 bits in depth.
Bit field [52:45]: the field InMem, 8 bits in total, is an actual encoding field that gives the 32-bit data input for a Store access. Its encoding and meaning are consistent with the field In2 in the ALU operation type configuration information format, and access to global registers is supported.
Bit field [44:37]: the field DirectAddrMem, 8 bits in total, is an actual encoding field that participates in the direct immediate address calculation for the Shared Memory.
Bit field [36:33]: the field Offset, 4 bits in total, is an actual encoding field that gives the address offset applied in each loop iteration of the memory access operation, i.e., the step by which the address is automatically incremented. The field Offset_extended in bit field [32] extends the Offset field, so that the offset is 5 bits wide.
Bit field [32]: the field Offset_extended, 1 bit in total, is an actual encoding field that extends the Offset field as its most significant bit.
Bit field [31:30]: a reserved bit field, 2 bits in total; these two bits are currently unused.
Bit field [29:23]: the field Out1, 7 bits in total, is an actual encoding field that gives the target address of the memory access output data (output data 1); its encoding and meaning are consistent with the field Out1 in the ALU operation type configuration information format.
Bit field [22:19]: a reserved bit field, 4 bits in total, which can always be set to 0. It can be used for future extension of memory access addressing, such as the size of the Shared Memory space and data exchange between different Shared Memories, or for extension of other types of configuration information.
Bit field [18]: the field Addr_Loop, 1 bit in total, is an actual encoding field that flags address auto-increment between memory access configurations of the same kind (Load/Store). When Addr_Loop is true, the next memory access configuration of the same kind (Load or Store) in the same configuration packet continues incrementing from the previous access address. This matters especially in operation modes where PE_Iteration in the top-level configuration information is greater than 0 and the operations of the configuration packet must be executed repeatedly. The specific meanings are: (1) Addr_Loop = 0: the next memory access configuration of the same kind (Load or Store) increments by the offset starting from the access address field of its own configuration information. (2) Addr_Loop = 1: the next memory access configuration of the same kind (Load or Store) starts from the last access address of the previous piece of configuration information and increments by the field Offset.
Bit field [17]: the field IItype, 1 bit in total, is an actual encoding field that determines the iteration mode of the configuration information of the PEA and also the Idle mode; its encoding, meaning and function are consistent with the field IItype in the ALU operation type configuration information format.
Bit field [16:15]: the field IncreasFlag, 2 bits in total, is an actual encoding field that flags whether the memory access address is automatically incremented or decremented. When the flag is set, the access address automatically changes by Offset; otherwise, the access address is determined by the input address in the configuration information (i.e., jointly by the field AddrMem and the field DirectAddrMem). The specific meanings are: (1) IncreasFlag = 00: the access address is neither incremented nor decremented and is determined by the input address in the configuration information (AddrMem + DirectAddrMem). (2) IncreasFlag = 01: auto-increment; the access address automatically increases by Offset. (3) IncreasFlag = 10: auto-decrement; the access address automatically decreases by Offset. (4) IncreasFlag = 11: currently unused.
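As a rough illustration of how Offset, Offset_extended and IncreasFlag interact, the sketch below generates the sequence of access addresses over a number of iterations. It assumes that the input address (AddrMem + DirectAddrMem) has already been resolved into a single integer `base` and that the first access uses `base` unchanged; both points, and the function names, are assumptions made for this example.

```python
from typing import List


def effective_offset(offset: int, offset_extended: int) -> int:
    """Combine the 4-bit Offset with Offset_extended as its most significant bit."""
    return ((offset_extended & 0b1) << 4) | (offset & 0b1111)


def access_addresses(base: int, offset: int, increas_flag: int, count: int) -> List[int]:
    """List the addresses touched by `count` iterations of one access configuration."""
    if increas_flag == 0b00:
        step = 0            # address fixed by the input address
    elif increas_flag == 0b01:
        step = offset       # auto-increment by Offset
    elif increas_flag == 0b10:
        step = -offset      # auto-decrement by Offset
    else:
        raise ValueError("IncreasFlag = 11 is currently unused")
    return [base + i * step for i in range(count)]
```

With a base of 36, an offset of 1 and IncreasFlag = 01, four iterations would touch addresses 36, 37, 38 and 39.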
Bit field [14:5]: the field Iteration, 10 bits in total, is an actual encoding field that indicates the number of iterations and the iteration interval of this piece of configuration information.
Bit field [4:2]: a reserved bit field, 3 bits in total, which can always be set to 0.
Bit field [1:0 ]: the field Opcode, which is 2 bits in total, is an actual coding field and is used for determining the access type of the access operation type configuration information LSU, and the access type is divided into 4 types: global Load, global Store, local Load, local Store.
A specific embodiment is given below to explain a specific application of the configuration information recording method of the reconfigurable processor.
For an actual reconfigurable processor, the configuration information of the processor is acquired and then recorded in the configuration information format predefined in the embodiment of the invention. The recorded configuration information is as follows:
First, the top-level configuration information of PE1 is taken as an example to describe the top-level configuration information format.
Bit field [63]: the field configextended, with an encoded value of 0, i.e., the length of the configuration information is not extended;
bit field [62:61]: the field Func, with an encoded value of 00, indicating that the current configuration information is top-level configuration information (Top-Config);
bit field [60:56]: the field Task_PackageNum, with an encoded value of 00000, indicating a task with 1 configuration packet;
bit field [55:53]: a reserved bit field, with the encoded value set to 000;
bit field [52:48]: the field Package_Index, with an encoded value of 00000, indicating that the configuration packet the current PE needs to execute for the task is the 1st of all current configuration packets;
bit field [47:45]: the field Bit_width, with an encoded value of 000, indicating 32-bit operands;
bit field [44:37]: the field IndexPE, with an encoded value of 00000001, indicating that the current PE is PE1;
bit field [36:30]: the field Iteration_PEA, with an encoded value of 0000010, indicating that the number of iterations of the configuration information of the PEA is 2;
bit field [29:23]: the field Iteration_PE, with an encoded value of 0000010, indicating that the current configuration packet of the PE must loop 2 times inside the PE;
bit field [22:17]: the field Initial_Idle, with an encoded value of 001000, indicating that the PE waits 8 cycles before executing the configuration information of the following Store operation; the number of cycles must be calculated for the specific case;
bit field [16:15]: the field Iteration_Line, with an encoded value of 01, indicating that when the PE performs its internal loop iterations according to Iteration_PE, each iteration starts from the configuration information recorded in the 2nd line;
bit field [14:12], with an encoded value of 000;
bit field [11:6]: a reserved bit field, with an encoded value of 000000;
bit field [5:0]: the field Count, with an encoded value of 000001, indicating that there are 2 pieces of configuration information, namely the top-level configuration information and the Store configuration information.
Combining the above fields, the top-level configuration information of PE1 is specifically encoded as:
0_00_00000_000_00000_000_00000001_0000010_0000010_001000_01_00_0_000000_000001。
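The encoding above is simply the listed field values concatenated from bit 63 down to bit 0. A minimal sketch of that packing follows; the layout table uses the field names and widths from the example, while the labels `Reserved_55_53`, `Bits_14_12` and `Reserved_11_6` and the helper name `pack` are placeholders chosen here.

```python
# (field name, width in bits), listed from bit 63 down to bit 0.
TOP_LEVEL_LAYOUT = [
    ("configextended", 1), ("Func", 2), ("Task_PackageNum", 5),
    ("Reserved_55_53", 3), ("Package_Index", 5), ("Bit_width", 3),
    ("IndexPE", 8), ("Iteration_PEA", 7), ("Iteration_PE", 7),
    ("Initial_Idle", 6), ("Iteration_Line", 2), ("Bits_14_12", 3),
    ("Reserved_11_6", 6), ("Count", 6),
]


def pack(layout, values):
    """Concatenate field values MSB-first into one 64-bit word; missing fields are 0."""
    word, total = 0, 0
    for name, width in layout:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
        total += width
    if total != 64:
        raise ValueError("layout must cover exactly 64 bits")
    return word


# Field values of the PE1 top-level example.
pe1_top = pack(TOP_LEVEL_LAYOUT, {
    "IndexPE": 1, "Iteration_PEA": 2, "Iteration_PE": 2,
    "Initial_Idle": 8, "Iteration_Line": 1, "Count": 1,
})
print(format(pe1_top, "064b"))
```

Printing the word as a 64-digit binary string gives the same digits as the encoding above with the underscores removed.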
Second, the configuration information of the addition performed by PE8, whose operands come from PE0 and PE16, is used to illustrate ALU operation type configuration information expressed in the ALU operation type configuration information format.
Bit field [63]: the field configextended, with an encoded value of 0, i.e., the length of the configuration information is not extended;
bit field [62:61]: the field Func, with an encoded value of 00, indicating that the current configuration information is ALU operation type configuration information (ALU-Config);
bit field [60:53]: the field In1, with an encoded value of 10000000, in which the upper four bits 1000 indicate that the data comes from the result Out2 of another connected PE, and the lower four bits 0000 select the output of PE0, the PE above the current PE in the PEA;
bit field [52:45]: the field In2, with an encoded value of 10000001, in which the upper four bits 1000 indicate that the data comes from the result Out2 of another connected PE, and the lower four bits 0001 select the output of PE16, the PE below the current PE in the PEA;
bit field [44:37]: the field In3, with an encoded value of 00000000; In3 does not participate in the operation and is unconstrained, so it defaults to 0;
bit field [36:31]: the field In4, with an encoded value of 000000; In4 does not participate in the operation and is unconstrained, so it defaults to 0;
bit field [30]: the field Imm, with an encoded value of 0, indicating that operand 2 is not an immediate;
bit field [29:23]: the field Out1, with an encoded value of 0100001, indicating that the Add result is written back to the global register GR1 and, by default, also to the output register;
bit field [22:16]: the field Out2, with an encoded value of 1000000, indicating that by default neither GR nor LR is written;
bit field [15]: the field IItype, with an encoded value of 0, indicating that the PEA iterates continuously first and then performs Idle;
bit field [14:5]: the field Iteration, with an encoded value of 0001010101, in which the upper two bits 00 indicate that the iteration count is given as an immediate, the middle 5 bits 01010 indicate that the iteration count Iter_Num is 10, and the lower 3 bits 101 indicate that the interval Iter_II is 5 cycles, because the load operation takes 6 cycles, which is 5 cycles more than Add;
bit field [4:0]: the field Opcode, with an encoded value of 00010, indicating that the ALU operation type is signed Add.
To sum up, the specific encoding of the ALU operation type configuration information of PE8 is:
0_00_00000_000_00000_000_00001000_0000010_0000010_000100_01_00_0_000000_000001。
Third, the configuration information of the load operation performed by PE16 is used to illustrate memory access operation type configuration information recorded in the memory access operation type configuration information format (Load-Config).
Bit field [63]: the field configextended, with an encoded value of 0, i.e., the length of the configuration information is not extended;
bit field [62:61]: the field Func, with an encoded value of 10, indicating that the current configuration information is memory access operation type configuration information (Load-Config);
bit field [60:53]: the field AddrMem, with an encoded value of 10001000, in which the upper three bits 100 indicate immediate access, and the lower 5 bits together with bit field [52:45] form a 13-bit access address that participates in the immediate address calculation;
bit field [52:45]: the field DirectAddrMem, with an encoded value of 00100100, which participates in the immediate address calculation; the address given by the immediate is 36, in the upper 8 banks on the right side, i.e., in the 13th bank, where a single bank is 256 × 32 bits;
bit field [44:37]: the field InMem, with an encoded value of 00000000; this field is used by store operations and can default to 0 for a load operation;
bit field [36:33]: the field Offset, with an encoded value of 0001, indicating a memory access offset of 1; incrementing by 1 allows consecutive accesses to an array in the Shared Memory;
bit field [32]: the field Offset_extended, 1 bit in total, is an actual encoding field that extends the Offset field as its most significant bit;
bit field [31:30]: a reserved bit field;
bit field [29:23]: the field Out1, with an encoded value of 0000001, indicating that the load result is written to the local register LR1 and, by default, also directly to the output register;
bit field [22:19]: a reserved bit field, with all bits set to 0;
bit field [18]: the field Addr_Loop, 1 bit in total, is an actual encoding field that flags address auto-increment between memory access configurations of the same kind (Load/Store); here Addr_Loop is 0;
bit field [17]: the field IItype, with an encoded value of 0, indicating continuous iteration first and then Idle;
bit field [16:15]: the field IncreasFlag, with an encoded value of 01, indicating an auto-increment operation;
bit field [14:5]: the field Iteration, with an encoded value of 0001010000, in which the upper two bits 00 indicate immediate representation, the middle 5 bits 01010 indicate 10 iterations of the load operation, and the lower 3 bits 000 indicate an interval of 0 cycles;
bit field [4:2]: a reserved bit field, with all bits set to 0;
bit field [1:0]: the field Opcode, with an encoded value of 00, indicating a global load operation, for which the banks of all Shared Memories are accessible.
In summary, the specific encoding of the memory access operation type configuration information of PE16 is:
0_10_10001000_00100100_00000000_0001_000_0000001_10000_0_01_0001010000_000_00。
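Going the other way, a recorded 64-bit string can be split back into fields by bit position. The sketch below does this for the encoding above, using the bit ranges of the memory access operation type format; the helper name `decode_word` is hypothetical, the two 8-bit fields at [52:45] and [44:37] are labelled by position only, and reserved fields are left out.

```python
# (field name, high bit, low bit) for the memory access operation type format.
LSU_LAYOUT = [
    ("configextended", 63, 63), ("Func", 62, 61), ("AddrMem", 60, 53),
    ("Bits_52_45", 52, 45), ("Bits_44_37", 44, 37), ("Offset", 36, 33),
    ("Offset_extended", 32, 32), ("Out1", 29, 23), ("Addr_Loop", 18, 18),
    ("IItype", 17, 17), ("IncreasFlag", 16, 15), ("Iteration", 14, 5),
    ("Opcode", 1, 0),
]


def decode_word(text, layout=LSU_LAYOUT):
    """Parse an underscore-separated 64-bit binary string into named field values."""
    bits = text.replace("_", "")
    if len(bits) != 64 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 64 binary digits")
    word = int(bits, 2)
    return {
        name: (word >> low) & ((1 << (high - low + 1)) - 1)
        for name, high, low in layout
    }


pe16_load = decode_word(
    "0_10_10001000_00100100_00000000_0001_000_0000001_10000_0_01_0001010000_000_00"
)
for name, value in pe16_load.items():
    print(f"{name:16s} {value}")
```

The decoded values can then be checked field by field, for example Func = 2 (binary 10), Offset = 1, Out1 = 1 and IncreasFlag = 1.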
In summary, in the method provided in the embodiment of the present invention, configuration information of the reconfigurable processor is acquired and recorded in a predefined configuration information format. The predefined configuration information format meets a preset length requirement and includes an extension bit for the configuration information length and a plurality of reserved bits for configuration information encoding. A configuration information format that meets the preset length requirement can be used effectively in a coarse-grained reconfigurable processor, and the extension bit together with the reserved bits allows the configuration information to be extended, so the method is highly flexible and extensible.
Based on the same inventive concept, the embodiment of the present invention further provides a configuration information recording apparatus of a reconfigurable processor, as described in the following embodiments. Since the principle by which the apparatus solves the problem is similar to that of the configuration information recording method of the reconfigurable processor, the implementation of the apparatus can refer to the implementation of the method, and repeated details are omitted.
Fig. 4 is a schematic diagram of a configuration information recording apparatus of a reconfigurable processor according to an embodiment of the present invention. As shown in Fig. 4, the apparatus includes:
a configuration information acquisition module 401, configured to acquire configuration information of the reconfigurable processor;
a configuration information recording module 402, configured to record the configuration information in a predefined configuration information format;
the predefined configuration information format is a configuration information format meeting the requirement of a preset length, and comprises an extension bit of the configuration information length and a plurality of reserved bits of configuration information codes.
In one embodiment, the predefined configuration information format includes a plurality of fields, each field being in a different bit field;
the type of the field comprises an extension bit field, a reserved bit field or an actual encoding field;
the preset length requirement is 64 bits.
In one embodiment, the configuration information includes top layer type configuration information, ALU operation type configuration information, and memory access operation type configuration information;
the predefined configuration information formats comprise a top layer type configuration information format for recording top layer type configuration information, an ALU operation type configuration information format for recording ALU operation type configuration information and a memory access operation type configuration information format for recording memory access operation type configuration information.
In an embodiment, in the top-level configuration information format, the field Iteration _ PEA is used for recording the Iteration number of the configuration information of the reconfigurable processing unit array PEA; a field Iteration _ PE for recording the number of iterations that the current configuration packet of each processing element PE in the PEA needs to loop inside the PE.
In an embodiment, in the ALU operation type configuration information format or the memory access operation type configuration information format, the field IItype is used to determine the iteration mode of the configuration information of the PEA.
In one embodiment, the ALU operation type configuration information includes MUL type operation type configuration information and MAC type operation type configuration information.
In an embodiment, in the ALU operation type configuration information format, a field Opcode is used to determine the operation type of the ALU;
the field Opcode supports at most 32 operation types, including two-operand (binary) and three-operand (ternary) operations.
In summary, in the apparatus provided in the embodiment of the present invention, configuration information of the reconfigurable processor is acquired and recorded in a predefined configuration information format. The predefined configuration information format meets a preset length requirement and includes an extension bit for the configuration information length and a plurality of reserved bits for configuration information encoding. A configuration information format that meets the preset length requirement can be used effectively in a coarse-grained reconfigurable processor, and the extension bit together with the reserved bits allows the configuration information to be extended, so the apparatus is highly flexible and extensible.
An embodiment of the present application further provides a computer device, and fig. 5 is a schematic diagram of a computer device in an embodiment of the present invention, where the computer device is capable of implementing all steps in the configuration information recording method of the reconfigurable processor in the foregoing embodiment, and the computer device specifically includes the following contents:
a processor (processor)501, a memory (memory)502, a communication Interface (Communications Interface)503, and a communication bus 504;
the processor 501, the memory 502 and the communication interface 503 complete mutual communication through the communication bus 504; the communication interface 503 is used for implementing information transmission between related devices such as server-side devices, detection devices, and user-side devices;
the processor 501 is configured to call the computer program in the memory 502, and when the processor executes the computer program, the processor implements all the steps in the configuration information recording method of the reconfigurable processor in the above embodiments.
An embodiment of the present application further provides a computer-readable storage medium, which can implement all the steps in the configuration information recording method of the reconfigurable processor in the above-described embodiment, and the computer-readable storage medium stores a computer program, and when the computer program is executed by the processor, the computer program implements all the steps of the configuration information recording method of the reconfigurable processor in the above-described embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A method for recording configuration information of a reconfigurable processor, comprising:
acquiring configuration information of a reconfigurable processor;
recording the configuration information by adopting a predefined configuration information format;
the predefined configuration information format is a configuration information format meeting the requirement of preset length;
the configuration information comprises top layer type configuration information, ALU operation type configuration information and access operation type configuration information;
the predefined configuration information formats comprise a top layer type configuration information format for recording top layer type configuration information, an ALU operation type configuration information format for recording ALU operation type configuration information and a memory access operation type configuration information format for recording memory access operation type configuration information;
the predefined configuration information format comprises a plurality of fields, each field being in a different bit field; the preset length requirement is 64 bits;
the top-level configuration information format comprises an extension bit field, a reserved bit field and an actual coding field;
the ALU operation type configuration information format comprises an extension bit field and an actual encoding field;
the configuration information format of the access operation type comprises an extension bit field, a reserved bit field and an actual coding field.
2. A configuration information recording method of a reconfigurable processor according to claim 1, wherein in the top-level configuration information format, a field Iteration _ PEA is used to record the number of iterations of configuration information of the reconfigurable processing unit array PEA; a field Iteration _ PE for recording the number of iterations that the current configuration packet of each processing element PE in the PEA needs to loop inside the PE.
3. A configuration information recording method of a reconfigurable processor according to claim 1, wherein in the ALU operation type configuration information format or the memory access operation type configuration information format, the field IItype is used to determine an iterative manner of the configuration information of the PEA.
4. The configuration information recording method of a reconfigurable processor according to claim 1, wherein the ALU operation type configuration information includes MUL type operation type configuration information and MAC type operation type configuration information.
5. A configuration information recording method of a reconfigurable processor according to claim 1, wherein in the ALU operation type configuration information format, a field Opcode is used to determine the operation type of the ALU;
the field Opcode supports at most 32 operation types, including two-operand (binary) and three-operand (ternary) operations.
6. A configuration information recording apparatus of a reconfigurable processor, comprising:
the configuration information acquisition module is used for acquiring configuration information of the reconfigurable processor;
the configuration information recording module is used for recording the configuration information by adopting a predefined configuration information format;
the predefined configuration information format is a configuration information format meeting the requirement of preset length;
the configuration information comprises top layer type configuration information, ALU operation type configuration information and access operation type configuration information;
the predefined configuration information formats comprise a top layer type configuration information format for recording top layer type configuration information, an ALU operation type configuration information format for recording ALU operation type configuration information and a memory access operation type configuration information format for recording memory access operation type configuration information;
the predefined configuration information format comprises a plurality of fields, each field being in a different bit field; the preset length requirement is 64 bits;
the top-level configuration information format comprises an extension bit field, a reserved bit field and an actual coding field;
the ALU operation type configuration information format comprises an extension bit field and an actual encoding field;
the configuration information format of the access operation type comprises an extension bit field, a reserved bit field and an actual coding field.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 5.
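
As an illustration of the claimed recording format, the following minimal Python sketch packs an ALU operation type configuration word into 64 bits. Only the 64-bit length, the presence of an extension bit field and an actual encoding field, and the limit of 32 operation types (which implies a 5-bit Opcode) come from the claims. The one-bit width of the extension field, all bit offsets, and the names pack_alu_word and unpack_alu_word are assumptions made for this example and are not taken from the patent.

```python
WORD_BITS = 64      # preset length requirement stated in the claims
OPCODE_BITS = 5     # 2**5 = 32, the maximum number of operation types

# Hypothetical layout (offsets and widths are assumptions for illustration):
#   bit 63      : extension bit field (assumed to be a single bit)
#   bits 62..58 : Opcode, part of the actual encoding field
#   bits 57..0  : remainder of the actual encoding field
EXT_SHIFT = 63
OPCODE_SHIFT = 58
PAYLOAD_BITS = 58


def pack_alu_word(extension: int, opcode: int, payload: int) -> int:
    """Pack the three fields into one 64-bit configuration word."""
    if extension not in (0, 1):
        raise ValueError("extension must be 0 or 1")
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode must fit in 5 bits (0..31)")
    if not 0 <= payload < (1 << PAYLOAD_BITS):
        raise ValueError("payload must fit in 58 bits")
    return (extension << EXT_SHIFT) | (opcode << OPCODE_SHIFT) | payload


def unpack_alu_word(word: int) -> tuple[int, int, int]:
    """Split a 64-bit configuration word back into (extension, opcode, payload)."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 64 bits")
    extension = (word >> EXT_SHIFT) & 1
    opcode = (word >> OPCODE_SHIFT) & ((1 << OPCODE_BITS) - 1)
    payload = word & ((1 << PAYLOAD_BITS) - 1)
    return extension, opcode, payload
```

Under these assumptions, unpack_alu_word(pack_alu_word(0, 3, 42)) returns the original fields, and an opcode of 32 is rejected because it does not fit in the 5-bit Opcode field. The top layer type and memory access operation type formats would additionally carry a reserved bit field, whose position is likewise not specified in this section.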
CN202011465401.6A 2020-12-14 2020-12-14 Configuration information recording method and device for reconfigurable processor Active CN112433773B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011465401.6A CN112433773B (en) 2020-12-14 2020-12-14 Configuration information recording method and device for reconfigurable processor

Publications (2)

Publication Number Publication Date
CN112433773A (en) 2021-03-02
CN112433773B (en) 2021-11-30

Family

ID=74692158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011465401.6A Active CN112433773B (en) 2020-12-14 2020-12-14 Configuration information recording method and device for reconfigurable processor

Country Status (1)

Country Link
CN (1) CN112433773B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115328821B (en) * 2022-10-18 2022-12-23 北京红山微电子技术有限公司 Reconfigurable Cache system, memory access system and memory access method based on GPU

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015149433A1 (en) * 2014-03-31 2015-10-08 Tsinghua University Method and device for generating configuration information of dynamic reconfigurable processor

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101581882B1 (en) * 2009-04-20 2015-12-31 삼성전자주식회사 Reconfigurable processor and method for reconfiguring the processor
KR20120061593A (en) * 2010-12-03 2012-06-13 삼성전자주식회사 Apparatus and Method for synchronization of threads
CN102411555B (en) * 2011-08-17 2014-01-01 清华大学 Method for telescopically and dynamically configuring configuration information of reconfigurable array
US9727460B2 (en) * 2013-11-01 2017-08-08 Samsung Electronics Co., Ltd. Selecting a memory mapping scheme by determining a number of functional units activated in each cycle of a loop based on analyzing parallelism of a loop
CN103914404B (en) * 2014-04-29 2017-05-17 东南大学 Configuration information cache device in coarseness reconfigurable system and compression method
CN105302525B (en) * 2015-10-16 2018-01-05 上海交通大学 Method for parallel processing for the reconfigurable processor of multi-level heterogeneous structure
CN105468568B (en) * 2015-11-13 2018-06-05 上海交通大学 Efficient coarseness restructurable computing system

Also Published As

Publication number Publication date
CN112433773A (en) 2021-03-02

Similar Documents

Publication Publication Date Title
CN112612521A (en) Apparatus and method for performing matrix multiplication operation
WO2007084700A2 (en) System and method for thread handling in multithreaded parallel computing of nested threads
CN102279818B (en) Vector data access and storage control method supporting limited sharing and vector memory
CN1659514A (en) Registers for data transfers within a multithreaded processor
CN112232517B (en) Artificial intelligence accelerates engine and artificial intelligence treater
Min et al. NeuralHMC: An efficient HMC-based accelerator for deep neural networks
CN114398308A (en) Near memory computing system based on data-driven coarse-grained reconfigurable array
CN100573500C (en) Stream handle IP kernel based on the Avalon bus
CN101320344A (en) Multi-core or numerous-core processor function verification device and method
CN112433773B (en) Configuration information recording method and device for reconfigurable processor
CN102629238B (en) Method and device for supporting vector condition memory access
Ross et al. Implementing openshmem for the adapteva epiphany risc array processor
CN113407483B (en) Dynamic reconfigurable processor for data intensive application
US20210255793A1 (en) System and method for managing conversion of low-locality data into high-locality data
KR101639854B1 (en) An interconnect structure to support the execution of instruction sequences by a plurality of engines
CN117435251A (en) Post quantum cryptography algorithm processor and system on chip thereof
Shang et al. LACS: A high-computational-efficiency accelerator for CNNs
CN112559954A (en) FFT algorithm processing method and device based on software-defined reconfigurable processor
JP2006523883A (en) Reconfigurable processor array utilizing ILP and TLP
CN111475205A (en) Coarse-grained reconfigurable array structure design method based on data flow decoupling
CN112506853B (en) Reconfigurable processing unit array of zero-buffer pipelining and zero-buffer pipelining method
WO2023045250A1 (en) Memory pool resource sharing method and apparatus, and device and readable medium
CN112486904B (en) Register file design method and device for reconfigurable processing unit array
EP0136218A2 (en) Multiple port pipelined processor
Wang et al. Addressing memory wall problem of graph computation in reconfigurable system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant