WO2022134426A1 - Instruction distribution method and system in reconfigurable processor, and storage medium - Google Patents


Info

Publication number
WO2022134426A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
storage
synchronization
queue
identification field
Application number
PCT/CN2021/092239
Other languages
English (en)
French (fr)
Inventor
Fei Baochuan (费宝川)
Ouyang Peng (欧阳鹏)
Tang Shibin (唐士斌)
Deng Liwei (邓立玮)
Original Assignee
Beijing Tsingmicro Intelligent Technology Co., Ltd. (北京清微智能科技有限公司)
Application filed by Beijing Tsingmicro Intelligent Technology Co., Ltd.
Priority to US17/770,553 (US11977894B2)
Publication of WO2022134426A1 (Chinese-language publication)


Classifications

    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F15/7871 Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS
    • G06F15/17331 Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]
    • G06F9/30043 LOAD or STORE instructions; Clear instruction
    • G06F9/3802 Instruction prefetching
    • G06F9/3838 Dependency mechanisms, e.g. register scoreboarding
    • G06F15/7867 Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture

Definitions

  • The present disclosure relates to the technical field of reconfigurable processors.
  • In particular, it relates to an instruction distribution method in a reconfigurable processor, an instruction distribution system in a reconfigurable processor, and a storage medium.
  • One object of the present disclosure is to provide an instruction distribution method in a reconfigurable processor.
  • The method extracts the memory identification fields of each instruction, one-hot encodes them to generate a storage synchronization ID table, generates a synchronization table from the storage synchronization ID table to establish the dependencies between instructions, and executes the instructions according to the synchronization table. Each instruction therefore executes in an order that respects its dependencies, which preserves the parallel execution efficiency of multiple instructions, reduces memory conflicts, and shortens instruction run time.
  • Another object of the present disclosure is to provide an instruction distribution system in a reconfigurable processor.
  • The system likewise extracts the memory identification fields of each instruction, one-hot encodes them to generate a storage synchronization ID table, generates a synchronization table from that table to establish the dependencies between instructions, and executes the instructions according to the synchronization table, with the same benefits: efficient parallel execution of multiple instructions, fewer memory conflicts, and shorter instruction run time.
  • A first aspect of the present disclosure provides an instruction distribution method in a reconfigurable processor.
  • The reconfigurable processor includes an instruction fetch module, an instruction synchronization control module and an instruction queue module. The instruction fetch module distributes multiple lines of instructions to be executed to the instruction synchronization control module and the instruction queue module. The instruction synchronization control module controls the execution of instructions in the instruction queue module. The instruction queue module includes multiple instruction queues, each containing multiple instruction units arranged in sequence, and each instruction queue corresponds to one instruction type.
  • The method includes the following steps.
  • Step S101: set the storage synchronization ID table format of each instruction type. The storage synchronization ID table includes a plurality of storage units arranged in sequence; each storage unit is provided with a storage bit for the first memory identification field and/or a storage bit for the second memory identification field. Each instruction type corresponds to a set number of first memory identification fields and/or second memory identification fields.
  • Step S102: sequentially extract the first memory identification field and the second memory identification field of each instruction in the multiple lines of instructions to be executed.
  • Step S103: obtain the one-hot encoding of the first memory identification field and/or the one-hot encoding of the second memory identification field of each instruction; store these one-hot encodings in the corresponding storage synchronization ID table according to the execution order of the multiple lines of instructions to be executed; and send the instruction parameters of each instruction to the instruction units of the corresponding instruction queue in the instruction queue module.
  • Step S104: the instruction synchronization control module obtains, from the one-hot encodings of the first and second memory identification fields of each instruction type in the storage synchronization ID tables, the dependency identification information between any one instruction type and the other two instruction types, and generates a synchronization table along the first data dimension according to the dependency identification information.
  • The instruction types include a load instruction, a calculation instruction, and a store instruction.
  • The storage synchronization ID tables include a storage synchronization ID table for load instructions, one for calculation instructions, and one for store instructions.
  • The instruction queues include a load instruction queue corresponding to the load instruction, a calculation instruction queue corresponding to the calculation instruction, and a store instruction queue corresponding to the store instruction.
  • Each storage synchronization ID table includes 8 storage units arranged in sequence, and each instruction queue includes 8 instruction units.
  • Each storage unit in the storage synchronization ID table of the load instruction includes a first memory identification field and a second memory identification field; each storage unit in the table of the calculation instruction likewise includes a first and a second memory identification field; and each storage unit in the table of the store instruction includes a second memory identification field.
  • Step S103 further includes judging whether every instruction unit of the instruction queue is full. If so, this step is repeated until an instruction unit in the instruction queue becomes idle; if not, step S104 is executed.
  • A second aspect of the present disclosure provides an instruction distribution system in a reconfigurable processor.
  • The reconfigurable processor includes an instruction fetch module, an instruction synchronization control module and an instruction queue module. The instruction fetch module distributes multiple lines of instructions to be executed to the instruction synchronization control module and the instruction queue module. The instruction synchronization control module controls the execution of instructions in the instruction queue module. The instruction queue module includes multiple instruction queues, each containing multiple instruction units arranged in sequence, and each instruction queue corresponds to one instruction type.
  • The system includes the following units.
  • A storage synchronization ID table setting unit, configured to set the storage synchronization ID table format of each instruction type. The storage synchronization ID table includes a plurality of storage units arranged in sequence; each storage unit is provided with a storage bit for the first memory identification field and/or a storage bit for the second memory identification field; each instruction type corresponds to a set number of first memory identification fields and/or second memory identification fields.
  • An instruction extraction unit, configured to sequentially extract the first memory identification field and the second memory identification field of each instruction in the multiple lines of instructions to be executed.
  • A one-hot encoding unit, configured to obtain the one-hot encoding of the first memory identification field and/or the one-hot encoding of the second memory identification field of each instruction, store these encodings in the corresponding storage synchronization ID table according to the execution order of the multiple lines of instructions to be executed, and send the instruction parameters of each instruction to the instruction units of the corresponding instruction queue in the instruction queue module.
  • A synchronization table generating unit, configured for the instruction synchronization control module to obtain, from the one-hot encodings of the first and second memory identification fields of each instruction type in the storage synchronization ID tables, the dependency identification information between any one instruction type and the other two instruction types, and to generate a synchronization table along the first data dimension according to the dependency identification information. The synchronization table has a first data dimension and a second data dimension that intersect; the first data dimension of each instruction type corresponds to the number of storage bits in the storage synchronization ID table of that instruction type.
  • An instruction execution unit, configured to execute the corresponding instructions along the second data dimension of the synchronization table according to the dependencies of each instruction type; the size of the second data dimension of each instruction type equals the number of instruction units in the instruction queue.
  • While executing each instruction, the instruction synchronization control module uses the synchronization table to call the instruction parameters held in the corresponding instruction unit, so that every instruction in the multiple lines of instructions to be run is executed.
  • The instruction types include a load instruction, a calculation instruction, and a store instruction.
  • The storage synchronization ID tables include a storage synchronization ID table for load instructions, one for calculation instructions, and one for store instructions.
  • The instruction queues include a load instruction queue corresponding to the load instruction, a calculation instruction queue corresponding to the calculation instruction, and a store instruction queue corresponding to the store instruction.
  • Each storage synchronization ID table includes 8 storage units arranged in sequence, and each instruction queue includes 8 instruction units.
  • Each storage unit in the storage synchronization ID table of the load instruction includes a first memory identification field and a second memory identification field; each storage unit in the table of the calculation instruction likewise includes a first and a second memory identification field; and each storage unit in the table of the store instruction includes a second memory identification field.
  • The one-hot encoding unit is further configured to determine whether every instruction unit of the instruction queue is full. If so, it repeats its operation until an instruction unit in the instruction queue becomes idle; if not, the synchronization table generating unit is executed.
  • FIG. 1 is a schematic flowchart of an instruction distribution method in a reconfigurable processor according to an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of the storage synchronization ID table format according to an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of a synchronization table according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of the structure of an instruction queue according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of the composition of a reconfigurable processor according to an embodiment of the present disclosure.
  • FIG. 6 is a diagram of various states in the instruction queue module according to an embodiment of the present disclosure.
  • As shown in FIG. 5, the reconfigurable processor includes an instruction fetch module 202, an instruction synchronization control module 201 and an instruction queue module 203.
  • The instruction fetch module 202 splits the multiple lines of instructions to be executed and distributes the parts to the instruction synchronization control module 201 and the instruction queue module 203, respectively.
  • The instruction synchronization control module 201 controls the execution of instructions in the instruction queue module 203.
  • The types of the instructions to be executed are defined in an instruction set, which includes multiple instruction types.
  • As shown in FIG. 4, multiple instruction queues are set in the instruction queue module, for example the load RDMA queue 101, the calculation PEA_EXEC queue 102 and the store WDMA queue 103.
  • Eight instruction units are arranged in sequence in each of the load RDMA queue 101, the calculation PEA_EXEC queue 102 and the store WDMA queue 103.
  • Each instruction queue corresponds to one instruction type.
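The split performed by the instruction fetch module can be sketched in Python. This is a minimal model under stated assumptions: the queue names RDMA, PEA_EXEC and WDMA come from the text, while the `Instruction` record, its field names and the `dispatch` function are hypothetical, and the eight-unit capacity limit is not enforced in this sketch.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Tuple

QUEUE_TYPES = ("RDMA", "PEA_EXEC", "WDMA")  # load, calculation, store


@dataclass
class Instruction:
    """Hypothetical decoded instruction; field names are illustrative only."""
    kind: str               # one of QUEUE_TYPES
    cspm_id: Optional[int]  # first memory identification field, if present
    lshm_id: Optional[int]  # second memory identification field, if present
    params: Dict[str, int]  # all other instruction parameters


def dispatch(
    program: List[Instruction],
) -> Tuple[List[Tuple[str, Optional[int], Optional[int]]], Dict[str, Deque[Dict[str, int]]]]:
    """Send memory IDs to the synchronization side and the remaining
    parameters to the queue matching each instruction's type."""
    sync_side: List[Tuple[str, Optional[int], Optional[int]]] = []
    queues: Dict[str, Deque[Dict[str, int]]] = {k: deque() for k in QUEUE_TYPES}
    for instr in program:
        sync_side.append((instr.kind, instr.cspm_id, instr.lshm_id))
        queues[instr.kind].append(instr.params)
    return sync_side, queues
```

Keeping the memory IDs apart from the parameters mirrors the text: only the IDs are needed to work out dependencies, while the parameters wait in their queue until the instruction is allowed to run.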
  • FIG. 1 is a schematic flowchart of an instruction distribution method in a reconfigurable processor according to an embodiment of the present disclosure. As shown in FIG. 1, the method includes the following steps.
  • Step S101: set the storage synchronization ID table format of each instruction type.
  • A storage synchronization ID table format is stored for each instruction type.
  • The storage synchronization ID table includes a plurality of storage units arranged in sequence. Each storage unit is provided with a storage bit for the first memory identification field CSPM_ID and/or a storage bit for the second memory identification field LSHM_ID.
  • Each instruction type corresponds to a set number of first memory identification fields and/or second memory identification fields.
  • The storage synchronization ID table of the RDMA load instruction has 8 units, each containing a first memory identification field CSPM_ID and a second memory identification field LSHM_ID.
  • The storage synchronization ID table of the PEA_EXEC calculation instruction has 8 units, each containing a first memory identification field CSPM_ID and a second memory identification field LSHM_ID.
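A possible in-memory representation of the per-type table formats from step S101 is sketched below. The RDMA and PEA_EXEC layouts follow the text; the WDMA layout (second field only) follows the summary's description of the store-instruction table. The names `FIELD_FORMAT`, `SyncIdTable` and `make_sync_id_table` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

TABLE_UNITS = 8  # storage units with addresses 0-7

# Memory identification fields carried by each instruction type.
FIELD_FORMAT: Dict[str, Tuple[str, ...]] = {
    "RDMA": ("CSPM_ID", "LSHM_ID"),      # load
    "PEA_EXEC": ("CSPM_ID", "LSHM_ID"),  # calculation
    "WDMA": ("LSHM_ID",),                # store
}


@dataclass
class SyncIdTable:
    kind: str
    fields: Tuple[str, ...]
    # One entry per storage unit: None marks an empty unit; otherwise a
    # mapping from field name to that field's one-hot code.
    units: List[Optional[Dict[str, int]]]


def make_sync_id_table(kind: str) -> SyncIdTable:
    """Create an empty 8-unit table in the format of the given instruction type."""
    return SyncIdTable(kind, FIELD_FORMAT[kind], [None] * TABLE_UNITS)
```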
  • Step S102: acquire the first memory identification field and the second memory identification field of each instruction.
  • The first memory identification field and/or the second memory identification field of each instruction in the multiple lines of instructions to be executed are extracted in sequence.
  • The type of each instruction is one of the instruction types in the instruction set.
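Field extraction in step S102 amounts to slicing bit fields out of each instruction word in program order. The disclosure does not give the instruction encoding, so the layout below (type in bits 31-30, CSPM_ID in bits 5-4, LSHM_ID in bits 1-0) is entirely hypothetical and serves only to illustrate the idea.

```python
from typing import Iterable, List, Optional, Tuple

# HYPOTHETICAL instruction-word layout, for illustration only:
#   bits 31-30: instruction type, bits 5-4: CSPM_ID, bits 1-0: LSHM_ID
TYPE_CODES = {0: "RDMA", 1: "PEA_EXEC", 2: "WDMA"}
HAS_CSPM = {"RDMA": True, "PEA_EXEC": True, "WDMA": False}

Fields = Tuple[str, Optional[int], int]


def extract_fields(word: int) -> Fields:
    """Return (type, CSPM_ID or None, LSHM_ID) for one instruction word."""
    kind = TYPE_CODES[(word >> 30) & 0b11]
    cspm = (word >> 4) & 0b11 if HAS_CSPM[kind] else None  # store has no CSPM_ID
    lshm = word & 0b11
    return kind, cspm, lshm


def extract_all(words: Iterable[int]) -> List[Fields]:
    """Extract the fields of every instruction line, preserving program order."""
    return [extract_fields(w) for w in words]
```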
  • Step S103: obtain the one-hot codes of the first and second memory identification fields.
  • The one-hot encodings of the first and second memory identification fields of each instruction are stored in the storage synchronization ID table according to the execution order of the multiple lines of instructions to be executed.
  • For example, for the "RDMA data load instruction" in program 1, the one-hot codes of its CSPM_ID field and its LSHM_ID field are stored in the storage synchronization ID table, and the other instruction parameters of the RDMA data load instruction are stored in the corresponding instruction unit of the instruction queue.
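Step S103 can be sketched as follows, assuming three memory banks numbered 0, 1 and 2 (the values listed for the bank-number encoding in this disclosure), so each one-hot code is three bits wide. The function names are hypothetical. Each instruction's codes go into the next free unit of its type's table, which keeps the tables in execution order.

```python
from typing import Dict, Iterable, List, Optional, Tuple

NUM_BANKS = 3  # ASSUMPTION: bank numbers 0, 1, 2
Entry = Tuple[Optional[int], Optional[int]]  # (CSPM one-hot, LSHM one-hot)


def one_hot(bank: int, width: int = NUM_BANKS) -> int:
    """Encode a bank number as a one-hot bit mask: 0 -> 0b001, 2 -> 0b100."""
    if not 0 <= bank < width:
        raise ValueError(f"bank {bank} outside 0..{width - 1}")
    return 1 << bank


def fill_sync_id_tables(
    fields: Iterable[Tuple[str, Optional[int], Optional[int]]]
) -> Dict[str, List[Entry]]:
    """fields: (type, CSPM_ID or None, LSHM_ID or None) in program order."""
    tables: Dict[str, List[Entry]] = {"RDMA": [], "PEA_EXEC": [], "WDMA": []}
    for kind, cspm, lshm in fields:
        entry = (
            one_hot(cspm) if cspm is not None else None,
            one_hot(lshm) if lshm is not None else None,
        )
        tables[kind].append(entry)  # next free unit preserves execution order
    return tables
```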
  • Examples of the first memory bank number CSPM_ID field encoding and the second memory bank number LSHM_ID field encoding of each instruction are as follows:
  • Step S104: acquire the synchronization table.
  • From the one-hot encodings of the first and second memory identification fields of each instruction type in the storage synchronization ID tables, the instruction synchronization control module obtains the dependency identification information between any one instruction type and the other two instruction types.
  • A synchronization table is generated along a first data dimension according to the dependency identification information.
  • The synchronization table includes a first data dimension and a second data dimension that intersect.
  • The first data dimension of each instruction type corresponds to the number of storage bits in the storage synchronization ID table of that instruction type.
  • The first data dimension is the direction indicated by "A".
  • The second data dimension is the direction indicated by "B".
  • The dependency identification information is marked in Table 3 along the first data dimension "A". For example, the mark in row A1 of Table 3 represents the dependency between the first bit of the WDMA data store instruction and the PEA_EXEC calculation instruction. In this way, for the instructions in the instruction lines, the dependencies of each instruction type on the other two instruction types are stored.
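The disclosure does not spell out how the dependency identification information is computed from the one-hot codes. One plausible rule, used here purely as an assumption, is that an instruction depends on every earlier instruction of a different type whose one-hot code for the same memory field shares a set bit with its own. The sketch applies that rule to a hypothetical three-instruction program; ordering within one type is assumed to be handled by the in-order queues.

```python
from typing import Dict, List, Set, Tuple

Slot = Tuple[str, int]  # (instruction type, unit address)


def derive_dependencies(program: List[Tuple[str, int, int]]) -> Dict[Slot, Set[Slot]]:
    """program: (type, CSPM one-hot, LSHM one-hot) in program order; 0 = field absent.

    ASSUMED rule: an instruction depends on each earlier instruction of another
    type whose code for the same field overlaps its own (bitwise AND != 0).
    """
    next_unit: Dict[str, int] = {}
    earlier: List[Tuple[Slot, int, int]] = []
    deps: Dict[Slot, Set[Slot]] = {}
    for kind, cspm, lshm in program:
        slot = (kind, next_unit.get(kind, 0))
        next_unit[kind] = slot[1] + 1
        deps[slot] = {
            other for other, o_cspm, o_lshm in earlier
            if other[0] != kind and (cspm & o_cspm or lshm & o_lshm)
        }
        earlier.append((slot, cspm, lshm))
    return deps
```

Because the codes are one-hot, testing whether two instructions touch the same bank is a single bitwise AND, which is a likely reason for encoding the bank numbers this way.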
  • In the synchronization table, the first data dimension of the RDMA instruction has units r0 to r7, matching the 8 units of the storage synchronization ID table of the RDMA load instruction in FIG. 2.
  • Between the RDMA instruction and the PEA_EXEC calculation instruction, RDMA bits 3 and 4 depend on PEA_EXEC bit 0.
  • The first data dimension of the WDMA data store instruction has units w0 to w7, matching the 8 units of the storage synchronization ID table of the WDMA data store instruction in FIG. 2.
  • Between the WDMA data store instruction and the PEA_EXEC calculation instruction, WDMA bit 0 depends on PEA_EXEC bit 1, and WDMA bit 1 depends on PEA_EXEC bits 0, 1 and 2.
  • PEA_EXEC bit 0 depends on RDMA bits 0 and 1.
  • PEA_EXEC bit 1 depends on RDMA bit 2, and PEA_EXEC bit 2 depends on RDMA bits 0, 1, 3 and 4.
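The dependencies listed above can be encoded as a synchronization table in which each row (one unit along dimension A) holds, for each of the other two instruction types, an 8-bit mask over that type's unit addresses; reading one bit position across all rows corresponds to a column along dimension B. The dictionary layout and the names `build_sync_table` and `waiting_on` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

TYPES = ("RDMA", "PEA_EXEC", "WDMA")
UNITS = 8
Slot = Tuple[str, int]
SyncTable = Dict[str, List[Dict[str, int]]]

# Dependencies from the example above: slot -> slots it must wait for.
EXAMPLE_DEPS: Dict[Slot, Set[Slot]] = {
    ("RDMA", 3): {("PEA_EXEC", 0)},
    ("RDMA", 4): {("PEA_EXEC", 0)},
    ("PEA_EXEC", 0): {("RDMA", 0), ("RDMA", 1)},
    ("PEA_EXEC", 1): {("RDMA", 2)},
    ("PEA_EXEC", 2): {("RDMA", 0), ("RDMA", 1), ("RDMA", 3), ("RDMA", 4)},
    ("WDMA", 0): {("PEA_EXEC", 1)},
    ("WDMA", 1): {("PEA_EXEC", 0), ("PEA_EXEC", 1), ("PEA_EXEC", 2)},
}


def build_sync_table(deps: Dict[Slot, Set[Slot]]) -> SyncTable:
    """table[type][unit][other_type] is a bit mask of the other type's units."""
    table: SyncTable = {
        k: [{o: 0 for o in TYPES if o != k} for _ in range(UNITS)] for k in TYPES
    }
    for (kind, unit), targets in deps.items():
        for other, other_unit in targets:
            table[kind][unit][other] |= 1 << other_unit
    return table


def waiting_on(table: SyncTable, kind: str, unit: int) -> List[Slot]:
    """Read one column along B: every row still waiting for (kind, unit)."""
    return sorted(
        (k, u)
        for k in TYPES if k != kind
        for u, row in enumerate(table[k])
        if (row[kind] >> unit) & 1
    )
```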
  • Step S105: execute each instruction in the multiple lines of instructions to be run.
  • The 8 units of the second data dimension of the RDMA data load instruction match the 8 units of the RDMA data load queue in the instruction queue of FIG. 4.
  • The 8 units of the second data dimension of the PEA_EXEC calculation instruction match the 8 units of the PEA_EXEC calculation instruction queue in FIG. 4.
  • The 8 units of the second data dimension of the WDMA data store instruction match the 8 units of the WDMA data store queue in FIG. 4.
  • When the program is executed, instructions are executed in order along the second data dimension B, that is, in the order in which the columns (such as B1) are arranged.
  • When bit 0 of the RDMA data load instruction is executed, the other data of the instruction held in unit 0 of the RDMA data load queue is called.
  • When bit 1 of the RDMA data load instruction is executed, the other data of the instruction held in unit 1 of the RDMA data load queue is called. The units are called sequentially from bit 0 to bit 7.
  • The execution of RDMA bits 0 and 1 is not restricted by any other instruction.
  • RDMA bit 0 corresponds to a column of the synchronization table; once it is executed, the dependency of PEA_EXEC bits 0 and 2 on it, recorded in that column, is eliminated.
  • Likewise, RDMA bit 1 corresponds to the column in which PEA_EXEC bits 0 and 2 record their dependency on it.
  • After RDMA bit 0 is executed, column B0 is updated; RDMA bit 1 is then executed, and column B1 is updated.
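The execution order implied by this column-by-column clearing can be simulated. The sketch assumes that each queue issues strictly in order from unit 0, that every queue head whose dependency masks are all zero issues in the same step, and that an issued instruction completes within that step; none of these timing details is stated in the disclosure. The table is the worked example above written directly as masks, and `example_table` and `run` are hypothetical names.

```python
from typing import Dict, List, Tuple

TYPES = ("RDMA", "PEA_EXEC", "WDMA")
UNITS = 8
SyncTable = Dict[str, List[Dict[str, int]]]


def example_table() -> SyncTable:
    """The worked example as masks: table[type][unit][other] = units waited on."""
    t: SyncTable = {k: [{o: 0 for o in TYPES if o != k} for _ in range(UNITS)] for k in TYPES}
    t["RDMA"][3]["PEA_EXEC"] = 0b001
    t["RDMA"][4]["PEA_EXEC"] = 0b001
    t["PEA_EXEC"][0]["RDMA"] = 0b00011
    t["PEA_EXEC"][1]["RDMA"] = 0b00100
    t["PEA_EXEC"][2]["RDMA"] = 0b11011
    t["WDMA"][0]["PEA_EXEC"] = 0b010
    t["WDMA"][1]["PEA_EXEC"] = 0b111
    return t


def run(table: SyncTable, counts: Dict[str, int]) -> List[List[Tuple[str, int]]]:
    """ASSUMED timing: issue all ready queue heads per step; completing
    (kind, unit) clears bit `unit` of the `kind` column in other types' rows."""
    head = {k: 0 for k in TYPES}
    steps: List[List[Tuple[str, int]]] = []
    while any(head[k] < counts.get(k, 0) for k in TYPES):
        ready = [
            (k, head[k]) for k in TYPES
            if head[k] < counts.get(k, 0) and not any(table[k][head[k]].values())
        ]
        if not ready:
            raise RuntimeError("no queue head can issue: dependency cycle")
        for kind, unit in ready:
            head[kind] += 1
            for other in TYPES:
                if other != kind:
                    for row in table[other]:
                        row[kind] &= ~(1 << unit)  # update column `unit` of `kind`
        steps.append(ready)
    return steps


steps = run(example_table(), {"RDMA": 5, "PEA_EXEC": 3, "WDMA": 2})
```

In this model RDMA bits 0 and 1 issue without waiting, consistent with the text above, while PEA_EXEC bit 2 cannot issue until RDMA bit 4 has run and WDMA bit 1 issues last.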
  • The storage synchronization ID tables of the instruction types include a storage synchronization ID table for load instructions, one for calculation instructions, and one for store instructions.
  • Three instruction queues are set.
  • The instruction types are the load instruction, the calculation instruction and the store instruction.
  • The load instruction corresponds to the load instruction queue.
  • The calculation instruction corresponds to the calculation instruction queue.
  • The store instruction corresponds to the store instruction queue.
  • The storage synchronization ID table includes 8 storage units arranged in sequence.
  • The storage unit addresses are "0 to 7".
  • The unit address of the start bit is "0".
  • The instruction queue has 8 instruction units.
  • The unit addresses in the instruction queue are "0 to 7".
  • The unit address of the start bit of the instruction units is "0".
  • The unit addresses "0 to 7" in the instruction queue correspond one-to-one to the unit addresses "0 to 7" in the control queue.
  • The instruction types in the instruction set are the load instruction RDMA, the calculation instruction EXEC, and the store instruction WDMA.
  • The one-hot encoding of the first or second memory identification field covers the values "0, 1, 2".
  • The current first memory bank number of the load instruction RDMA, the calculation instruction EXEC or the store instruction WDMA is encoded as one of "0, 1, 2".
  • The current second memory bank number of the load instruction RDMA, the calculation instruction EXEC or the store instruction WDMA is encoded as one of "0, 1, 2".
  • Step S103 further includes the following:
  • It is judged whether every instruction unit of the instruction queue is full. If so, this step is repeated until an instruction unit in the instruction queue becomes idle; if not, step S104 is executed.
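The full check and the shared 0-7 unit addressing can be sketched with a small ring buffer. Reusing freed units in circular order is an assumption: the text only fixes the 8 units, their addresses 0 to 7 and the start address 0. `InstructionQueue` and its methods are hypothetical. A `None` return from `push` plays the role of "repeat this step": the caller retries on a later cycle.

```python
from typing import List, Optional


class InstructionQueue:
    """Eight-unit instruction queue; unit address i is assumed to line up
    with unit address i of the corresponding synchronization table."""

    DEPTH = 8

    def __init__(self) -> None:
        self.units: List[Optional[dict]] = [None] * self.DEPTH
        self.head = 0   # address of the oldest occupied unit (start address 0)
        self.count = 0  # number of occupied units

    def is_full(self) -> bool:
        return self.count == self.DEPTH

    def push(self, params: dict) -> Optional[int]:
        """Store params in the next free unit and return its address,
        or None when all units are occupied (caller must retry)."""
        if self.is_full():
            return None
        # ASSUMPTION: freed units are reused in circular order.
        slot = (self.head + self.count) % self.DEPTH
        self.units[slot] = params
        self.count += 1
        return slot

    def pop(self) -> dict:
        """Retire the oldest instruction and free its unit."""
        if self.count == 0:
            raise IndexError("queue is empty")
        params = self.units[self.head]
        self.units[self.head] = None
        self.head = (self.head + 1) % self.DEPTH
        self.count -= 1
        return params
```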
  • A second aspect of the present disclosure provides an instruction distribution system in a reconfigurable processor, where the reconfigurable processor includes an instruction fetch module, an instruction synchronization control module, and an instruction queue module.
  • The instruction fetch module distributes multiple lines of instructions to be executed to the instruction synchronization control module and the instruction queue module, respectively.
  • The instruction synchronization control module controls the execution of instructions in the instruction queue module.
  • The types of the instructions to be executed are defined in an instruction set.
  • The instruction set includes multiple instruction types.
  • Multiple instruction queues are set in the instruction queue module, and a plurality of instruction units are arranged in sequence in each instruction queue.
  • Each instruction queue corresponds to one instruction type.
  • The instruction distribution system in the reconfigurable processor includes a storage synchronization ID table setting unit, an instruction extraction unit, a one-hot encoding unit, a synchronization table generating unit and an instruction execution unit, where:
  • The storage synchronization ID table setting unit is configured to set the storage synchronization ID table format of each instruction type.
  • The storage synchronization ID table includes a plurality of storage units arranged in sequence. A storage bit for the first memory identification field and/or a storage bit for the second memory identification field is set in each storage unit.
  • Each instruction type corresponds to a set number of first memory identification fields and/or second memory identification fields.
  • The instruction extraction unit is configured to sequentially extract the first memory identification field and the second memory identification field of each instruction in the multiple lines of instructions to be executed.
  • The type of each instruction is one of the instruction types in the instruction set.
  • The one-hot encoding unit is configured to obtain the one-hot encoding of the first memory identification field and/or of the second memory identification field corresponding to the instruction type of each instruction.
  • The one-hot encodings of the first and second memory identification fields of each instruction are stored in the storage synchronization ID table according to the execution order of the multiple lines of instructions to be executed, and the other instruction parameters of each instruction are sent to the instruction units of the corresponding instruction queue in the instruction queue module.
  • The synchronization table generating unit is configured for the instruction synchronization control module to obtain, from the one-hot encodings of the first and second memory identification fields of each instruction type in the storage synchronization ID tables, the dependency identification information between any one instruction type and the other two instruction types, and to generate a synchronization table along a first data dimension according to that information.
  • The synchronization table includes a first data dimension and a second data dimension that intersect. The first data dimension of each instruction type corresponds to the number of storage bits in the storage synchronization ID table of that instruction type.
  • The instruction execution unit is configured to execute the corresponding instructions along the second data dimension of the synchronization table according to the dependencies of each instruction type; the size of the second data dimension of each instruction type corresponds to the number of instruction units in the instruction queue. While executing each instruction, the instruction synchronization control module can use the synchronization table to call the other instruction parameters held in the corresponding instruction unit, so that every instruction in the multiple lines of instructions to be run is executed.
  • the memory synchronization ID tables of the instruction types are: one for load instructions, one for compute instructions and one for store instructions.
  • three instruction queues are provided.
  • the instruction types are load, compute and store instructions.
  • load instructions correspond to the load instruction queue;
  • compute instructions correspond to the compute instruction queue;
  • store instructions correspond to the store instruction queue.
  • the memory synchronization ID table includes 8 storage units arranged in sequence.
  • the storage unit addresses are 0 to 7.
  • the start unit has address 0.
  • the instruction queue has 8 instruction units.
  • the unit addresses in the instruction queue are 0 to 7.
  • the start instruction unit has address 0.
  • unit addresses 0 to 7 in the instruction queue correspond to unit addresses 0 to 7 in the control queue, respectively.
  • the instruction types in the instruction set are: load instruction RDMA, compute instruction EXEC and store instruction WDMA.
  • the first and second memory identification fields are one-hot encoded over the values "0, 1, 2, ...".
  • the current first memory bank number of a load instruction RDMA, compute instruction EXEC or store instruction WDMA is encoded as "0, 1, 2, ...".
  • the current second memory bank number of a load instruction RDMA, compute instruction EXEC or store instruction WDMA is encoded as "0, 1, 2, ...".
  • the one-hot encoding unit is further configured to determine whether all instruction units of the instruction queue are full. If they are, it stays in this unit until the instruction queue has an idle instruction unit. If not, the synchronization table generation unit is executed.
  • each of the three instruction queues in the instruction queue module 203 has a depth of eight. An instruction in a queue can be in one of five states: no instruction, instruction entry, waiting, instruction execution, and instruction exit. Figure 6 shows the instruction states in the queues. Notes on the example:
  • each queue can have at most one instruction in the execution state. An executing instruction is always at address 0 of its queue, for example instructions b0 and c0.
  • Instructions in different queues can be executed in parallel, such as instruction b0 and instruction c0.
  • a queue may have no instruction in the execution state, with all of its instructions waiting. This can happen when the instruction at address 0 must wait for instructions in other queues to complete. Such inter-queue synchronization is implemented by the Sync Ctrl instruction synchronization control module 201.
  • as long as a queue has a free position, the instruction fetch module can send instructions into it, for example instruction c6.
  • if a queue is full and the next fetched instruction belongs to it, the instruction fetch module suspends instruction fetching and blocks the fetching of all subsequent instructions.
  • RDMA moves data from outside into area A of the internal memory.
  • PEA_EXEC reads the data in memory area A, computes on it, and stores the result in area B.
  • WDMA moves the data from area B to the outside. As this flow shows, the dependencies between instructions are essentially conflicts between accesses to the same storage space.
  • the present disclosure adopts a synchronization mechanism based on memory bank IDs.
  • every instruction that accesses memory carries the IDs of the memory banks it accesses. The Sync Ctrl instruction synchronization control module uses this information to establish the dependencies among all instructions that enter the queues, which ensures that instructions are issued and executed in the correct order.
  • CSPM has 2 physical banks
  • Local shram has 32 physical banks.
  • the memory bank IDs in this design fall into two categories:
  • CSPM_ID (CSPM Bank ID, 1 bit): the CSPM bank that the instruction needs to use.
  • LSHM_ID (Local shram Bank ID, 5 bit): the Local shram bank that the instruction needs to use.
  • the memory bank IDs in an instruction only establish the execution-order dependencies between instructions. They do not correspond exactly to the physical bank division of the memory.
  • the CSPM_ID and LSHM_ID of each instruction are one-hot encoded to build the Memory Sync ID Table of the corresponding instruction queue. Whenever the contents of an instruction queue change (instruction entry or exit), its Memory Sync ID Table must therefore be updated accordingly.
  • the instruction synchronization relationships are again represented as a table, defined as the Sync Table.
  • the Sync Table is built in the following steps:
  • the CSPM_ID and LSHM_ID of the instruction are one-hot encoded and compared with the Memory Sync ID Tables of the two queues other than the instruction's own queue.
  • the two kinds of ID are compared separately. A conflict in any single ID is enough to establish a conflict relationship, and the positions of all conflicting instructions are marked '1'.
  • each "√" marks a dependency between the instructions on the horizontal and vertical axes at its position. For example, the "√" at (PEA-2, RDMA-0) means that instruction PEA-2 depends on instruction RDMA-0, so PEA-2 can be issued only after RDMA-0 has finished executing.
  • PEA-0 depends on RDMA-0 and RDMA-1
  • WDMA-0 and WDMA-1 depend on PEA-1.
  • an instruction in an instruction queue may be issued when two conditions hold. It must be at the read pointer of its queue, and it must not depend on any instruction in the other queues (its row in the Sync Table contains no "√").
  • whenever an instruction enters a queue, the Sync Table must be updated together with the instruction queue. If the 10th instruction in the example enters its queue, the Sync Table must update the row shown in box A1 of the figure according to its dependencies.
  • whenever an instruction finishes executing and exits its queue, the Sync Table must be updated together with the instruction queue. If the second instruction in the example finishes, the Sync Table must clear every "√" in the column shown in box B1 of the figure. This indicates that all dependencies constrained by that instruction are released.

Abstract

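A method and system for distributing instructions in a reconfigurable processor. The reconfigurable processor includes an instruction fetch module, an instruction synchronization control module and an instruction queue module. The method includes the following steps:
- setting the format of the memory synchronization ID table of each instruction type (S101);
- obtaining the first and second memory identification fields of each instruction (S102);
- obtaining the one-hot encodings of the first and second memory identification fields (S103);
- obtaining the synchronization table (S104);
- executing each instruction in the multiple lines of instructions to be run (S105).
The method preserves the efficiency of executing multiple instructions in parallel. It also reduces memory conflicts and shortens instruction running time.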

Description

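Method and System for Distributing Instructions in a Reconfigurable Processor, and Storage Medium
Cross-reference to related application
This application is based on, and claims priority to, Chinese patent application No. 2020115396721 filed on December 23, 2020. The entire contents of that application are incorporated herein by reference.
Technical field
The present disclosure relates to the technical field of reconfigurable processors. Specifically, it relates to a method for distributing instructions in a reconfigurable processor, a system for distributing instructions in a reconfigurable processor, and a storage medium.
Background
In the development and application of reconfigurable processors, instructions generally fall into different types, such as loading data (RDMA, Read Direct Memory Access), computing (EXEC), and storing data after computation (WDMA, Write Direct Memory Access). Because the computing units are limited, executing all instructions strictly in sequence inevitably leads to low efficiency. The instructions can instead be classified, for example into three classes (load, compute and store), and the three classes executed in parallel. This effectively improves efficiency. However, the classes depend on one another; for example, data may have to be loaded before a computation can start. As a result, the efficiency of such parallel execution cannot be guaranteed.
Summary
An object of the present disclosure is to provide a method for distributing instructions in a reconfigurable processor. The method works as follows:
- It extracts the memory identification fields of each instruction.
- It one-hot encodes these fields to generate memory synchronization ID tables.
- It generates a synchronization table from these tables to establish the dependencies between instructions.
- It executes the instructions in order according to the synchronization table.
This preserves the efficiency of executing multiple instructions in parallel, reduces memory conflicts and shortens instruction running time.
Another object of the present disclosure is to provide a system for distributing instructions in a reconfigurable processor. Like the method, the system extracts the memory identification fields of each instruction and one-hot encodes them to generate memory synchronization ID tables. It then generates a synchronization table from these tables to establish the dependencies between instructions, and executes the instructions in order according to the synchronization table. The benefits are the same: efficient parallel execution, fewer memory conflicts and shorter instruction running time.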
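A first aspect of the present disclosure provides a method for distributing instructions in a reconfigurable processor. The reconfigurable processor is organized as follows:
- It includes an instruction fetch module, an instruction synchronization control module and an instruction queue module.
- The instruction fetch module distributes multiple lines of instructions to be run to the instruction synchronization control module and the instruction queue module.
- The instruction synchronization control module controls the execution of instructions in the instruction queue module.
- The instruction queue module includes a plurality of instruction queues. Each instruction queue contains a plurality of instruction units arranged in sequence and corresponds to one instruction type.
The method includes the following steps.
Step S101: set the format of the memory synchronization ID table of each instruction type. The memory synchronization ID table includes a plurality of storage units arranged in sequence. Each storage unit has a storage bit for a first memory identification field and/or a storage bit for a second memory identification field. Each instruction type corresponds to a set number of first and/or second memory identification fields.
Step S102: sequentially extract the first memory identification field and the second memory identification field of each instruction in the multiple lines of instructions to be run.
Step S103: obtain the one-hot encoding of the first memory identification field and/or of the second memory identification field of each instruction. Store these one-hot encodings into the corresponding memory synchronization ID table according to the execution order of the multiple lines of instructions to be run. Send the instruction parameters of each instruction to the instruction units of the corresponding instruction queue in the instruction queue module.
Step S104: the instruction synchronization control module obtains dependency identification information between any one instruction type and the other two instruction types. It derives this information from the one-hot encodings of the first and second memory identification fields of each instruction type in the memory synchronization ID tables. It then generates a synchronization table along a first data dimension according to that information. The synchronization table includes a first data dimension and a second data dimension that intersect. The first data dimension of each instruction type corresponds to the number of storage positions in that type's memory synchronization ID table.
Step S105: execute the corresponding instructions along the second data dimension of the synchronization table according to the dependencies of each instruction type. The size of the second data dimension of each instruction type corresponds to the number of instruction units in the instruction queue. While executing an instruction, the instruction synchronization control module uses the synchronization table to retrieve that instruction's parameters from the corresponding instruction unit. In this way each instruction in the multiple lines of instructions to be run is executed.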
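In an embodiment of the present disclosure, the instruction types include load instructions, compute instructions and store instructions. The memory synchronization ID tables include a table for load instructions, a table for compute instructions and a table for store instructions. The instruction queues include a load instruction queue for the load instructions, a compute instruction queue for the compute instructions, and a store instruction queue for the store instructions.
In an embodiment of the present disclosure, the memory synchronization ID table includes 8 storage units arranged in sequence, and the instruction queue includes 8 instruction units.
In an embodiment of the present disclosure, each storage unit in the memory synchronization ID table of the load instructions includes a first memory identification field and a second memory identification field. Each storage unit in the table of the compute instructions also includes a first memory identification field and a second memory identification field. Each storage unit in the table of the store instructions includes a second memory identification field.
In an embodiment of the present disclosure, step S103 further includes determining whether all instruction units of the instruction queue are full. If they are, this step is repeated until the instruction queue has an idle instruction unit. If not, the method proceeds to step S104.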
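A second aspect of the present disclosure provides a system for distributing instructions in a reconfigurable processor. The reconfigurable processor is organized as follows:
- It includes an instruction fetch module, an instruction synchronization control module and an instruction queue module.
- The instruction fetch module distributes multiple lines of instructions to be run to the instruction synchronization control module and the instruction queue module.
- The instruction synchronization control module controls the execution of instructions in the instruction queue module.
- The instruction queue module includes a plurality of instruction queues. Each instruction queue contains a plurality of instruction units arranged in sequence and corresponds to one instruction type.
The system includes the following units.
A memory synchronization ID table setting unit, configured to set the format of the memory synchronization ID table of each instruction type. The memory synchronization ID table includes a plurality of storage units arranged in sequence. Each storage unit has a storage bit for a first memory identification field and/or a storage bit for a second memory identification field. Each instruction type corresponds to a set number of first and/or second memory identification fields.
An instruction extraction unit, configured to sequentially extract the first memory identification field and the second memory identification field of each instruction in the multiple lines of instructions to be run.
A one-hot encoding unit, configured to do three things. It obtains the one-hot encoding of the first memory identification field and/or of the second memory identification field of each instruction. It stores these one-hot encodings into the corresponding memory synchronization ID table according to the execution order of the multiple lines of instructions to be run. It sends the instruction parameters of each instruction to the instruction units of the corresponding instruction queue in the instruction queue module.
A synchronization table generation unit, configured such that the instruction synchronization control module obtains dependency identification information between any one instruction type and the other two instruction types. The module derives this information from the one-hot encodings of the first and second memory identification fields of each instruction type in the memory synchronization ID tables. It then generates a synchronization table along a first data dimension according to that information. The synchronization table includes a first data dimension and a second data dimension that intersect. The first data dimension of each instruction type corresponds to the number of storage positions in that type's memory synchronization ID table.
An instruction execution unit, configured to execute the corresponding instructions along the second data dimension of the synchronization table according to the dependencies of each instruction type. The size of the second data dimension of each instruction type corresponds to the number of instruction units in the instruction queue. While executing an instruction, the instruction synchronization control module uses the synchronization table to retrieve that instruction's parameters from the corresponding instruction unit. In this way each instruction in the multiple lines of instructions to be run is executed.
In an embodiment of the present disclosure, the instruction types include load instructions, compute instructions and store instructions. The memory synchronization ID tables include a table for load instructions, a table for compute instructions and a table for store instructions. The instruction queues include a load instruction queue for the load instructions, a compute instruction queue for the compute instructions, and a store instruction queue for the store instructions.
In an embodiment of the present disclosure, the memory synchronization ID table includes 8 storage units arranged in sequence, and the instruction queue includes 8 instruction units.
In an embodiment of the present disclosure, each storage unit in the memory synchronization ID table of the load instructions includes a first memory identification field and a second memory identification field. Each storage unit in the table of the compute instructions also includes a first memory identification field and a second memory identification field. Each storage unit in the table of the store instructions includes a second memory identification field.
In an embodiment of the present disclosure, the one-hot encoding unit is further configured to determine whether all instruction units of the instruction queue are full. If they are, it remains in this unit until the instruction queue has an idle instruction unit. If not, the synchronization table generation unit is executed.
The characteristics, technical features, advantages and implementation of the method and system for distributing instructions in a reconfigurable processor are described further below with reference to the accompanying drawings.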
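Brief description of the drawings
Fig. 1 is a schematic flowchart of the method for distributing instructions in a reconfigurable processor according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of the format of the memory synchronization ID tables according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of the synchronization table according to an embodiment of the present disclosure.
Fig. 4 is a schematic structural diagram of the instruction queues according to an embodiment of the present disclosure.
Fig. 5 is a schematic diagram of the composition of the reconfigurable processor according to an embodiment of the present disclosure.
Fig. 6 is a diagram of the various states in the instruction queue module according to an embodiment of the present disclosure.
Detailed description
Specific embodiments of the present disclosure are now described with reference to the drawings, to give a clearer understanding of the technical features, objects and effects of the invention. In the drawings, the same reference numerals denote components that are identical, or that are structurally similar but functionally identical.
Herein, "schematic" means "serving as an example, instance or illustration". No illustration or embodiment described as "schematic" should be construed as a more preferred or more advantageous technical solution. To keep the drawings concise, each figure schematically shows only the parts relevant to the exemplary embodiment. The figures do not represent the actual structure or true proportions of a product.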
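Fig. 5 is a schematic diagram of the composition of the reconfigurable processor in an embodiment of the present disclosure. As shown in Fig. 5, the reconfigurable processor includes an instruction fetch module 202, an instruction synchronization control module 201 and an instruction queue module 203.
The instruction fetch module 202 splits the multiple lines of instructions to be run and distributes them to the instruction synchronization control module 201 and the instruction queue module 203.
The instruction synchronization control module 201 controls the execution of instructions in the instruction queue module. The types of the instructions to be run belong to an instruction set, which contains multiple instruction types.
Fig. 4 is a schematic structural diagram of the instruction queues in an embodiment of the present disclosure. As shown in Fig. 4, the instruction queue module contains multiple instruction queues: a compute queue PEA_EXEC 102, a store queue WDMA 103 and a load queue RDMA 101. Each of the three queues contains 8 instruction units arranged in sequence. Each instruction queue corresponds to one instruction type.
A first aspect of the present disclosure provides a method for distributing instructions in a reconfigurable processor. Fig. 1 is a schematic flowchart of this method in an embodiment of the present disclosure. As shown in Fig. 1, the method includes the following steps.
Step S101: set the format of the memory synchronization ID table of each instruction type.
Fig. 2 shows the format of the memory synchronization ID table of each instruction type. The table includes multiple storage units arranged in sequence. Each storage unit has a storage bit for the first memory identification field CSPM_ID and/or a storage bit for the second memory identification field LSHM_ID. Each instruction type corresponds to a set number of first and/or second memory identification fields.
As shown in Fig. 2, each of the three tables has 8 units:
- In the memory synchronization ID table of the load (RDMA) instructions, each unit contains a first memory identification field CSPM_ID and a second memory identification field LSHM_ID.
- In the table of the compute (PEA_EXEC) instructions, each unit also contains CSPM_ID and LSHM_ID.
- In the table of the store (WDMA) instructions, each unit contains only the second memory identification field LSHM_ID.
Step S102: obtain the first and second memory identification fields of each instruction.
In this step, the first and/or second memory identification field of each instruction in the multiple lines of instructions to be run is extracted in turn. The instruction type of each instruction belongs to the instruction set.
For example, consider the following Program 1: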
(1) RDMA context
(2) RDMA data
(3) PEA_EXEC
(4) RDMA context
(5) PEA_EXEC
(6) WDMA data
(7) RDMA context
(8) RDMA data
(9) PEA_EXEC
(10) WDMA data
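The RDMA data load, PEA_EXEC compute and WDMA data store instructions among the 10 instructions of Program 1 are read in turn. For each one, the first memory identification field CSPM_ID and/or the second memory identification field LSHM_ID is obtained.
Table 1 shows how many CSPM_ID and LSHM_ID fields the RDMA data load, PEA_EXEC compute and WDMA data store instructions use: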
Instruction    CSPM_ID count    LSHM_ID count
PEA_EXEC       1                2
WDMA           0                1
RDMA           0 or 1           0 or 1
Table 1
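Table 1 can be read as a per-type limit on how many bank IDs an instruction carries. The minimal Python sketch below encodes the table as a lookup and checks the ten instructions of Program 1 against it. `FIELD_LIMITS`, `fields_match_table1` and `PROGRAM1` are hypothetical names used only for illustration. The disclosure does not define any such interface.

```python
# Table 1 as (minimum, maximum) counts of each ID field per instruction type.
# Hypothetical encoding for illustration only.
FIELD_LIMITS = {
    "PEA_EXEC": {"CSPM_ID": (1, 1), "LSHM_ID": (2, 2)},
    "WDMA": {"CSPM_ID": (0, 0), "LSHM_ID": (1, 1)},
    "RDMA": {"CSPM_ID": (0, 1), "LSHM_ID": (0, 1)},
}


def fields_match_table1(kind, cspm_ids, lshm_ids):
    """Return True if the instruction carries as many IDs as Table 1 allows."""
    limits = FIELD_LIMITS[kind]
    lo, hi = limits["CSPM_ID"]
    if not lo <= len(cspm_ids) <= hi:
        return False
    lo, hi = limits["LSHM_ID"]
    return lo <= len(lshm_ids) <= hi


# The ten instructions of Program 1 as (type, CSPM_IDs, LSHM_IDs).
PROGRAM1 = [
    ("RDMA", [0], []), ("RDMA", [], [0]), ("PEA_EXEC", [0], [0, 1]),
    ("RDMA", [1], []), ("PEA_EXEC", [1], [1, 2]), ("WDMA", [], [2]),
    ("RDMA", [0], []), ("RDMA", [], [0]), ("PEA_EXEC", [0], [0, 1]),
    ("WDMA", [], [1]),
]

print(all(fields_match_table1(*instr) for instr in PROGRAM1))  # True
```

A WDMA instruction that carried a CSPM_ID, or a PEA_EXEC with only one LSHM_ID, would be rejected by this check.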
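Step S103: obtain the one-hot encodings of the first and second memory identification fields.
In this step, the one-hot encoding of the first memory identification field and/or of the second memory identification field is obtained. Which fields are encoded depends on the instruction type of each instruction.
The one-hot encodings of the first and second memory identification fields of each instruction are stored in the memory synchronization ID table according to the execution order of the multiple lines of instructions to be run.
The remaining instruction parameters of each instruction are sent to the instruction units of the corresponding instruction queue in the instruction queue module.
For example, take an RDMA data load instruction in Program 1. The one-hot encodings of its CSPM_ID and LSHM_ID fields are stored in the memory synchronization ID table. Its other instruction parameters are stored in the corresponding instruction unit of the instruction queue.
The first memory bank number (CSPM_ID field) and second memory bank number (LSHM_ID field) of each instruction are as follows: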
(1) RDMA context, CSPM_ID=0
(2) RDMA data, LSHM_ID=0
(3) PEA_EXEC, CSPM_ID=0, LSHM_ID=0, LSHM_ID=1
(4) RDMA context, CSPM_ID=1
(5) PEA_EXEC, CSPM_ID=1, LSHM_ID=1, LSHM_ID=2
(6) WDMA data, LSHM_ID=2
(7) RDMA context, CSPM_ID=0
(8) RDMA data, LSHM_ID=0
(9) PEA_EXEC, CSPM_ID=0, LSHM_ID=0, LSHM_ID=1
(10) WDMA data, LSHM_ID=1
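CSPM_ID is 1 bit wide (2 banks) and LSHM_ID is 5 bits wide (32 banks), so their one-hot codes need 2 and 32 bit positions respectively. Some instructions carry several IDs of the same kind, such as a PEA_EXEC with two LSHM_IDs. This sketch assumes such an instruction ORs its one-hot codes into a single mask; the disclosure does not spell out this detail. `one_hot` is a hypothetical helper.

```python
CSPM_BANKS = 2   # CSPM_ID is 1 bit wide -> 2 one-hot positions
LSHM_BANKS = 32  # LSHM_ID is 5 bits wide -> 32 one-hot positions


def one_hot(ids, width):
    """OR together the one-hot codes of every bank ID an instruction uses."""
    mask = 0
    for bank in ids:
        if not 0 <= bank < width:
            raise ValueError(f"bank ID {bank} does not fit in {width} positions")
        mask |= 1 << bank
    return mask


# Instruction (5) of Program 1: PEA_EXEC, CSPM_ID=1, LSHM_ID=1, LSHM_ID=2
cspm_code = one_hot([1], CSPM_BANKS)
lshm_code = one_hot([1, 2], LSHM_BANKS)
print(format(cspm_code, "02b"), format(lshm_code, "032b")[-4:])  # 10 0110
```

An instruction without a given field (for example, WDMA without CSPM_ID) simply gets a zero mask, which can never collide with another instruction.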
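Step S104: obtain the synchronization table.
In this step, the instruction synchronization control module obtains the dependency identification information between any one instruction type and the other two instruction types. It derives this information from the one-hot encodings of the first and second memory identification fields of each instruction type in the memory synchronization ID tables. A synchronization table is then generated along a first data dimension according to the dependency identification information. The synchronization table includes a first data dimension and a second data dimension that intersect. The first data dimension of each instruction type corresponds to the number of storage positions in that type's memory synchronization ID table.
As shown in Fig. 3, the first data dimension is the direction indicated by "A" and the second data dimension is the direction indicated by "B".
The one-hot encodings of the PEA_EXEC compute instructions are compared with those of the WDMA data store instructions and of the RDMA data load instructions. The resulting dependency identification information is then marked in the synchronization table of Fig. 3 along the first data dimension "A". For example, the "√" marks in row A1 of Fig. 3 represent the dependencies of the WDMA data store instruction at position 1 on the PEA_EXEC compute instructions. For each instruction in the instruction lines, its dependencies on the instructions of the other two types are stored.
For example, the one-hot encodings of the current instruction's CSPM_ID and LSHM_ID are compared with the Memory Sync ID Tables of the two queues other than the one the current instruction belongs to. CSPM_ID and LSHM_ID are compared separately. A conflict in either ID is enough to establish a conflict relationship, and the positions of all conflicting instructions are marked as 1.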
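The comparison described above can be sketched in Python as follows. The sketch makes these assumptions:
- Each Memory Sync ID Table entry holds a (CSPM code, LSHM code) pair of bit masks.
- 0 stands for an absent field, since the WDMA table has no CSPM_ID.
- `None` marks an empty slot.

`conflict_bits` is a hypothetical name. The result is one 8-bit "01" sequence per compared queue, with slot 0 first.

```python
def conflict_bits(cspm_code, lshm_code, other_table, depth=8):
    """Compare one instruction with another queue's Memory Sync ID Table.

    other_table lists up to `depth` entries, each a (cspm_code, lshm_code)
    pair or None for an empty slot. CSPM and LSHM codes are compared
    separately; a shared bank in either one marks that slot as '1'.
    """
    bits = []
    for slot in range(depth):
        entry = other_table[slot] if slot < len(other_table) else None
        if entry is None:
            bits.append("0")
            continue
        other_cspm, other_lshm = entry
        hit = (cspm_code & other_cspm) or (lshm_code & other_lshm)
        bits.append("1" if hit else "0")
    return "".join(bits)  # slot 0 first, i.e. r0..r7 for the RDMA queue


# RDMA queue of Program 1 after instructions (1)-(8): RDMA-0 .. RDMA-4
rdma_table = [(0b01, 0), (0, 0b001), (0b10, 0), (0b01, 0), (0, 0b001)]
# Instruction (9), PEA-2: CSPM_ID=0, LSHM_ID=0 and 1
print(conflict_bits(0b01, 0b011, rdma_table))  # 11011000
```

The '1' bits at slots 0, 1, 3 and 4 match the dependency of PEA_EXEC instruction 2 on RDMA instructions 0, 1, 3 and 4 described for Fig. 3.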
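For example, as shown in Fig. 3, the RDMA instructions occupy units r0 to r7 along the first data dimension of the synchronization table. These 8 units correspond to the 8 units of the RDMA memory synchronization ID table in Fig. 2.
As shown in Fig. 3, RDMA instructions 3 and 4 depend on PEA_EXEC compute instruction 0.
Likewise, as shown in Fig. 3, the WDMA data store instructions occupy units w0 to w7 along the first data dimension of the synchronization table. These 8 units correspond to the 8 units of the WDMA memory synchronization ID table in Fig. 2.
As shown in Fig. 3, WDMA data store instruction 0 depends on PEA_EXEC compute instruction 1. WDMA data store instruction 1 depends on PEA_EXEC compute instructions 0, 1 and 2.
PEA_EXEC compute instruction 0 depends on RDMA instructions 0 and 1. PEA_EXEC compute instruction 1 depends on RDMA instruction 2. PEA_EXEC compute instruction 2 depends on RDMA instructions 0, 1, 3 and 4.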
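The dependencies listed above can be reproduced with a short simulation. The sketch rests on two assumptions:
- All ten instructions of Program 1 enter their queues before any of them exits, as in the snapshot of Fig. 3.
- An ID conflict in either field creates a dependency on the earlier instruction in another queue.

`build_sync_table` and `mask` are hypothetical helpers, not interfaces defined by the disclosure.

```python
from collections import defaultdict

# Program 1 in program order: (queue, CSPM_IDs, LSHM_IDs).
PROGRAM1 = [
    ("RDMA", [0], []),      # (1) RDMA context
    ("RDMA", [], [0]),      # (2) RDMA data
    ("PEA", [0], [0, 1]),   # (3) PEA_EXEC
    ("RDMA", [1], []),      # (4) RDMA context
    ("PEA", [1], [1, 2]),   # (5) PEA_EXEC
    ("WDMA", [], [2]),      # (6) WDMA data
    ("RDMA", [0], []),      # (7) RDMA context
    ("RDMA", [], [0]),      # (8) RDMA data
    ("PEA", [0], [0, 1]),   # (9) PEA_EXEC
    ("WDMA", [], [1]),      # (10) WDMA data
]


def mask(ids):
    """OR the one-hot codes of a list of bank IDs into one bit mask."""
    m = 0
    for bank in ids:
        m |= 1 << bank
    return m


def build_sync_table(program, depth=8):
    """Return {instruction: [instructions in other queues it depends on]}."""
    id_tables = defaultdict(list)  # queue -> [(name, cspm_code, lshm_code)]
    sync_table = {}
    for queue, cspm_ids, lshm_ids in program:
        entries = id_tables[queue]
        if len(entries) >= depth:
            raise RuntimeError(f"{queue} queue is full")
        name = f"{queue}-{len(entries)}"
        cspm, lshm = mask(cspm_ids), mask(lshm_ids)
        # Only the other queues are checked; a queue executes in order anyway.
        sync_table[name] = [
            other
            for q, others in id_tables.items() if q != queue
            for other, o_cspm, o_lshm in others
            if (cspm & o_cspm) or (lshm & o_lshm)
        ]
        entries.append((name, cspm, lshm))
    return sync_table


table = build_sync_table(PROGRAM1)
for row in ("PEA-0", "PEA-2", "RDMA-3", "WDMA-1"):
    print(row, "depends on", table[row])
```

The printed rows match the text:
- PEA-0 waits for RDMA-0 and RDMA-1.
- PEA-2 waits for RDMA-0, 1, 3 and 4.
- RDMA-3 waits for PEA-0.
- WDMA-1 waits for PEA-0, 1 and 2.

Same-queue overlaps, such as PEA-0 and PEA-1 both using LSHM_ID=1, are deliberately ignored, because each queue already executes in order.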
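Step S105: execute each instruction in the multiple lines of instructions to be run.
In this step, the corresponding instructions are executed along the second data dimension of the synchronization table according to the dependencies of each instruction type. The size of the second data dimension of each instruction type corresponds to the number of instruction units in the instruction queue. While executing an instruction, the instruction synchronization control module uses the synchronization table to retrieve that instruction's other parameters from the corresponding instruction unit. In this way each instruction in the multiple lines of instructions to be run is executed.
For example, as shown in Fig. 3, each instruction type has 8 units along the second data dimension, matching the 8 units of its queue in Fig. 4. This holds for the RDMA data load instructions and the RDMA load queue, for the PEA_EXEC compute instructions and the PEA_EXEC compute queue, and for the WDMA data store instructions and the WDMA store queue.
For example, as shown in Fig. 3, the program executes the instructions in turn following the second data dimension B, i.e. the direction along which column B1 is laid out. When RDMA data load instruction 0 is executed, the related parameters held at position 0 of the RDMA load queue are retrieved. When RDMA data load instruction 1 is executed, those at position 1 are retrieved, and so on from 0 to 7.
As shown in Fig. 3, RDMA instructions 0 and 1 do not depend on any instruction, so nothing constrains their execution. Once RDMA instruction 0 has executed, the dependencies of PEA_EXEC compute instructions 0 and 2 on it, recorded in its column, are removed. The same happens for RDMA instruction 1. In other words, after RDMA instruction 0 completes, column B0 is updated; RDMA instruction 1 is then executed and column B1 is updated.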
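Releasing a dependency amounts to clearing the finished instruction's column. The sketch below hard-codes the Program 1 rows of Fig. 3 as Python sets. It shows PEA-0 losing its two "√" marks once RDMA-0 and RDMA-1 have executed. The dictionary layout and the `retire` helper are assumptions made for illustration.

```python
# Program 1 rows of the Sync Table in Fig. 3: instruction -> instructions it waits on.
sync_table = {
    "RDMA-0": set(), "RDMA-1": set(), "RDMA-2": set(),
    "RDMA-3": {"PEA-0"}, "RDMA-4": {"PEA-0"},
    "PEA-0": {"RDMA-0", "RDMA-1"}, "PEA-1": {"RDMA-2"},
    "PEA-2": {"RDMA-0", "RDMA-1", "RDMA-3", "RDMA-4"},
    "WDMA-0": {"PEA-1"}, "WDMA-1": {"PEA-0", "PEA-1", "PEA-2"},
}


def retire(table, finished):
    """Clear the column of a finished instruction in every remaining row.

    The finished instruction's own row is dropped too: it could only issue
    with an empty row, and its queue slot is now free.
    """
    table.pop(finished, None)
    for waits_on in table.values():
        waits_on.discard(finished)


retire(sync_table, "RDMA-0")
print(sorted(sync_table["PEA-0"]))  # ['RDMA-1']
retire(sync_table, "RDMA-1")
print(sorted(sync_table["PEA-0"]))  # []
print(sorted(sync_table["PEA-2"]))  # ['RDMA-3', 'RDMA-4']
```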
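In another embodiment of the instruction distribution method of the present disclosure, the memory synchronization ID tables of the instruction types are a table for load instructions, a table for compute instructions and a table for store instructions.
Three instruction queues are provided, one for each instruction type: load instructions go to the load instruction queue, compute instructions to the compute instruction queue, and store instructions to the store instruction queue.
In yet another embodiment of the instruction distribution method in the reconfigurable processor of the present disclosure, the memory synchronization ID table includes 8 storage units arranged in sequence. Their unit addresses are 0 to 7, and the start unit has address 0.
The instruction queue has 8 instruction units with unit addresses 0 to 7, and the start instruction unit has address 0. Unit addresses 0 to 7 in the instruction queue correspond to unit addresses 0 to 7 in the control queue, respectively.
In a further embodiment of the instruction distribution method of the present disclosure, the instruction types in the instruction set are the load instruction RDMA, the compute instruction EXEC and the store instruction WDMA.
The first and second memory identification fields are one-hot encoded over the values "0, 1, 2, ...".
The current first memory bank number of a load instruction RDMA, compute instruction EXEC or store instruction WDMA is encoded as "0, 1, 2, ...". Its current second memory bank number is encoded the same way.
In a further embodiment of the instruction distribution method of the present disclosure, step S103 further includes:
determining whether all instruction units of the instruction queue are full. If they are, this step is repeated until the instruction queue has an idle instruction unit. If not, the method proceeds to step S104.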
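The queue-full check can be sketched as in-order dispatch with back-pressure. The Python below assumes three queues of depth 8. Dispatch stops at the first instruction whose queue has no idle unit, so later instructions are blocked even if their own queue has space; rule 2.6 of the Fig. 6 embodiment describes the same behavior. `InstructionQueues` and `dispatch` are hypothetical names.

```python
from collections import deque

QUEUE_DEPTH = 8


class InstructionQueues:
    """Three in-order instruction queues of fixed depth (RDMA, PEA, WDMA)."""

    def __init__(self, depth=QUEUE_DEPTH):
        self.depth = depth
        self.queues = {"RDMA": deque(), "PEA": deque(), "WDMA": deque()}

    def has_free_unit(self, queue):
        return len(self.queues[queue]) < self.depth

    def enqueue(self, queue, instr):
        self.queues[queue].append(instr)


def dispatch(stream, iq):
    """Send instructions to their queues in program order.

    Stops at the first instruction whose queue has no idle unit; that
    instruction and every later one stay blocked until a unit frees up.
    Returns the number of instructions dispatched.
    """
    sent = 0
    for queue, instr in stream:
        if not iq.has_free_unit(queue):
            break  # queue full: remain in step S103 and retry later
        iq.enqueue(queue, instr)
        sent += 1
    return sent


iq = InstructionQueues()
stream = [("RDMA", f"rdma{i}") for i in range(9)] + [("PEA", "pea0")]
print(dispatch(stream, iq))   # 8
print(len(iq.queues["PEA"]))  # 0, blocked behind the ninth RDMA
```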
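A second aspect of the present disclosure provides a system for distributing instructions in a reconfigurable processor. The reconfigurable processor includes an instruction fetch module, an instruction synchronization control module and an instruction queue module. The instruction fetch module distributes the multiple lines of instructions to be run to the instruction synchronization control module and to the instruction queue module.
The instruction synchronization control module controls the execution of instructions in the instruction queue module. The types of the instructions to be run belong to an instruction set, which contains multiple instruction types.
The instruction queue module contains multiple instruction queues. Each queue holds multiple instruction units arranged in sequence and corresponds to one instruction type.
The system for distributing instructions in a reconfigurable processor includes five units: a memory synchronization ID table setting unit, an instruction extraction unit, a one-hot encoding unit, a synchronization table generation unit and an instruction execution unit.
The memory synchronization ID table setting unit is configured to set the format of the memory synchronization ID table of each instruction type. The table includes multiple storage units arranged in sequence. Each storage unit has a storage bit for the first memory identification field and/or a storage bit for the second memory identification field. Each instruction type corresponds to a set number of first and/or second memory identification fields.
The instruction extraction unit is configured to extract in turn the first memory identification field and the second memory identification field of each instruction in the multiple lines of instructions to be run. The instruction type of each instruction belongs to the instruction set.
The one-hot encoding unit is configured to obtain the one-hot encoding of the first memory identification field and/or of the second memory identification field. Which fields are encoded depends on the instruction type of each instruction.
It also stores these one-hot encodings of each instruction in the memory synchronization ID table according to the execution order of the multiple lines of instructions to be run. Finally, it sends the other instruction parameters of each instruction to the instruction units of the instruction queues in the instruction queue module.
The synchronization table generation unit is configured such that the instruction synchronization control module obtains dependency identification information between any one instruction type and the other two instruction types. The module derives this information from the one-hot encodings of the first and second memory identification fields of each instruction type in the memory synchronization ID tables. It then generates a synchronization table along a first data dimension according to that information. The synchronization table includes a first data dimension and a second data dimension that intersect. The first data dimension of each instruction type corresponds to the number of storage positions in that type's memory synchronization ID table.
The instruction execution unit is configured to execute the corresponding instructions along the second data dimension of the synchronization table according to the dependencies of each instruction type. The size of the second data dimension of each instruction type corresponds to the number of instruction units in the instruction queue. While executing an instruction, the instruction synchronization control module uses the synchronization table to retrieve that instruction's other parameters from the corresponding instruction unit. In this way each instruction in the multiple lines of instructions to be run is executed.
In another embodiment of the instruction distribution system of the present disclosure, the memory synchronization ID tables of the instruction types are a table for load instructions, a table for compute instructions and a table for store instructions.
Three instruction queues are provided, one for each instruction type: load instructions go to the load instruction queue, compute instructions to the compute instruction queue, and store instructions to the store instruction queue.
In yet another embodiment of the instruction distribution system of the present disclosure, the memory synchronization ID table includes 8 storage units arranged in sequence. Their unit addresses are 0 to 7, and the start unit has address 0.
The instruction queue has 8 instruction units with unit addresses 0 to 7, and the start instruction unit has address 0. Unit addresses 0 to 7 in the instruction queue correspond to unit addresses 0 to 7 in the control queue, respectively.
In a further embodiment of the instruction distribution system of the present disclosure, the instruction types in the instruction set are the load instruction RDMA, the compute instruction EXEC and the store instruction WDMA.
The first and second memory identification fields are one-hot encoded over the values "0, 1, 2, ...".
The current first memory bank number of a load instruction RDMA, compute instruction EXEC or store instruction WDMA is encoded as "0, 1, 2, ...". Its current second memory bank number is encoded the same way.
In a further embodiment of the instruction distribution system of the present disclosure, the one-hot encoding unit is further configured to determine whether all instruction units of the instruction queue are full. If they are, it remains in this unit until the instruction queue has an idle instruction unit. If not, the synchronization table generation unit is executed.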
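Another embodiment of the instruction distribution method in the reconfigurable processor of the present disclosure is as follows.
As shown in Fig. 6, each of the three instruction queues in the instruction queue module 203 has a depth of 8. An instruction in a queue can be in one of 5 states: no instruction, instruction entry, waiting, instruction execution, and instruction exit. Fig. 6 illustrates the instruction states in the queues. Notes on the example:
2.1 Within a queue, instructions are executed strictly in the order in which they entered.
2.2 Each queue can have at most 1 instruction in the execution state. An executing instruction is always at address 0 of its queue, for example instructions b0 and c0.
2.3 Instructions in different queues can be executed in parallel, for example instructions b0 and c0.
2.4 A queue may have no instruction in the execution state, with all of its instructions waiting. This may happen because the instruction at address 0 must wait for instructions in other queues to complete. Such inter-queue synchronization is implemented by the Sync Ctrl instruction synchronization control module 201.
2.5 As long as a queue still has a free position, the instruction fetch module can send instructions into it, for example instruction c6.
2.6 If a queue is full and the next fetched instruction belongs to that queue, the instruction must wait for a free position before it can enter. Meanwhile, the instruction fetch module suspends fetching and blocks the fetching of all subsequent instructions.
2.7 An executing instruction completes and exits as soon as it receives the done signal from the corresponding module, for example instruction a0.
2.8 Across the 3 queues, at most 1 instruction may be in the exit state, or 1 instruction in the entry state, at any given time. If the two conflict, the exit takes priority. Fig. 6 shows entry and exit states in the queues at the same time only to illustrate both states; this cannot happen in the actual circuit.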
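Rule 2.8 is a small arbitration policy. The sketch below models the five slot states as an enum and grants at most one event per cycle across the three queues, with exits taking priority over entries. The disclosure does not say which exit wins when two queues finish in the same cycle. The fixed `QUEUE_ORDER` used here is an assumption, as are all names in the code.

```python
from enum import Enum


class SlotState(Enum):
    """The five states an instruction-queue slot can be in (Fig. 6)."""
    EMPTY = "no instruction"
    ENTERING = "instruction entry"
    WAITING = "waiting"
    EXECUTING = "instruction execution"
    EXITING = "instruction exit"


# Assumed tie-break order when several queues finish in the same cycle.
QUEUE_ORDER = ("RDMA", "PEA", "WDMA")


def arbitrate(done_queues, fetch_target):
    """Grant at most one entry or exit across all three queues this cycle.

    done_queues: queues whose executing head (address 0) received `done`.
    fetch_target: queue the fetch module wants to write into, or None.
    Exits take priority over entries (rule 2.8).
    """
    for queue in QUEUE_ORDER:
        if queue in done_queues:
            return SlotState.EXITING, queue
    if fetch_target is not None:
        return SlotState.ENTERING, fetch_target
    return None


state, queue = arbitrate({"PEA"}, "WDMA")
print(state.name, queue)  # EXITING PEA
state, queue = arbitrate(set(), "WDMA")
print(state.name, queue)  # ENTERING WDMA
```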
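The data flow is as follows. RDMA moves data from outside into area A of the internal memory. When that completes, PEA_EXEC reads the data in area A, computes on it, and stores the results in area B. WDMA then moves the data from area B to the outside. This flow shows that the dependencies between instructions are essentially conflicts between accesses to the same storage space.
The present disclosure adopts a synchronization mechanism based on memory bank IDs. Every instruction that accesses memory carries the IDs of the memory banks it accesses. The Sync Ctrl instruction synchronization control module uses this information to establish the dependencies among all instructions that enter the queues, which ensures that instructions are issued and executed in the correct order.
There are two memories, CSPM and Local shram. CSPM has 2 physical banks and Local shram has 32 physical banks. The memory bank IDs in this design therefore fall into 2 categories:
CSPM_ID (CSPM Bank ID, 1 bit): the CSPM bank that the instruction needs to use.
LSHM_ID (Local shram Bank ID, 5 bit): the Local shram bank that the instruction needs to use.
The memory bank IDs in an instruction are used only to establish execution-order dependencies between instructions. They do not correspond exactly to the physical bank division of the memory. For example, suppose a preceding instruction occupies banks 0 to 2 of the Local shram and is tagged LSHM_ID=0. If a following instruction needs bank 2 of the Local shram, it is considered to depend on the preceding instruction, so it must also be tagged LSHM_ID=0.
Assume that the numbers of memory bank IDs used by the instructions of each queue are as listed in Table 1.
As shown in Fig. 2, the output of the instruction pre-decoding module is used to one-hot encode the CSPM_ID and LSHM_ID of each instruction. From these codes, a Memory Sync ID Table is built for each instruction queue. Consequently, whenever the contents of an instruction queue change (instruction entry or exit), its Memory Sync ID Table must be updated accordingly.
Once the Memory Sync ID Tables are built, each instruction must undergo a conflict check before it enters its queue. The check covers all unexecuted instructions in the other 2 queues; the instruction's own queue needs no check, because a queue executes strictly in entry order. This check establishes the instruction synchronization relationships. These relationships are again represented as a table, defined as the Sync Table. The Sync Table is built in the following steps:
Extract all memory bank IDs of the instruction, i.e. its CSPM_ID and LSHM_ID.
One-hot encode the instruction's CSPM_ID and LSHM_ID and compare them with the Memory Sync ID Tables of the 2 queues other than the instruction's own. The 2 kinds of ID are compared separately. A conflict in any single ID is enough to establish a conflict relationship, and the positions of all conflicting instructions are marked '1'.
The previous step produces 2 groups of 8-bit "01" sequences, which are written to the corresponding positions of the Sync Table. These sequences record the conflicts between this instruction and all instructions in the other queues. The instruction can be issued only after every instruction at a position marked "1" has finished executing.
Fig. 4 gives an example instruction sequence together with the dependencies between instructions in the instruction queues, shown as black arrows. Some dependency arrows that can be optimized away are omitted from the dependency graph; this is reflected in the Sync Table description below.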
(1) RDMA context, CSPM_ID=0
(2) RDMA data, LSHM_ID=0
(3) PEA_EXEC, CSPM_ID=0, LSHM_ID=0, LSHM_ID=1
(4) RDMA context, CSPM_ID=1
(5) PEA_EXEC, CSPM_ID=1, LSHM_ID=1, LSHM_ID=2
(6) WDMA data, LSHM_ID=2
(7) RDMA context, CSPM_ID=0
(8) RDMA data, LSHM_ID=0
(9) PEA_EXEC, CSPM_ID=0, LSHM_ID=0, LSHM_ID=1
(10) WDMA data, LSHM_ID=1
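Fig. 3 shows how the dependencies of the above example appear in the Sync Table. Notes:
3.1 Each "√" marks a dependency between the instructions on the horizontal and vertical axes at its position. For example, the "√" at (PEA-2, RDMA-0) means that instruction PEA-2 depends on instruction RDMA-0, so PEA-2 can be issued only after RDMA-0 has finished executing.
3.2 Dependencies may be one-to-many or many-to-one. For example, PEA-0 depends on both RDMA-0 and RDMA-1, and WDMA-0 and WDMA-1 both depend on PEA-1.
3.3 An instruction in an instruction queue may be issued when two conditions hold. It must be at the read pointer of its queue, and it must not depend on any instruction in the other queues (its row in the Sync Table contains no "√").
3.4 Whenever an instruction enters a queue, the Sync Table must be updated together with the instruction queue. For example, when the 10th instruction of the example enters its queue, the Sync Table must update the row in box A1 of the figure according to its dependencies.
Likewise, whenever an instruction finishes executing and exits its queue, the Sync Table must be updated together with the instruction queue. For example, when the 2nd instruction of the example finishes, the Sync Table must clear every "√" in the column in box B1 of the figure. This releases all dependencies constrained by that instruction.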
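Condition 3.3 can be checked with a few lines of Python. The sketch assumes each queue is a list whose index 0 is the read pointer, consistent with rule 2.2 where an executing instruction sits at address 0. It reuses the Program 1 rows of Fig. 3, and `issuable` is a hypothetical helper. In the initial state only RDMA-0 can issue. RDMA-2 also has an empty row, but it is not at its queue's read pointer.

```python
def issuable(queues, sync_table):
    """Return the instructions that satisfy condition 3.3 right now.

    queues: queue name -> instruction names in entry order (index 0 = read pointer).
    sync_table: instruction -> set of instructions in other queues it still
    waits on; rows with no '√' may simply be absent.
    """
    ready = []
    for names in queues.values():
        if not names:
            continue
        head = names[0]  # only the instruction at the read pointer may issue
        if not sync_table.get(head):  # no '√' left in its row
            ready.append(head)
    return ready


queues = {
    "RDMA": ["RDMA-0", "RDMA-1", "RDMA-2", "RDMA-3", "RDMA-4"],
    "PEA": ["PEA-0", "PEA-1", "PEA-2"],
    "WDMA": ["WDMA-0", "WDMA-1"],
}
sync_table = {
    "RDMA-3": {"PEA-0"}, "RDMA-4": {"PEA-0"},
    "PEA-0": {"RDMA-0", "RDMA-1"}, "PEA-1": {"RDMA-2"},
    "PEA-2": {"RDMA-0", "RDMA-1", "RDMA-3", "RDMA-4"},
    "WDMA-0": {"PEA-1"}, "WDMA-1": {"PEA-0", "PEA-1", "PEA-2"},
}
print(issuable(queues, sync_table))  # ['RDMA-0']
```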
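Although this specification is organized by embodiment, each embodiment does not necessarily contain only one independent technical solution. This manner of description is used only for clarity. Those skilled in the art should treat the specification as a whole, and the technical solutions in the various embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.
The detailed descriptions above only describe feasible embodiments of the present disclosure and are not intended to limit its scope of protection. Any equivalent embodiment or modification that does not depart from the technical spirit of the present disclosure shall fall within its scope of protection.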

Claims (10)

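  1. A method for distributing instructions in a reconfigurable processor, characterized in that the reconfigurable processor comprises an instruction fetch module, an instruction synchronization control module and an instruction queue module; the instruction fetch module is configured to distribute multiple lines of instructions to be run to the instruction synchronization control module and the instruction queue module; the instruction synchronization control module is configured to control the execution of instructions in the instruction queue module; the instruction queue module comprises a plurality of instruction queues; each instruction queue comprises a plurality of instruction units arranged in sequence; and each instruction queue corresponds to one instruction type;
    the method comprises:
    step S101, setting a format of a memory synchronization ID table for each instruction type, wherein the memory synchronization ID table comprises a plurality of storage units arranged in sequence, each storage unit is provided with a storage bit for a first memory identification field and/or a storage bit for a second memory identification field, and each instruction type corresponds to a set number of first memory identification fields and/or second memory identification fields;
    step S102, sequentially extracting the first memory identification field and the second memory identification field of each instruction in the multiple lines of instructions to be run;
    step S103, obtaining a one-hot encoding of the first memory identification field and/or a one-hot encoding of the second memory identification field of each instruction;
    storing the one-hot encoding of the first memory identification field and the one-hot encoding of the second memory identification field of each instruction into the corresponding memory synchronization ID table according to the execution order of the multiple lines of instructions to be run;
    sending the instruction parameters of each instruction to the instruction units of the corresponding instruction queue in the instruction queue module;
    step S104, obtaining, by the instruction synchronization control module, dependency identification information between any one instruction type and the other two instruction types according to the one-hot encodings of the first memory identification field and of the second memory identification field of each instruction type in the memory synchronization ID tables; and generating a synchronization table along a first data dimension according to the dependency identification information, wherein the synchronization table comprises a first data dimension and a second data dimension that intersect, and the first data dimension of each instruction type corresponds to the number of storage positions in the memory synchronization ID table of that instruction type;
    step S105, executing the corresponding instructions along the second data dimension of the synchronization table according to the dependencies of each instruction type, wherein the size of the second data dimension of each instruction type corresponds to the number of instruction units in the instruction queue, and the instruction synchronization control module, while executing an instruction, uses the synchronization table to retrieve the instruction parameters of that instruction from the corresponding instruction unit, so as to execute each instruction in the multiple lines of instructions to be run.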
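  2. The instruction distribution method according to claim 1, wherein
    the instruction types comprise load instructions, compute instructions and store instructions;
    the memory synchronization ID tables comprise: a memory synchronization ID table for load instructions, a memory synchronization ID table for compute instructions and a memory synchronization ID table for store instructions; and
    the instruction queues comprise:
    a load instruction queue corresponding to the load instructions, a compute instruction queue corresponding to the compute instructions, and a store instruction queue corresponding to the store instructions.
  3. The instruction distribution method according to claim 1 or 2, wherein the memory synchronization ID table comprises 8 storage units arranged in sequence; and
    the instruction queue comprises 8 instruction units.
  4. The instruction distribution method according to claim 3, wherein
    each storage unit in the memory synchronization ID table of the load instructions comprises a first memory identification field and a second memory identification field;
    each storage unit in the memory synchronization ID table of the compute instructions comprises a first memory identification field and a second memory identification field; and
    each storage unit in the memory synchronization ID table of the store instructions comprises a second memory identification field.
  5. The instruction distribution method according to any one of claims 1 to 4, wherein step S103 further comprises:
    determining whether all instruction units of the instruction queue are full; if so, repeating this step until the instruction queue has an idle instruction unit; if not, proceeding to step S104.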
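  6. A system for distributing instructions in a reconfigurable processor, characterized in that the reconfigurable processor comprises an instruction fetch module, an instruction synchronization control module and an instruction queue module; the instruction fetch module is configured to distribute multiple lines of instructions to be run to the instruction synchronization control module and the instruction queue module; the instruction synchronization control module is configured to control the execution of instructions in the instruction queue module; the instruction queue module comprises a plurality of instruction queues; each instruction queue comprises a plurality of instruction units arranged in sequence; and each instruction queue corresponds to one instruction type;
    the system comprises:
    a memory synchronization ID table setting unit, configured to set a format of a memory synchronization ID table for each instruction type, wherein the memory synchronization ID table comprises a plurality of storage units arranged in sequence, each storage unit is provided with a storage bit for a first memory identification field and/or a storage bit for a second memory identification field, and each instruction type corresponds to a set number of first memory identification fields and/or second memory identification fields;
    an instruction extraction unit, configured to sequentially extract the first memory identification field and the second memory identification field of each instruction in the multiple lines of instructions to be run;
    a one-hot encoding unit, configured to obtain a one-hot encoding of the first memory identification field and/or a one-hot encoding of the second memory identification field of each instruction;
    to store the one-hot encoding of the first memory identification field and the one-hot encoding of the second memory identification field of each instruction into the corresponding memory synchronization ID table according to the execution order of the multiple lines of instructions to be run;
    and to send the instruction parameters of each instruction to the instruction units of the corresponding instruction queue in the instruction queue module;
    a synchronization table generation unit, configured such that the instruction synchronization control module obtains dependency identification information between any one instruction type and the other two instruction types according to the one-hot encodings of the first memory identification field and of the second memory identification field of each instruction type in the memory synchronization ID tables, and generates a synchronization table along a first data dimension according to the dependency identification information, wherein the synchronization table comprises a first data dimension and a second data dimension that intersect, and the first data dimension of each instruction type corresponds to the number of storage positions in the memory synchronization ID table of that instruction type;
    an instruction execution unit, configured to execute the corresponding instructions along the second data dimension of the synchronization table according to the dependencies of each instruction type, wherein the size of the second data dimension of each instruction type corresponds to the number of instruction units in the instruction queue, and the instruction synchronization control module, while executing an instruction, uses the synchronization table to retrieve the instruction parameters of that instruction from the corresponding instruction unit, so as to execute each instruction in the multiple lines of instructions to be run.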
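  7. The instruction distribution system according to claim 6, wherein
    the instruction types comprise load instructions, compute instructions and store instructions;
    the memory synchronization ID tables comprise: a memory synchronization ID table for load instructions, a memory synchronization ID table for compute instructions and a memory synchronization ID table for store instructions; and
    the instruction queues comprise: a load instruction queue corresponding to the load instructions, a compute instruction queue corresponding to the compute instructions, and a store instruction queue corresponding to the store instructions.
  8. The instruction distribution system according to claim 6 or 7, wherein the memory synchronization ID table comprises 8 storage units arranged in sequence; and
    the instruction queue comprises 8 instruction units.
  9. The instruction distribution system according to claim 8, wherein
    each storage unit in the memory synchronization ID table of the load instructions comprises a first memory identification field and a second memory identification field;
    each storage unit in the memory synchronization ID table of the compute instructions comprises a first memory identification field and a second memory identification field; and
    each storage unit in the memory synchronization ID table of the store instructions comprises a second memory identification field.
  10. The instruction distribution system according to any one of claims 6 to 9, wherein the one-hot encoding unit is further configured to: determine whether all instruction units of the instruction queue are full; if so, remain in this unit until the instruction queue has an idle instruction unit; if not, proceed to the synchronization table generation unit.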
PCT/CN2021/092239 2020-12-23 2021-05-07 Method and system for distributing instructions in reconfigurable processor, and storage medium WO2022134426A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/770,553 US11977894B2 (en) 2020-12-23 2021-05-07 Method and system for distributing instructions in reconfigurable processor and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011539672.1A CN112256632B (zh) 2020-12-23 2020-12-23 Method and system for distributing instructions in a reconfigurable processor
CN202011539672.1 2020-12-23

Publications (1)

Publication Number Publication Date
WO2022134426A1 (zh) 2022-06-30

Family

ID=74225833

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/092239 WO2022134426A1 (zh) 2020-12-23 2021-05-07 Method and system for distributing instructions in reconfigurable processor, and storage medium

Country Status (2)

Country Link
CN (1) CN112256632B (zh)
WO (1) WO2022134426A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256632B (zh) * 2020-12-23 2021-06-04 北京清微智能科技有限公司 Method and system for distributing instructions in a reconfigurable processor

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7007271B2 (en) * 2002-04-18 2006-02-28 Sun Microsystems, Inc. Method and apparatus for integrated instruction scheduling and register allocation in a postoptimizer
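CN105487838A (zh) * 2015-11-23 2016-04-13 上海交通大学 Task-level parallel scheduling method and system for a dynamically reconfigurable processor
CN111897580A (zh) * 2020-09-29 2020-11-06 北京清微智能科技有限公司 Instruction scheduling system and method for a reconfigurable array processor
CN112256632A (zh) * 2020-12-23 2021-01-22 北京清微智能科技有限公司 Method and system for distributing instructions in a reconfigurable processor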


Also Published As

Publication number Publication date
US20230068463A1 (en) 2023-03-02
CN112256632B (zh) 2021-06-04
CN112256632A (zh) 2021-01-22


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21908437

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21908437

Country of ref document: EP

Kind code of ref document: A1