CN111061510A - Extensible ASIP structure platform and instruction processing method - Google Patents

Extensible ASIP structure platform and instruction processing method

Info

Publication number
CN111061510A
Authority
CN
China
Prior art keywords
instruction
cluster
packet
execution
emu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911289054.3A
Other languages
Chinese (zh)
Other versions
CN111061510B (en
Inventor
陈虎
万江华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Guliang Microelectronics Co ltd
Original Assignee
Hunan Guliang Microelectronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Guliang Microelectronics Co ltd
Priority to CN201911289054.3A
Publication of CN111061510A
Application granted
Publication of CN111061510B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/82 Architectures of general purpose stored program computers data or demand driven

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Advance Control (AREA)

Abstract

An extensible ASIP structure platform and an instruction processing method. The platform has a pipeline whose instruction execution logic comprises n clusters. Clusters 0 to n-2 are regular clusters that implement regular instructions; each contains 2 arithmetic logic units (ALUs) and a distributed register file (RF) with 4 read ports and 2 write ports. Cluster n-1 is an extended cluster that implements special extended instructions; it comprises an extended register file (ERF) with 6 read ports, 2 write ports and up to 32 general-purpose registers, an extended functional unit (EFU) with 6 inputs and 2 outputs, and one ALU. The clusters are connected by an issue (flow-out) control network unit ICN and an operand transmission network unit OPN. An execution management unit (EMU) coupled to each ALU or EFU and a register management unit (RMU) coupled to each RF or ERF constitute the hardware of the instruction execution control mechanism. The method processes instructions on this platform. The invention is easy to implement and improves the scalability of an application-specific instruction set processor.

Description

Extensible ASIP structure platform and instruction processing method
Technical Field
The invention relates generally to the technical field of processors, and in particular to an extensible ASIP structure platform and an instruction processing method that improve the scalability of an application-specific instruction set processor.
Background
In a single-core processor, partitioning resources into multiple clusters can eliminate or reduce the constraints that register file (RF) access ports and centrally controlled instruction issue (flow-out) logic place on the scalability of the processor. Each cluster typically includes one or more functional units and a register file, and inter-cluster communication is performed via shared storage (register file, cache, etc.) or a crossbar switch. The program counter (PC) and decode logic may be shared by multiple clusters or distributed across the clusters.
An application-specific instruction set processor (ASIP) can effectively improve processor performance by adding special extended instructions. A clustered uniprocessor has good resource scalability and software programmability, so it is well suited as a base platform for constructing ASIPs for data-computation-intensive applications (such as video processing, wireless communication and network data processing). However, because of the limits on instruction word size (typically 16/32/64 bits) and on the number of operands per instruction (typically no more than 3), the register resources available in a clustered uniprocessor are very limited (typically 16/32/64 registers), and dedicated extended instructions with more than 4 operands cannot be supported.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the technical problems in the prior art, the invention provides an extensible ASIP structure platform and an instruction processing method that are easy to implement and improve the scalability of an application-specific instruction set processor.
In order to solve the technical problems, the invention adopts the following technical scheme:
an extensible ASIP structure platform having a pipeline of instruction execution logic, said instruction execution logic comprising n clusters, where clusters 0 to n-2 are regular clusters for implementing regular instructions, each cluster containing 2 arithmetic logic units ALU and a distributed register file RF with 4 read ports and 2 write ports; cluster n-1 is an extended cluster for implementing special extended instructions, and comprises an extended register file ERF with 6 read ports, 2 write ports and at most 32 general registers, an extended functional unit EFU with 6 input ports and 2 output ports, and an arithmetic logic unit ALU, wherein this ALU is responsible for providing operands for the EFU and executes only memory read/write instructions;
the clusters are connected by an issue (flow-out) control network unit ICN and an operand transmission network unit OPN;
an execution management unit EMU coupled to each ALU or EFU and a register management unit RMU coupled to each RF or ERF constitute the hardware of the instruction execution control mechanism.
As a further improvement of the invention: each of clusters 0 to n-1 includes a cluster instruction buffer unit CIB, one or more execution management units EMU, and a register management unit RMU. The CIB receives instruction packets from the instruction issue logic and holds one or more instruction packets; the EMU comprises an instruction buffer, an output buffer, a data dependency table DDT and operand management logic; the RMU comprises a register access table RAT and logic for handling EMU requests.
As a further improvement of the invention: in the issue stage of instruction execution, if a source operand of an instruction is produced by an instruction in an instruction packet that has already been issued to another cluster, that cluster is notified through the issue control network unit ICN to update its register access table RAT; the information passed on the ICN includes: the cluster number, the register index, and the position or index of the producer of the source operand in that cluster's instruction packet.
In the operand fetch stage of instruction execution, if a register-type source operand of the instruction is not ready and the operand is located in the register file of another cluster, the execution management unit EMU sends a fetch request through the operand transmission network OPN; if the data is ready, the OPN returns the data requested by the EMU; if the data is not ready, the EMU waits until the OPN returns the data; the information passed on the OPN includes: the cluster number, the register index, the position or index of the producer of the source operand in that cluster's instruction packet, and the operand data returned by the OPN.
As a further improvement of the invention: suppose there are m clusters in the processor and a total of k instructions in the m clusters that have been issued but not yet committed; each register file RF or extended register file ERF contains r local registers, and the instruction buffer of each execution management unit EMU has l entries, i.e. each functional unit has at most l instructions that are executing or waiting to execute; an instruction has at most s input operands and d output operands.
As a further improvement of the invention: the RAT has r entries, each corresponding to a register in the local register file; each RAT entry comprises k read tag fields and k write tag fields (each numbered 0 to k-1); the order of the read tag fields and of the write tag fields represents the order in which the k issued but uncommitted instructions in the m clusters read and write the register; each read tag field and write tag field includes a cluster number and an intra-cluster instruction number, the latter corresponding to the position of the instruction in the instruction buffer of the EMU.
As a further improvement of the invention: the DDT has l entries, each corresponding to an instruction in the EMU's instruction buffer that is executing or waiting to execute; each DDT entry comprises s input wait bits and d output wait bits, where a wait bit of 1 means that the corresponding input operand has not yet been obtained or the corresponding output operand has not yet been written back; each DDT entry also comprises s input dependency fields and d output dependency fields; an input dependency field identifies the instruction, in some cluster, that produces an input operand of this instruction, and an output dependency field identifies the instruction, in some cluster, that consumes an output operand; each input and output dependency field includes a cluster number and an intra-cluster instruction number, the latter corresponding to the position of the instruction in the instruction buffer of the EMU.
As a further improvement of the invention: the instruction set of the ASIP architecture platform includes 4 types:
l.alu denotes the original instructions of OR1200; l.extd denotes the special extended instructions that CASIP adds to OR1200, each with at most 6 input operands and 2 output operands; l.pkgh is the header instruction of an instruction packet, contains the common information required by the instructions in the packet, and must be 4-byte aligned in program address; l.oprd is a service instruction that provides the additional operands for an l.extd instruction with more than 3 operands, followed at program address by the l.extd instruction it serves.
As a further improvement of the invention: the parallelism of the instruction fetch unit in the ASIP structure platform depends on the bit width of the interface between the fetch logic and the storage system; the parallelism of the instruction decode unit can be set flexibly according to the maximum length allowed for an instruction packet; instruction issue (flow-out) is performed in units of instruction packets, so the issue unit has no fixed issue width.
As a further improvement of the invention: the ASIP structure platform converts the centrally controlled single-level instruction issue of a single-core processor into two-level, distributed instruction issue, and converts centralized execution control into control that is distributed across the clusters and exercised independently by dedicated components in each cluster; the two-level instruction issue comprises: first-level instruction packet issue, in which one instruction packet at a time is issued to the instruction buffer of one cluster without checking instruction dependencies, and second-level instruction issue, in which each cluster issues the instructions of its buffered instruction packets according to the instruction execution state within the cluster.
The invention further provides an instruction processing method based on the extensible ASIP structure platform, which comprises the following steps:
step S1, instruction fetch; the instruction fetch management component fetches one instruction packet at a time from the instruction cache and buffers it in an instruction packet buffer of the instruction fetch management component;
step S2, decoding; the instruction packet is decoded, that is, the cluster number CI is extracted from the header instruction of the packet; at issue, the cluster number CI determines which cluster the instruction packet is dispatched to for execution;
step S3, issue; this comprises two stages: a) instruction packet issue: according to the cluster number CI in the header instruction, the instruction packet issue control logic first checks whether the cluster instruction buffer CIB of the corresponding cluster has room for a complete instruction packet; if so, one instruction packet is taken from the instruction packet buffer of the instruction fetch management component and stored in the CIB; b) instruction issue: first, whether computing resources in the cluster are available is checked; if so, the instructions in the CIB that have not yet been allocated a computing resource are issued to the computing resources in order, and the data dependency table DDT in the EMU and the register access table RAT in the RMU of the cluster are updated; if a source operand of an instruction is produced by an instruction in an instruction packet already issued to another cluster, that cluster is notified through the issue control network ICN to update its register access table RAT; the issued instructions are saved in the EMU's instruction buffer;
step S4, operand fetch; the issued instruction has been allocated a computing resource (ALU or EFU), and its source operands are obtained;
step S5, execution;
step S6, write back.
Compared with the prior art, the invention has the advantages that:
the extensible ASIP structure platform and the instruction processing method are simple in principle and easy to implement, and do not significantly increase the software and hardware overhead of the processor:
1) the scalability of the processor is improved while retaining the easy-to-program environment of a single processor. The CASIP structure can be flexibly extended to at most 32 clusters, and the whole system can contain at most 1024 general registers.
2) hardware and software support is provided for application-oriented special extended instructions. The instruction set architecture of CASIP can represent special extended instructions with up to 6 input operands and 2 output operands using 32-bit fixed-length instruction words.
Drawings
Fig. 1 is a schematic diagram of the pipeline structure of the CASIP structure of the present invention.
Fig. 2 is a diagram illustrating the structures of the RAT and DDT and their bit allocation in a specific application example of the present invention.
Fig. 3 is a schematic diagram of the instruction types and instruction encoding of CASIP in an embodiment of the present invention.
Fig. 4 is a diagram illustrating the state transitions of an instruction during its execution cycle in an embodiment of the present invention.
Fig. 5 is a flow chart illustrating the execution of an instruction according to an embodiment of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings and specific examples.
The invention provides an extensible application-specific instruction set processor, CASIP, which is a clustered processor built on OR1200. Fig. 1 shows a block diagram of the CASIP structure. CASIP has a 6-stage integer pipeline: instruction fetch, decode, issue (flow-out), operand fetch, execute and write back. The parallelism of the fetch unit depends on the bit width of the interface between the fetch logic and the memory system. The parallelism of the decode unit can be set flexibly according to the maximum length allowed for an instruction packet. Because instruction issue is performed in units of instruction packets, the issue unit has no fixed issue width. CASIP features in-order instruction fetch, in-order decode, parallel issue (across clusters), parallel execution and parallel write-back.
The extensible ASIP structure platform CASIP of the invention comprises 2 key mechanisms:
1) the way execution resources are allocated to instructions and the encoding of fixed-length instruction words (for example 32 bits) are changed so as to support special extended instructions with many register-type operands;
2) the instruction issue and execution control mechanism is changed: the centrally controlled single-level instruction issue of a single-core processor is converted into two-level, distributed instruction issue, and centralized execution control is converted into control that is distributed across the clusters and exercised independently by dedicated components in each cluster, which eliminates or weakens the limits that instruction issue and execution control place on processor scalability. The two-level instruction issue comprises: first-level instruction packet issue, in which one instruction packet at a time is issued to the instruction buffer of one cluster (without checking instruction dependencies), and second-level instruction issue, in which each cluster issues the instructions of its buffered instruction packets according to the instruction execution state within the cluster.
As a result, CASIP achieves the following without significantly increasing the software and hardware overhead of the processor:
1) the scalability of the processor is improved while retaining the easy-to-program environment of a single processor. The CASIP structure can be flexibly extended to at most 32 clusters, and the whole system can contain at most 1024 general registers.
2) hardware and software support is provided for application-oriented special extended instructions. The instruction set architecture of CASIP can represent special extended instructions with up to 6 input operands and 2 output operands using 32-bit fixed-length instruction words.
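The register total quoted above can be checked with a short calculation. The sketch below is illustrative only: it assumes that every cluster, regular or extended, holds at most 32 general registers (the text states this limit explicitly only for the ERF), and the helper names are hypothetical.

```python
# Illustrative arithmetic only. Assumption: each cluster holds at most
# 32 general registers (stated in the text for the ERF; assumed here
# for the regular RFs as well).
MAX_CLUSTERS = 32
MAX_REGS_PER_CLUSTER = 32


def total_registers(clusters: int = MAX_CLUSTERS,
                    regs_per_cluster: int = MAX_REGS_PER_CLUSTER) -> int:
    """Upper bound on general registers across the whole system."""
    return clusters * regs_per_cluster


def index_bits(count: int) -> int:
    """Number of bits needed to index `count` distinct items."""
    return max(1, (count - 1).bit_length())


if __name__ == "__main__":
    print(total_registers())                 # system-wide register bound
    print(index_bits(MAX_REGS_PER_CLUSTER))  # bits for a local register index
    print(index_bits(MAX_CLUSTERS))          # bits for a cluster number
```

Under these assumptions a register anywhere in the system is identified by a cluster number plus a local register index, which is why a fixed 32-bit instruction word cannot name six such operands directly.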
In a specific application example, the instruction execution logic of CASIP is composed of n clusters. Clusters 0 to n-2 are regular clusters that implement regular instructions such as add, subtract, multiply and shift; each contains 2 arithmetic logic units (ALUs) and a distributed register file (RF) with 4 read ports and 2 write ports. Cluster n-1 is an extended cluster that implements special extended instructions; it includes an extended register file (ERF) with 6 read ports, 2 write ports and up to 32 general-purpose registers, an extended functional unit (EFU) with 6 inputs and 2 outputs, and an ALU that is responsible for providing operands to the EFU and executes only memory read/write instructions.
In a specific application example, in terms of intra-cluster resources, each of clusters 0 to n-1 comprises: a cluster instruction buffer unit (CIB), one or more execution management units (EMU), and a register management unit (RMU). The CIB receives instruction packets from the instruction issue logic and holds one or more instruction packets; the EMU includes an instruction buffer, an output buffer, a data dependency table (DDT) and operand management logic; the RMU includes a register access table (RAT) and logic for processing EMU requests.
In a specific application example, in terms of inter-cluster resources, the platform comprises an issue control network unit (ICN) and an operand transmission network unit (OPN).
In the issue stage of instruction execution (see fig. 5), if a source operand of an instruction (Src0-Src5) is produced by an instruction in an instruction packet that has already been issued to another cluster, that cluster is notified through the ICN to update its register access table RAT. The information passed on the ICN includes: the cluster number, the register index, and the position or index of the producer of the source operand in that cluster's instruction packet.
In the operand fetch stage of instruction execution (see fig. 5), if a register-type source operand of the instruction is not ready (that is, an input wait bit in the DDT entry of the instruction is non-zero) and the operand is located in the register file of another cluster, the EMU issues a fetch request through the operand transmission network OPN. If the data is ready, the OPN returns the data requested by the EMU; if not, the EMU waits until the OPN returns the data. The information passed on the OPN includes: the cluster number, the register index, the position or index of the producer of the source operand in that cluster's instruction packet, and the operand data returned by the OPN.
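The information carried on the two networks can be pictured as two small record types. This is a minimal sketch, not the disclosed encoding: the class names, field names and the `opn_reply` helper are hypothetical, and field widths are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IcnMessage:
    """Notification sent at issue time (fields as listed in the text)."""
    cluster: int         # cluster number
    reg_index: int       # register index within that cluster
    producer_index: int  # position of the producer in the cluster's packet


@dataclass(frozen=True)
class OpnMessage:
    """Operand request or reply; `data` is None while it is only a request."""
    cluster: int
    reg_index: int
    producer_index: int
    data: Optional[int] = None


def opn_reply(request: OpnMessage, value: int) -> OpnMessage:
    """Hypothetical helper: build the reply that carries the operand data."""
    return OpnMessage(request.cluster, request.reg_index,
                      request.producer_index, value)


if __name__ == "__main__":
    req = OpnMessage(cluster=2, reg_index=7, producer_index=3)
    print(opn_reply(req, 42))
```

The only difference between the two message kinds in this sketch is the data field, which mirrors the lists given above.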
In a specific application example, in terms of instruction operand management, the execution management unit EMU coupled to each ALU or EFU and the register management unit RMU coupled to each RF or ERF constitute the hardware support for the instruction execution control mechanism of CASIP.
1) The DDT in the EMU records the data dependencies between the instructions dispatched to the functional unit and other instructions. For instructions in the instruction buffer that have been issued but not yet executed, the operand management logic in the EMU uses the information in the DDT to issue read requests to the memory or register file to obtain data at the start of instruction execution, and to issue register or memory write requests to write back the data in the output buffer at the end of instruction execution.
2) The register access table RAT in each RMU records the order in which instructions mapped to different functional units access the registers in the RF or ERF. The RMU decides whether to respond to read and write requests from an EMU according to the information in the RAT.
3) In each cluster, the DDT in the EMU and the RAT in the RMU must be updated after each instruction issue and after each data write-back.
In a specific application example, the RAT and DDT structures are shown in fig. 2. Suppose there are m clusters in the processor, with a total of k instructions in the m clusters that have been issued but not yet committed. Suppose each RF or ERF contains r local registers and the instruction buffer of each EMU has l entries, i.e. each functional unit has at most l instructions that are executing or waiting to execute. An instruction has at most s input operands and d output operands.
In fig. 2, the RAT has r entries, each corresponding to a register in the local register file. Each RAT entry contains k read tag fields and k write tag fields (each numbered 0 to k-1). The order of the read tag fields and of the write tag fields indicates the order in which the k issued but uncommitted instructions in the m clusters read and write the register. Each read tag field and write tag field includes a cluster number and an intra-cluster instruction number (corresponding to the position of the instruction in the instruction buffer of the EMU).
In fig. 2, the DDT has l entries, each corresponding to an instruction in the EMU's instruction buffer that is executing or waiting to execute. Each DDT entry includes s input wait bits and d output wait bits, where a wait bit of 1 indicates that the corresponding input operand has not yet been fetched or the corresponding output operand has not yet been written back. Each DDT entry also contains s input dependency fields and d output dependency fields. An input dependency field identifies the instruction, in some cluster, that produces an input operand of this instruction, and an output dependency field identifies the instruction, in some cluster, that consumes an output operand. Each input and output dependency field includes a cluster number and an intra-cluster instruction number (corresponding to the position of the instruction in the instruction buffer of the EMU).
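The two tables can be sketched as plain data structures parameterized by r, k, l, s and d. The following is an illustrative model only; the class and function names are hypothetical, and the actual bit allocation is the one shown in fig. 2, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A tag names an instruction: (cluster number, intra-cluster instruction number).
Tag = Tuple[int, int]


@dataclass
class RatEntry:
    """One RAT entry, i.e. one local register."""
    read_tags: List[Optional[Tag]]   # k slots, in read-access order
    write_tags: List[Optional[Tag]]  # k slots, in write-access order


@dataclass
class DdtEntry:
    """One DDT entry, i.e. one instruction in the EMU instruction buffer."""
    input_wait: List[int]            # s bits, 1 = operand not yet obtained
    output_wait: List[int]           # d bits, 1 = result not yet written back
    input_dep: List[Optional[Tag]]   # s producer tags
    output_dep: List[Optional[Tag]]  # d consumer tags

    def inputs_ready(self) -> bool:
        """True when no input wait bit is set."""
        return not any(self.input_wait)


def make_rat(r: int, k: int) -> List[RatEntry]:
    return [RatEntry([None] * k, [None] * k) for _ in range(r)]


def make_ddt(l: int, s: int, d: int) -> List[DdtEntry]:
    return [DdtEntry([0] * s, [0] * d, [None] * s, [None] * d)
            for _ in range(l)]


if __name__ == "__main__":
    ddt = make_ddt(l=4, s=6, d=2)
    ddt[0].input_wait[1] = 1          # Src1 of entry 0 is still outstanding
    print(ddt[0].inputs_ready(), ddt[1].inputs_ready())
```

Here s = 6 and d = 2 match the operand limits of the special extended instructions; the values of r, k and l in the usage lines are arbitrary.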
In a specific application example, the instruction set of CASIP is based on the OR1200 instruction set and includes 4 types (see fig. 3): l.alu denotes the original instructions of OR1200; l.extd denotes the special extended instructions that CASIP adds to OR1200, each of which may include up to 6 input operands and 2 output operands; l.pkgh is the header instruction of an instruction packet, contains the common information required by the instructions in the packet, and must be 4-byte aligned in program address; l.oprd is a service instruction that provides the additional operands for an l.extd instruction with more than 3 operands, followed at program address by the l.extd instruction it serves.
An instruction packet is a group of instructions that is allocated to one cluster at a time for execution; it consists of a header instruction l.pkgh and one or more other instructions. An instruction packet must satisfy the following constraints: 1) within an instruction packet, at most 3 instructions may take input operands from registers in other clusters; 2) each instruction may take at most one input operand from a register in another cluster; 3) the length of a packet (the number of instructions it contains) is at most 16.
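The three packet constraints lend themselves to a simple check, for example in an assembler or packet scheduler. The sketch below is hypothetical tooling, not part of the disclosure. It assumes that the packet length counts the l.pkgh header as one of the 16 instructions; the text does not say whether the header is included.

```python
from typing import List

MAX_PACKET_LEN = 16               # constraint 3
MAX_REMOTE_INSTRS = 3             # constraint 1
MAX_REMOTE_OPERANDS_PER_INSTR = 1  # constraint 2


def packet_violations(remote_operands: List[int]) -> List[str]:
    """Check one packet body against the three constraints.

    remote_operands[i] is the number of input operands of body
    instruction i that come from registers in other clusters.
    Assumption: the l.pkgh header counts toward the packet length.
    """
    problems: List[str] = []
    if not remote_operands:
        problems.append("packet needs at least one instruction after l.pkgh")
    if len(remote_operands) + 1 > MAX_PACKET_LEN:
        problems.append("packet is longer than 16 instructions")
    if sum(1 for n in remote_operands if n > 0) > MAX_REMOTE_INSTRS:
        problems.append("more than 3 instructions use out-of-cluster operands")
    if any(n > MAX_REMOTE_OPERANDS_PER_INSTR for n in remote_operands):
        problems.append("an instruction uses more than one out-of-cluster operand")
    return problems


if __name__ == "__main__":
    print(packet_violations([0, 1, 0, 1]))  # a legal packet body
    print(packet_violations([1, 1, 1, 1]))  # violates constraint 1
```

Constraint 1 and constraint 2 together bound the number of out-of-cluster operands per packet at three, which is what allows the header instruction to describe all of them.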
The l.pkgh instruction defines the cluster number (CI) to which the packet is assigned, the instruction packet length (PL), and, for each instruction in another cluster's instruction packet on which an instruction in this packet has a data dependency, the position or index of that instruction within its own packet (DII). DCI (dependent cluster index) indicates the index of the cluster holding an operand that comes from an out-of-cluster register, and DOI (dependent operand index) indicates which operand of the instruction must be obtained from outside the cluster. Together, DOI, DCI and DII contain all the information needed to obtain operands from outside the cluster. The DII may be a copy of Src0-Src5. R is a reserved bit.
In a specific application example, the instruction execution control mechanism is as follows: in a CASIP processor, the execution of an instruction goes through 6 steps, namely instruction fetch, decode, issue, operand fetch, execute and write back. The instruction fetch, decode and issue states are the centrally controlled part of instruction execution; the operand fetch, execute and write-back states are the distributed part. For simplicity, some default conditions are omitted from the state transition conditions in the figures.
In a specific application example, referring to fig. 5, the invention further provides an instruction processing method based on the extensible ASIP structure platform; the execution flow of an instruction is as follows:
Step S1, instruction fetch.
The instruction fetch management component in fig. 1 fetches one instruction packet at a time from the instruction cache and buffers it in an instruction packet buffer of the instruction fetch management component. In the invention there are 4 instruction packet buffers (0-3), so at most 4 instruction packets (each including its header instruction l.pkgh) are buffered, and the first instruction stored in each instruction packet buffer is the header instruction l.pkgh. The parallelism of the fetch depends on the width of the interface between the instruction fetch management component and the instruction cache. In the invention this interface is 128 bits wide, so at most four 32-bit instructions can be fetched in each clock cycle.
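The fetch bandwidth stated above gives a lower bound on the number of cycles needed to bring one packet into a packet buffer. The sketch below is back-of-the-envelope arithmetic under simplifying assumptions that are not in the text: no instruction cache misses, and every fetch delivering a full 128 bits belonging to the same packet. The function names are hypothetical.

```python
FETCH_WIDTH_BITS = 128  # fetch/cache interface width given in the text
INSTR_BITS = 32         # fixed instruction word length


def instrs_per_cycle() -> int:
    """Maximum number of instructions delivered per clock cycle."""
    return FETCH_WIDTH_BITS // INSTR_BITS


def min_fetch_cycles(packet_len: int) -> int:
    """Lower bound on cycles to fetch a packet of `packet_len` instructions.

    Assumes no cache misses and fully used fetch slots (ceiling division).
    """
    if packet_len < 1:
        raise ValueError("a packet holds at least one instruction")
    per_cycle = instrs_per_cycle()
    return (packet_len + per_cycle - 1) // per_cycle


if __name__ == "__main__":
    for n in (1, 5, 16):
        print(n, min_fetch_cycles(n))
```

Under these assumptions a maximum-length packet of 16 instructions needs at least four fetch cycles.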
Step S2, decoding.
This step decodes the instruction packets: for the instruction packets in instruction packet buffers 0-3 of the instruction fetch management component in fig. 1, the cluster number CI is extracted from the header instruction l.pkgh. At issue, the cluster number CI determines the cluster to which the instruction packet is dispatched.
Step S3, issue.
The instruction issue process is divided into 2 stages: a) Instruction packet issue. According to the cluster number CI in the header instruction l.pkgh, the instruction packet issue control logic checks whether the cluster instruction buffer CIB of the corresponding cluster has room for a complete instruction packet. If so, one instruction packet is taken from the instruction packet buffer of the instruction fetch management component and stored in the CIB. b) Instruction issue. The instruction issue logic in fig. 1 first checks whether computing resources in the cluster (ALUs or the EFU) are available. If so, the instructions in the CIB that have not yet been allocated a computing resource are issued (dispatched) to the computing resources in order, and the data dependency table DDT in the EMU and the register access table RAT in the RMU of the cluster are updated. If a source operand of an instruction (Src0-Src5) is produced by an instruction in an instruction packet already issued to another cluster, that cluster is notified through the issue control network ICN to update its register access table RAT. The issued instructions are saved in the EMU's instruction buffer.
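The two stages of step S3 can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not the disclosed hardware: the CIB capacity is counted in instructions, computing resources are modelled as a simple counter, DDT/RAT updates and ICN notifications are omitted, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Packet:
    cluster: int       # cluster number CI taken from l.pkgh
    instrs: List[str]  # body instructions, in program order


@dataclass
class Cluster:
    cib_capacity: int  # assumed: CIB size counted in instructions
    free_units: int    # assumed: number of idle ALUs/EFU
    cib: List[str] = field(default_factory=list)
    issued_count: int = 0  # cib[:issued_count] already have a unit
    emu_buffer: List[str] = field(default_factory=list)


def issue_packet(pkt: Packet, clusters: List[Cluster]) -> bool:
    """Stage a): move a whole packet into the target CIB if it fits.

    No dependency check is made at this level.
    """
    target = clusters[pkt.cluster]
    if len(target.cib) + len(pkt.instrs) > target.cib_capacity:
        return False
    target.cib.extend(pkt.instrs)
    return True


def issue_instructions(cluster: Cluster) -> List[str]:
    """Stage b): issue unallocated CIB instructions in order while units are free."""
    issued: List[str] = []
    while cluster.issued_count < len(cluster.cib) and cluster.free_units > 0:
        instr = cluster.cib[cluster.issued_count]
        cluster.issued_count += 1
        cluster.free_units -= 1
        cluster.emu_buffer.append(instr)  # saved in the EMU instruction buffer
        issued.append(instr)
    return issued


if __name__ == "__main__":
    clusters = [Cluster(cib_capacity=4, free_units=2)]
    print(issue_packet(Packet(0, ["a", "b", "c"]), clusters))
    print(issue_instructions(clusters[0]))
```

In this model an issued instruction stays in the CIB until it is cleared later, which matches the statement in step S6 that write-back clears the CIB record.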
Step S4, operand fetch.
The issued instruction has been allocated a computing resource (ALU or EFU). The execution management unit EMU of that computing resource is responsible for obtaining the source operands of the instruction. If a source operand of the instruction is not ready (that is, an input wait bit in the DDT entry of the instruction is non-zero), there are two cases. a) The operand is of register type. The EMU issues a fetch request to the RMU (if the data is located in the register file of this cluster) or to the operand transmission network OPN (if the data is located in the register file of another cluster). If the data is ready, the RMU or the OPN returns the data requested by the EMU; if not, the EMU waits until the RMU or the OPN returns the data. The EMU may issue multiple requests to the RMU or OPN to obtain multiple operands simultaneously. b) The operand is of memory type. The fetch process is the same as that of a conventional RISC processor.
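The routing decision for a register-type operand in case a) can be sketched as follows. This is an illustrative model only: register files are plain dictionaries, a value of None stands for "not ready yet", and the function name and return convention are hypothetical.

```python
from typing import Dict, Optional, Tuple

# cluster number -> register index -> value (None = not yet produced)
RegFiles = Dict[int, Dict[int, Optional[int]]]


def fetch_register_operand(local_cluster: int, src_cluster: int, reg: int,
                           regfiles: RegFiles) -> Tuple[str, Optional[int]]:
    """Return (path used, value); a None value means the EMU keeps waiting.

    Local registers are requested from the RMU; registers held by
    another cluster are requested through the OPN.
    """
    path = "RMU" if src_cluster == local_cluster else "OPN"
    value = regfiles[src_cluster].get(reg)
    return path, value


if __name__ == "__main__":
    rf: RegFiles = {0: {1: 10, 2: None}, 1: {3: 7}}
    print(fetch_register_operand(0, 0, 1, rf))  # local and ready
    print(fetch_register_operand(0, 1, 3, rf))  # remote and ready
    print(fetch_register_operand(0, 0, 2, rf))  # local but not ready
```

The sketch handles one operand at a time; the text notes that an EMU may have several such requests outstanding at once.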
Step S5, execution.
Instruction execution in CASIP is similar to that of a conventional RISC processor. The EMU notifies the computational resource (ALU or EFU) to start executing an instruction once all source operands of the instruction are ready. For an arithmetic logic instruction, the execution process performs the arithmetic logic operation on the source operands; for a branch instruction, the execution process calculates the target address of the branch and stores it in a branch target register; for a memory access instruction, the execution process calculates the effective address and issues memory read or write requests to the memory. The execution time of an instruction depends on its type. Branch instructions and arithmetic logic instructions other than multiply instructions each take 1 clock cycle. A special extended instruction takes 1-3 cycles. A memory write instruction also takes 1 clock cycle; the execution time of a memory read instruction is not fixed. At the end of the instruction execution cycle, the execution results are written to an output buffer in the EMU.
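The execution times listed above can be collected in a small lookup table. The sketch below is illustrative only: the class names used as keys and the helper `worst_case_cycles` are assumptions, and summing the bounds presumes the instructions run one after another on a single computing resource.

```python
# Execution time in clock cycles as (minimum, maximum), taken from the text.
# None marks a class for which the text gives no fixed number.
EXEC_CYCLES = {
    "branch": (1, 1),
    "alu": (1, 1),        # arithmetic logic instructions other than multiply
    "multiply": None,     # not stated in the text
    "extended": (1, 3),   # special extended instructions
    "mem_write": (1, 1),
    "mem_read": None,     # stated as not fixed
}


def worst_case_cycles(kinds):
    """Sum the upper bounds; return None if any class has no fixed bound."""
    total = 0
    for kind in kinds:
        bound = EXEC_CYCLES[kind]
        if bound is None:
            return None
        total += bound[1]
    return total
```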
Step S6, write back.
If the output buffer holds execution results that have not been written back, the EMU first checks the DDT to determine whether the write-back operation is allowed (i.e., whether all read and write operations on the destination operands Dest0/Dest1 that precede the write-back operation are complete), and then generates a write request to memory or to the RF/ERF. The RMU decides whether to respond to a write request from the EMU based on the register access order maintained in the RAT. For memory write instructions, the write-back process is the same as that of a conventional RISC processor. An important task of the write-back stage is to clear the records of the written-back instruction from the pipeline, including: a) clearing the record of the instruction in the EMU; b) clearing the record of the instruction in the cluster instruction buffer CIB; c) clearing the relevant information of the instruction in the RMU.
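The ordering rule for write-back can be sketched with a per-register access queue. This is a simplified model under stated assumptions: `RatEntry` and its methods are hypothetical names, accesses are recorded in outflow order as `(kind, cluster, instr)` tuples, and a write is allowed only when every earlier read and write of that register has completed.

```python
from collections import deque


class RatEntry:
    """Hypothetical access record for one register, oldest access first."""

    def __init__(self):
        self.pending = deque()

    def record(self, kind, cluster, instr):
        """Register an access at outflow time; kind is "R" or "W"."""
        self.pending.append((kind, cluster, instr))

    def write_allowed(self, cluster, instr):
        """A write may proceed only when it is the oldest pending access."""
        return bool(self.pending) and self.pending[0] == ("W", cluster, instr)

    def complete(self, kind, cluster, instr):
        """Clear the record of a finished access (raises ValueError if absent)."""
        self.pending.remove((kind, cluster, instr))
```

A write-back request that is not yet allowed is simply retried later; once it completes, `complete` removes its record, which corresponds to clearing the instruction's information in the RMU.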
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above-mentioned embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may be made by those skilled in the art without departing from the principle of the invention.

Claims (10)

1. An extensible ASIP architecture platform having a pipeline of instruction execution logic,
the instruction execution logic comprises n clusters, wherein clusters 0 to (n-2) are regular clusters for implementing regular instructions, each containing 2 arithmetic logic units ALU and a distributed register file RF with 4 read ports and 2 write ports; cluster n-1 is an extended cluster for implementing special extended instructions, and comprises an extended register file ERF with 6 read ports, 2 write ports and at most 32 general registers, an extended functional unit EFU with 6 input ports and 2 output ports, and an arithmetic logic unit ALU, wherein the arithmetic logic unit ALU is responsible for providing operands for the extended functional unit EFU and only executes memory read/write instructions;
the clusters comprise an outflow control network unit ICN and an operand delivery network unit OPN;
the execution management unit EMU coupled to each ALU or EFU and the register management unit RMU coupled to each RF or ERF constitute the hardware of the instruction execution control mechanism.
2. The scalable ASIP fabric platform of claim 1, wherein each of clusters 0 to (n-1) comprises, within the cluster: a cluster instruction buffer unit CIB, one or more execution management units EMU, and a register management unit RMU; the cluster instruction buffer unit CIB is used for receiving instruction packets from the instruction outflow logic and holds one or more instruction packets; the execution management unit EMU comprises an instruction buffer, an output buffer, a data correlation table DDT and operand management logic; the register management unit RMU comprises a register access table RAT and logic for handling requests from the execution management unit EMU.
3. The extensible ASIP fabric platform as claimed in claim 1, wherein in the outflow stage of the instruction execution process, if a source operand of an instruction is produced by an instruction in an instruction packet that has already flowed out in another cluster, that cluster is notified via the outflow control network unit ICN to update its register access table RAT; the information passed on the outflow control network unit ICN includes: cluster number, register index, and location or index of the producer of the source operand in the cluster instruction packet;
in the operand fetch stage of the instruction execution process, if a register-type source operand of the instruction is not ready and the operand is located in the register file of another cluster, the execution management unit EMU sends a fetch request to the operand delivery network OPN; if the data is ready, the operand delivery network OPN returns the data requested by the EMU; if the data is not ready, the EMU waits until the OPN returns the data; the information passed over the operand delivery network OPN includes: cluster number, register index, location or index of the producer of the source operand in the cluster instruction packet, and the operand data returned by the OPN.
4. The scalable ASIP fabric platform of claim 3, wherein it is assumed that there are m clusters in the processor and a total of k instructions in the m clusters that have flowed out but not yet committed; each register file RF or extended register file ERF contains r local registers, and the instruction buffer of each execution management unit EMU has i entries, i.e. each functional unit has at most i instructions that are executing or awaiting execution; and an instruction has at most s input operands and d output operands.
5. The extensible ASIP fabric platform as recited in claim 4, wherein a RAT has r entries, each entry corresponding to a register in the local register file; each RAT entry comprises k (0 to k-1) read mark fields and k (0 to k-1) write mark fields; the order of the read mark fields and the order of the write mark fields represent the order of read and write accesses to the register by the k outgoing but uncommitted instructions in the m clusters; each read mark field and write mark field includes a cluster number and an intra-cluster instruction number, corresponding to the location of an instruction in an instruction buffer in the EMU.
6. The extensible ASIP fabric platform according to claim 4, wherein the DDT has i entries, each entry corresponding to an instruction in the EMU's instruction buffer that is executing or has not yet executed; each DDT entry comprises s input wait bits and d output wait bits, and an input/output wait bit of 1 indicates that the corresponding input/output operand has not yet been obtained/written back; each DDT entry further comprises s input related fields and d output related fields; the input related fields and the output related fields indicate that the producer of an input operand of the instruction is a certain instruction of a certain cluster, and that the consumer of an output operand is a certain instruction of a certain cluster; each input related field and output related field includes a cluster number and an intra-cluster instruction number, corresponding to the location of an instruction in an instruction buffer in the EMU.
7. The extensible ASIP fabric platform according to claim 4, wherein the instruction set of the ASIP fabric platform includes 4 types of instructions:
l.alu denotes the original instructions of OR1200; l.extd denotes the special extended instructions added by CASIP to OR1200, each containing at most 6 input operands and 2 output operands; l.pkgh is the header instruction of an instruction packet, contains the common information required by the instructions in the packet, and requires 4-byte alignment of its program address; l.oprd is a service instruction that provides additional operands for l.extd instructions with more than 3 operands, and is followed in program address by the l.extd instruction it serves.
8. The scalable ASIP fabric platform of any of claims 1-7, wherein the parallelism of the instruction fetch unit in the ASIP fabric platform depends on the bit width of the interface between the instruction fetch logic and the storage system; the parallelism of the instruction decoding unit is set flexibly according to the maximum length allowed for an instruction packet; instruction outflow is performed in units of instruction packets, and the instruction outflow unit has no fixed outflow width.
9. The scalable ASIP fabric platform according to any one of claims 1 to 7, wherein the ASIP fabric platform converts the centrally controlled one-level instruction outflow of a single-core processor into a 2-level instruction outflow with distributed control, and converts centralized execution control into control distributed across the clusters, each cluster being controlled individually by a dedicated component within it; wherein the 2-level instruction outflow comprises: first-level instruction packet outflow, in which one instruction packet at a time flows out to the instruction buffer of one cluster without checking the correlation of the instructions; and second-level instruction outflow, in which each cluster flows out the instructions of its buffered instruction packets according to the instruction execution conditions within that cluster.
10. An instruction processing method based on the extensible ASIP architecture platform of any one of claims 1 to 9, comprising:
step S1, instruction fetch; the instruction fetch management component fetches one instruction packet at a time from the instruction cache and buffers it in the instruction packet buffer of the instruction fetch management component;
step S2, decoding; decoding the instruction packet, namely extracting a cluster number CI in a packet head instruction from the instruction packet; when the instruction flows out, determining which cluster the instruction packet is dispatched to for execution according to the cluster number CI;
step S3, outflow; comprising 2 stages: a) instruction packet outflow; according to the cluster number CI in the packet header instruction, the instruction packet outflow control logic first checks whether the cluster instruction buffer CIB of the corresponding cluster has enough free space to hold a complete instruction packet; if so, one instruction packet is extracted from the instruction packet buffer of the instruction fetch management component and stored in the CIB; b) instruction outflow; first, whether computing resources in the cluster are available is checked; if they are, the instructions in the CIB that have not yet been allocated a computing resource are flowed out to the computing resources in sequence, and the data correlation table DDT in the EMU and the register access table RAT in the RMU of the cluster are updated; if a source operand of an instruction is produced by an instruction in an instruction packet that has already flowed out in another cluster, that cluster is notified via the outflow control network ICN to update its register access table RAT; the outgoing instructions are backed up into the EMU's instruction buffer;
step S4, operand fetch; the outgoing instruction has been allocated a computational resource (ALU or EFU);
step S5, execution;
step S6, write back.
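The table dimensions given in claims 4 to 6 allow a rough storage estimate for the RAT and the DDT. The sketch below is a back-of-the-envelope illustration only: it assumes each mark field or related field is a minimal binary encoding of a cluster number (m values) plus an intra-cluster instruction number (i values) with no additional valid bits, and the function names are hypothetical.

```python
def tag_bits(m, i):
    """Bits for one (cluster number, intra-cluster instruction number) tag."""
    def bits(n):
        return max(1, (n - 1).bit_length())   # minimal binary width for n values
    return bits(m) + bits(i)


def rat_bits(r, k, m, i):
    """r entries, each with k read mark fields and k write mark fields."""
    return r * 2 * k * tag_bits(m, i)


def ddt_bits(i, s, d, m):
    """i entries, each with (s + d) wait bits and (s + d) related fields."""
    return i * ((s + d) + (s + d) * tag_bits(m, i))
```

As an example with assumed values m = 4 clusters, i = 4 buffer entries and k = 8 outstanding instructions, together with r = 32, s = 6 and d = 2 (the upper limits mentioned in the text for the extended cluster), each tag takes 4 bits, so `rat_bits(32, 8, 4, 4)` gives 2048 bits and `ddt_bits(4, 6, 2, 4)` gives 160 bits.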
CN201911289054.3A 2019-12-12 2019-12-12 Extensible ASIP structure platform and instruction processing method Active CN111061510B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911289054.3A CN111061510B (en) 2019-12-12 2019-12-12 Extensible ASIP structure platform and instruction processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911289054.3A CN111061510B (en) 2019-12-12 2019-12-12 Extensible ASIP structure platform and instruction processing method

Publications (2)

Publication Number Publication Date
CN111061510A true CN111061510A (en) 2020-04-24
CN111061510B CN111061510B (en) 2021-01-05

Family

ID=70301550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911289054.3A Active CN111061510B (en) 2019-12-12 2019-12-12 Extensible ASIP structure platform and instruction processing method

Country Status (1)

Country Link
CN (1) CN111061510B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6442680B1 (en) * 1999-01-29 2002-08-27 International Business Machines Corporation Method and system for compressing reduced instruction set computer (RISC) executable code
CN1477520A (en) * 2002-08-21 2004-02-25 先进数字芯片株式会社 Central processor with extended instruction
WO2004016768A2 (en) * 2002-08-19 2004-02-26 Dnaprint Genomics, Inc. Compositions and methods for inferring ancestry
US20060095710A1 (en) * 2002-12-30 2006-05-04 Koninklijke Philips Electronics N.V. Clustered ilp processor and a method for accessing a bus in a clustered ilp processor
EP1771792A4 (en) * 2004-06-08 2008-12-17 Univ Rochester Dynamically managing the communication-parallelism trade-off in clustered processors
WO2012068504A2 (en) * 2010-11-18 2012-05-24 Texas Instruments Incorporated Method and apparatus for moving data
CN103229113A (en) * 2010-09-29 2013-07-31 数学工程公司 Interactive system for controlling multiple input multiple output control (mimo) structures
US20140314152A1 (en) * 2006-04-26 2014-10-23 Altera Corporation Methods And Apparatus For Motion Search Refinement In A SIMD Array Processor
CN104331267A (en) * 2013-07-22 2015-02-04 国际商业机器公司 Instruction set architecture with extensible register addressing
CN105009099A (en) * 2013-11-07 2015-10-28 株式会社日立制作所 Computer system and data control method
CN107077327A (en) * 2014-06-30 2017-08-18 微体系统工程有限公司 System and method for expansible wide operand instruction
CN109062684A (en) * 2018-07-04 2018-12-21 南京南大光电工程研究院有限公司 A kind of real-time dynamic self-adapting dynamic load balancing method of release of the hardware of multi-core processor
CN110119375A (en) * 2019-05-16 2019-08-13 湖南毂梁微电子有限公司 A kind of control method that multiple scalar cores are linked as to monokaryon Vector Processing array

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王玉林等: "魂芯分簇VLIW DSP上指令调度的优化", 《微型机与应用》 *
陈虎: "面向应用的指令集处理器关键技术研究", 《中国博士学位论文全文数据库(信息科技辑)》 *

Also Published As

Publication number Publication date
CN111061510B (en) 2021-01-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant