WO2022199043A1 - Method and system for implementing vsetli instruction in risv_v vector instruction set

Info

Publication number
WO2022199043A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
vsetli
vectag
module
vector
Prior art date
Application number
PCT/CN2021/129454
Other languages
French (fr)
Chinese (zh)
Inventor
李长林
张弛
Original Assignee
广东赛昉科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202110733967.0A (CN113467833A)
Application filed by 广东赛昉科技有限公司
Publication of WO2022199043A1
Priority to US17/981,365 (US20230068290A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The present invention relates to the technical field of CPUs, and in particular to a method and system for implementing the vsetli instruction of the risv_v vector instruction set. The method comprises: when a CPU executes out of order, allocating vectag[n:0] information in a rename module and determining whether an instruction is vsetli; if the instruction is vsetli, adding 1 to vectag, and if it is not, keeping vectag unchanged; on issue toward the execution units, distributing vsetli instructions to a csr module and other vector instructions to a vpu module; when the vectag of an instruction matches the vectag broadcast by the ROB, issuing the instruction from the reservation station to the execution unit; and after execution of the instruction completes, graduating it in order in the ROB module, updating the register vectag at graduation, and ending the execution. In the present invention, non-vsetl{i} Vector instructions execute efficiently. Data is selected using a mask, which reduces power consumption; the execution cycles and the latency are also reduced, and the invention has broad market prospects.

Description

Implementation method and system of risv_v vector instruction set vsetli instruction
Technical Field
The invention relates to the technical field of CPUs, and in particular to a method and a system for implementing the vsetli instruction of the risv_v vector instruction set.
Background
The complete risv_v instruction set was published only recently, and there are essentially no existing implementations to refer to. The simplest approach is to flush the pipeline when a vsetli instruction graduates and to send every element to the execution unit regardless of whether it is inactive, which increases the number of execution cycles.
In existing designs, the pipeline must be flushed when a vsetli instruction graduates, which lowers CPU execution efficiency. Vector instructions also execute the inactive elements in the execution unit, and the data is selected by a mask only afterwards. The masked-off data never needed to enter the execution unit, so this wastes power and lengthens the execution of the instruction.
Summary of the Invention
In view of the deficiencies of the prior art, the present invention discloses a method and a system for implementing the vsetli instruction of the risv_v vector instruction set, which solve the problems described above.
The present invention is achieved through the following technical solutions:
In a first aspect, the present invention discloses a method for implementing the vsetli instruction of the risv_v vector instruction set, comprising the following steps:
S1: When the CPU executes out of order, allocate vectag[n:0] information in the rename module and determine whether the instruction is vsetli;
S2: If the instruction is vsetli, increment vectag by 1; if it is not a vsetli instruction, leave vectag unchanged;
S3: On issue toward the execution units, distribute vsetli instructions to the csr module and other vector instructions to the vpu module;
S4: When the vectag of the instruction matches the vectag broadcast by the ROB, issue the instruction from the reservation station to the execution unit;
S5: When execution of the instruction completes, it graduates in order in the ROB module, the register vectag is updated at graduation, and execution ends.
Further, in the method, 0 to 5 instructions are issued in each cycle.
Further, in the method, if a vsetli instruction is received, that cycle issues only the vsetli, one vectag is allocated per cycle, and the other instructions wait until the next cycle to issue.
Further, in the method, issuing the inactive elements to the execution unit takes 2*n cycles to complete, whereas not issuing them takes n cycles.
Further, in the method, the vectag of every instruction in the reservation station of the vpu module is compared with the register vectag, and an instruction can be issued to the execution unit only when the two match.
Further, in the method, vectag[n:0] is allocated at rename and serves as a condition for issuing a vpu instruction to the execution unit, so that the pipeline is not flushed when a vsetli instruction is executed.
In a second aspect, the present invention discloses a system for implementing the vsetli instruction of the risv_v vector instruction set. The system is used to carry out the method of the first aspect and includes a rename module, a dispatch module, a vpu module and a ROB module.
The beneficial effects of the present invention are:
A non-vsetl{i} Vector instruction of the present invention only has to wait until the youngest vsetl{i} instruction older than itself has finished executing before it can enter the execution unit, which is far more efficient than the current practice of flushing the pipeline.
The Vector instruction of the present invention also executes the inactive-element part in the execution unit and finally selects the data by means of a mask, which can reduce power consumption while also reducing the execution cycles and the latency.
Brief Description of the Drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a step diagram of the method for implementing the vsetli instruction of the risv_v vector instruction set;
Fig. 2 is a basic block diagram of an out-of-order CPU according to an embodiment of the present invention;
Fig. 3 is a block diagram comparing the issue of inactive elements according to an embodiment of the present invention.
Detailed Description
To make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Embodiment 1
This embodiment discloses a method for implementing the vsetli instruction of the risv_v vector instruction set, as shown in FIG. 1, comprising the following steps:
S1: When the CPU executes out of order, allocate vectag[n:0] information in the rename module and determine whether the instruction is vsetli;
S2: If the instruction is vsetli, increment vectag by 1; if it is not a vsetli instruction, leave vectag unchanged;
S3: On issue toward the execution units, distribute vsetli instructions to the csr module and other vector instructions to the vpu module;
S4: When the vectag of the instruction matches the vectag broadcast by the ROB, issue the instruction from the reservation station to the execution unit;
S5: When execution of the instruction completes, it graduates in order in the ROB module, the register vectag is updated at graduation, and execution ends.
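Steps S1 and S2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described by the applicant: the class name `RenameVectag`, the `Instr` record, the opcode strings, and the wrap-around behaviour of the (n+1)-bit counter are hypothetical choices that do not appear in the source.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    opcode: str      # illustrative opcode names, e.g. "vsetli", "vadd"
    vectag: int = 0  # filled in by the rename stage


class RenameVectag:
    """Hypothetical model of vectag allocation in the rename module."""

    def __init__(self, n: int = 3):
        # vectag[n:0] is n+1 bits wide; wrapping on overflow is an assumption.
        self.mask = (1 << (n + 1)) - 1
        self.current = 0

    def rename(self, instr: Instr) -> Instr:
        # S1: determine whether the instruction is vsetli.
        if instr.opcode == "vsetli":
            # S2: a vsetli increments vectag by 1.
            self.current = (self.current + 1) & self.mask
        # S2: every other instruction carries the unchanged vectag.
        instr.vectag = self.current
        return instr


stage = RenameVectag(n=3)
program = ["vadd", "vsetli", "vadd", "vmul", "vsetli", "vadd"]
print([stage.rename(Instr(op)).vectag for op in program])
# prints [0, 1, 1, 1, 2, 2]
```

Each vector instruction ends up carrying the tag of the youngest vsetli that precedes it in program order, which is the property the later issue check relies on.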
In this embodiment, 0 to 5 instructions are issued in each cycle. If a vsetli instruction is received, that cycle issues only the vsetli, one vectag is allocated per cycle, and the other instructions wait until the next cycle to issue.
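The issue rule can be illustrated with a small grouping function. It is a sketch under assumptions: the function name `form_issue_groups` is hypothetical, and the source does not say what happens to older instructions that share an issue window with a vsetli, so the sketch assumes they issue in the preceding cycle and the vsetli then issues alone.

```python
from typing import List


def form_issue_groups(opcodes: List[str], width: int = 5) -> List[List[str]]:
    """Split an in-order instruction stream into per-cycle issue groups."""
    groups: List[List[str]] = []
    current: List[str] = []
    for op in opcodes:
        if op == "vsetli":
            # Assumption: close the pending group, then issue vsetli alone.
            if current:
                groups.append(current)
                current = []
            groups.append([op])
        else:
            current.append(op)
            if len(current) == width:  # at most `width` instructions per cycle
                groups.append(current)
                current = []
    if current:
        groups.append(current)
    return groups


print(form_issue_groups(["vadd", "vmul", "vsetli", "vadd", "vsub"]))
# prints [['vadd', 'vmul'], ['vsetli'], ['vadd', 'vsub']]
```

Because a vsetli never shares a cycle with other instructions, every instruction issued in a given cycle carries the same vectag, so one allocation per cycle is enough.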
In this embodiment, issuing the inactive elements to the execution unit takes 2*n cycles to complete, whereas not issuing them takes n cycles.
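The source does not define n or the execution-unit width, so the following is only one simplified model that reproduces the stated 2*n versus n relationship. It assumes a hypothetical unit that processes a fixed number of elements (`lanes`) per cycle and a mask in which half of the elements are inactive; the function name and parameters are illustrative.

```python
import math
from typing import List


def execution_cycles(mask: List[int], lanes: int, issue_inactive: bool) -> int:
    """Cycles for one vector operation in a simplified fixed-lane model."""
    # Elements sent to the execution unit: all of them, or only the active ones.
    issued = len(mask) if issue_inactive else sum(mask)
    return math.ceil(issued / lanes)


mask = [1, 0] * 8  # 16 elements, 8 of them active (assumed example)
n = execution_cycles(mask, lanes=4, issue_inactive=False)
print(n, execution_cycles(mask, lanes=4, issue_inactive=True))
# prints 2 4
```

With a different proportion of inactive elements the ratio would differ from 2; the factor of two holds in this model only for the half-active mask chosen here.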
In this embodiment, the vectag of every instruction in the reservation station of the vpu module is compared with the register vectag, and an instruction can be issued to the execution unit only when the two match.
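The comparison can be sketched as a filter over reservation-station entries. The names `RsEntry`, `VpuReservationStation` and `issue_ready` are hypothetical, and the sketch deliberately ignores every other issue condition (operand readiness, free execution ports) that a real design would also check.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RsEntry:
    name: str    # illustrative label for the waiting instruction
    vectag: int  # tag assigned at rename


@dataclass
class VpuReservationStation:
    entries: List[RsEntry] = field(default_factory=list)

    def insert(self, name: str, vectag: int) -> None:
        self.entries.append(RsEntry(name, vectag))

    def issue_ready(self, register_vectag: int) -> List[str]:
        """Issue only entries whose vectag equals the register vectag."""
        ready = [e for e in self.entries if e.vectag == register_vectag]
        self.entries = [e for e in self.entries if e.vectag != register_vectag]
        return [e.name for e in ready]


rs = VpuReservationStation()
rs.insert("vadd.a", 1)
rs.insert("vmul.b", 1)
rs.insert("vadd.c", 2)
print(rs.issue_ready(0))  # prints []
print(rs.issue_ready(1))  # prints ['vadd.a', 'vmul.b']
```

Entries with a non-matching tag simply stay in the station, which is what lets the design avoid a pipeline flush.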
In this embodiment, vectag[n:0] is allocated at rename and serves as a condition for issuing a vpu instruction to the execution unit, so that the pipeline is not flushed when a vsetli instruction is executed.
In this embodiment, the pipeline does not need to be flushed when a vsetli instruction graduates, and the inactive-element part does not need to be issued to the execution unit, which reduces both power consumption and execution cycles.
Embodiment 2
This embodiment refers to an out-of-order CPU whose basic block diagram is shown in FIG. 2. It discloses a system for implementing the vsetli instruction of the risv_v vector instruction set, comprising four modules: rename, dispatch, ROB and vpu.
The rename module allocates a vectag[n:0] value to each instruction. For a vsetli the vectag is incremented by 1; for a non-vsetli instruction the vectag is unchanged. The purpose is that an instruction to be executed in the vpu unit can be issued to the execution unit only when the vectag of that instruction in the reservation station matches the vectag broadcast by the csr.
The dispatch module distributes instructions to different datapaths according to their type: vsetli instructions go to the csr module, and other vector instructions go to the vpu module. Up to 5 instructions can be issued per cycle. If a vsetli instruction is encountered, that cycle issues only the vsetli and the other instructions wait until the next cycle, so only one vectag needs to be allocated per cycle.
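The routing performed by the dispatch module can be sketched in a few lines. The function name `dispatch`, the convention that vector opcodes start with "v", and the "other" fallback for non-vector instructions are all assumptions made for illustration; the source names only the csr and vpu destinations.

```python
def dispatch(opcode: str) -> str:
    """Return the datapath an instruction is routed to (hypothetical model)."""
    if opcode == "vsetli":
        return "csr"    # vsetli goes to the csr module
    if opcode.startswith("v"):
        return "vpu"    # assumed: other vector opcodes begin with "v"
    return "other"      # placeholder for datapaths the source does not describe


print([dispatch(op) for op in ["vsetli", "vadd", "add"]])
# prints ['csr', 'vpu', 'other']
```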
The vpu module is the datapath for vector instructions. An important condition for issuing an instruction from the reservation station to the execution unit is that the vectag of the instruction in that entry matches the vectag broadcast by the ROB. As shown in FIG. 3, issuing the inactive elements to the execution unit takes 2*n cycles to complete, whereas not issuing them takes only n cycles. This reduces latency and power consumption at the same time.
In the ROB module, every instruction graduates in order after it finishes executing, and the register vectag is updated at graduation.
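In-order graduation with the vectag update can be sketched as a queue that only releases its oldest entry once that entry has finished. The names `RobEntry`, `Rob`, `allocate` and `graduate`, and the initial register value of 0, are hypothetical; the sketch models only the ordering and the tag update.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class RobEntry:
    opcode: str
    vectag: int
    done: bool = False  # set when execution of the instruction completes


class Rob:
    """Hypothetical reorder buffer that tracks the register vectag."""

    def __init__(self) -> None:
        self.entries: Deque[RobEntry] = deque()
        self.register_vectag = 0  # assumed reset value

    def allocate(self, opcode: str, vectag: int) -> RobEntry:
        entry = RobEntry(opcode, vectag)
        self.entries.append(entry)
        return entry

    def graduate(self) -> List[str]:
        """Graduate finished instructions strictly in program order."""
        graduated = []
        while self.entries and self.entries[0].done:
            entry = self.entries.popleft()
            self.register_vectag = entry.vectag  # updated at graduation
            graduated.append(entry.opcode)
        return graduated


rob = Rob()
first = rob.allocate("vadd", 0)
vset = rob.allocate("vsetli", 1)
rob.allocate("vadd", 1)
vset.done = True          # finishes early, but an older entry is pending
print(rob.graduate(), rob.register_vectag)  # prints [] 0
first.done = True
print(rob.graduate(), rob.register_vectag)  # prints ['vadd', 'vsetli'] 1
```

Because a younger vsetli cannot graduate ahead of older vector instructions, the register vectag cannot move past the tag of an instruction that is still waiting to issue.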
The timeline of vectag allocation, updates to the vectag register, and the conditions under which vector instructions can issue is shown in the following table:
Figure PCTCN2021129454-appb-000001
Figure PCTCN2021129454-appb-000002
In summary, a non-vsetl{i} Vector instruction of the present invention only has to wait until the youngest vsetl{i} instruction older than itself has finished executing before it can enter the execution unit, which is far more efficient than the current practice of flushing the pipeline. After a flush, execution must restart from instruction fetch. Without a flush, the instruction simply waits in the reservation station and proceeds to the execution unit as soon as the youngest vsetl{i} older than itself has finished executing.
The Vector instruction of the present invention also executes the inactive-element part in the execution unit and finally selects the data by means of a mask, which can reduce power consumption while also reducing the execution cycles and the latency.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in those embodiments can still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (7)

  1. A method for implementing the vsetli instruction of the risv_v vector instruction set, characterized in that the method comprises the following steps:
    S1: When the CPU executes out of order, allocate vectag[n:0] information in the rename module and determine whether the instruction is vsetli;
    S2: If the instruction is vsetli, increment vectag by 1; if it is not a vsetli instruction, leave vectag unchanged;
    S3: On issue toward the execution units, distribute vsetli instructions to the csr module and other vector instructions to the vpu module;
    S4: When the vectag of the instruction matches the vectag broadcast by the ROB, issue the instruction from the reservation station to the execution unit;
    S5: When execution of the instruction completes, it graduates in order in the ROB module, the register vectag is updated at graduation, and execution ends.
  2. The method for implementing the vsetli instruction of the risv_v vector instruction set according to claim 1, characterized in that, in the method, 0 to 5 instructions are issued in each cycle.
  3. The method for implementing the vsetli instruction of the risv_v vector instruction set according to claim 2, characterized in that, in the method, if a vsetli instruction is received, that cycle issues only the vsetli, one vectag is allocated per cycle, and the other instructions wait until the next cycle to issue.
  4. The method for implementing the vsetli instruction of the risv_v vector instruction set according to claim 1, characterized in that, in the method, issuing the inactive elements to the execution unit takes 2*n cycles to complete, and not issuing the inactive elements to the execution unit takes n cycles to complete.
  5. The method for implementing the vsetli instruction of the risv_v vector instruction set according to claim 1, characterized in that, in the method, the vectag of every instruction in the reservation station of the vpu module is compared with the register vectag, and an instruction can be issued to the execution unit only when the two match.
  6. The method for implementing the vsetli instruction of the risv_v vector instruction set according to claim 1, characterized in that, in the method, vectag[n:0] is allocated at rename and serves as a condition for issuing a vpu instruction to the execution unit, so that the pipeline is not flushed when a vsetli instruction is executed.
  7. A system for implementing the vsetli instruction of the risv_v vector instruction set, the system being used to carry out the method for implementing the vsetli instruction of the risv_v vector instruction set according to any one of claims 1 to 6, characterized in that it comprises a rename module, a dispatch module, a vpu module and a ROB module.
PCT/CN2021/129454 2021-03-22 2021-11-09 Method and system for implementing vsetli instruction in risv_v vector instruction set WO2022199043A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/981,365 US20230068290A1 (en) 2021-03-22 2022-11-04 Implementation method and system of risc_v vector instruction set vsetvli instruction

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202110300024.9 2021-03-22
CN202110300024 2021-03-22
CN202110733967.0 2021-06-30
CN202110733967.0A CN113467833A (en) 2021-06-30 2021-06-30 Method and system for realizing risv _ v vector instruction set vselti instruction

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/981,365 Continuation US20230068290A1 (en) 2021-03-22 2022-11-04 Implementation method and system of risc_v vector instruction set vsetvli instruction

Publications (1)

Publication Number Publication Date
WO2022199043A1 (en) 2022-09-29

Family

ID=83395174

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/129454 WO2022199043A1 (en) 2021-03-22 2021-11-09 Method and system for implementing vsetli instruction in risv_v vector instruction set

Country Status (1)

Country Link
WO (1) WO2022199043A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104932945A (en) * 2015-06-18 2015-09-23 合肥工业大学 Task-level out-of-order multi-issue scheduler and scheduling method thereof
US20200210186A1 (en) * 2018-12-27 2020-07-02 Intel Corporation Apparatus and method for non-spatial store and scatter instructions
CN113467833A (en) * 2021-06-30 2021-10-01 广东赛昉科技有限公司 Method and system for realizing risv_v vector instruction set vsetli instruction

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 21932651; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 21932651; Country of ref document: EP; Kind code of ref document: A1