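WO2017024798A1 - Instruction processing method and apparatus for a very long instruction word instruction set - Google Patents
Instruction processing method and apparatus for a very long instruction word instruction set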
- Publication number
- WO2017024798A1 (PCT/CN2016/076933)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- instruction
- line
- instructions
- memory
- row
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
Definitions
- The present invention relates to the field of Very Long Instruction Word (VLIW) instruction sets, and more particularly to an instruction processing method and apparatus for a VLIW instruction set.
- VLIW: Very Long Instruction Word
- A VLIW instruction set processor has multiple pipelines and issues a variable number of instructions per clock cycle.
- When a VLIW instruction set processor prefetches instructions, it fetches one full row of instructions from the memory into the instruction buffer each time; when a non-contiguous address fetch operation (a jump) occurs, the processor first clears the instruction buffer and then fetches instructions from the memory into the buffer again.
- Because the number of instructions issued per clock cycle is variable, the instructions of one clock cycle may be stored in two adjacent rows of the memory. If the instructions in the instruction buffer can be assembled into the complete set of instructions to be issued in one clock cycle, they are issued to the corresponding pipelines; if they cannot, the processor must wait for the next row to be fetched from the memory into the instruction buffer, which results in a cycle in which no instruction is issued, i.e. an empty issue, and reduces the efficiency of the VLIW instruction set processor.
- Embodiments of the present invention are directed to an instruction processing method and apparatus for a VLIW instruction set that avoid empty issues and improve the execution efficiency of the VLIW instruction set processor.
- Embodiments of the present invention provide an instruction processing method for a VLIW instruction set, the method including:
- Determining, according to the first preset rule, that the instructions of the destination instruction line are stored in different rows of the memory includes:
- The method further includes:
- Inserting at least one no-operation (NOP) instruction into an instruction line before the destination instruction line according to the second preset rule includes:
- k is a second parameter, $k = h + A_{n-1}$, and the maximum number of instructions in an instruction line is j.
- The method further includes:
- An embodiment of the invention further discloses an instruction processing apparatus for a VLIW instruction set, the apparatus comprising:
- a determining module configured to determine a destination instruction line of a non-contiguous address fetch operation;
- a first processing module configured to determine, according to the first preset rule, that the instructions of the destination instruction line are stored in different rows of the memory;
- a second processing module configured to insert at least one NOP instruction into an instruction line before the destination instruction line according to the second preset rule, so that the instructions of the destination instruction line are stored in the same row of the memory.
- The first processing module is specifically configured to compare h with $A_n$ and, if h is smaller than $A_n$, determine that the $A_n$ instructions are stored in different rows of the memory;
- The first processing module is further configured to determine that the $A_n$ instructions are stored in the same row of the memory if h is greater than or equal to $A_n$.
- The second processing module is specifically configured to compare k with j and, if k is greater than j, insert the h NOP instructions into line n-1;
- k is a second parameter, $k = h + A_{n-1}$, and the maximum number of instructions in an instruction line is j.
- The second processing module is further configured to, if k is less than or equal to j, insert l NOP instructions into line n-1 and insert the remaining y NOP instructions into the instruction lines before line n-1;
- The determining module, the first processing module and the second processing module may be implemented by a central processing unit (CPU), a digital signal processor (DSP) or a field-programmable gate array (FPGA).
- CPU: central processing unit
- DSP: digital signal processor
- FPGA: field-programmable gate array
- The instruction processing apparatus of a VLIW instruction set finds and determines, in the assembly code, the destination instruction line of a non-contiguous address fetch operation and the number of instructions in that line; it can further find and determine the instruction lines before the destination instruction line, the number of instructions in each of those lines, and the start address in the memory of each line's first instruction.
- The instruction processing apparatus then determines, according to the first preset rule, whether all instructions of the destination instruction line are stored in the same row of the memory. If they are, the processing flow ends; if they are stored in different rows, the apparatus inserts at least one NOP instruction into an instruction line before the destination instruction line according to the second preset rule, so that the instructions of the destination instruction line are stored in the same row of the memory.
- FIG. 1 is a flowchart of Embodiment 1 of the instruction processing method for a VLIW instruction set according to the present invention.
- FIG. 2 is a flowchart of Embodiment 2 of the instruction processing method for a VLIW instruction set according to the present invention.
- FIG. 3 is a schematic diagram of the assembly code instructions in the storage space before a NOP instruction is inserted, according to Embodiment 2 of the instruction processing method.
- FIG. 4 is a schematic diagram of the assembly code instructions in the storage space after a NOP instruction is inserted, according to Embodiment 2 of the instruction processing method.
- FIG. 5 is a schematic structural diagram of an embodiment of the instruction processing apparatus for a VLIW instruction set according to the present invention.
- FIG. 1 is a flowchart of Embodiment 1 of the instruction processing method for a VLIW instruction set according to the present invention. As shown in FIG. 1, the method may include:
- Step 101: Determine the destination instruction line of the non-contiguous address fetch operation.
- At the compilation stage, the instruction processing apparatus of the VLIW instruction set uses the compiler to scan the assembly code and finds and determines the destination instruction line of the non-contiguous address fetch operation and the number of instructions in that line. The destination instruction line can be defined as line n and the number of instructions in line n as $A_n$, where n is a positive integer greater than or equal to 2.
- The assembly code given here represents a loop jump: when execution reaches "goto LABEL" on line 4, control jumps back to "LABEL" between line 1 and line 2, and line 2 is executed again. This is a non-contiguous address fetch operation, and line 2 is its destination instruction line. Inst denotes an instruction, and each instruction needs one unit in the memory.
- The instruction processing apparatus can also find and determine the instruction lines before the destination instruction line, the number of instructions in each of those lines, and the start address in the memory of each line's first instruction. The line immediately before the destination instruction line can be defined as line n-1, the number of instructions in it as $A_{n-1}$, and the start address in the memory of its first instruction as $addr_{n-1}$; the line two before the destination instruction line is defined as line n-2 with $A_{n-2}$ instructions, and so on.
- Step 102: Determine, according to the first preset rule, that the instructions of the destination instruction line are stored in different rows of the memory.
- The instruction processing apparatus may determine, according to the first preset rule, whether all instructions of the destination instruction line of the non-contiguous address fetch operation are stored in the same row of the memory; the first preset rule may be a comparison of the first parameter with the number of instructions in the destination instruction line.
- The processing operation according to the first preset rule is as follows:
- Step 103: Insert at least one NOP instruction into an instruction line before the destination instruction line according to the second preset rule.
- The instruction processing apparatus inserts at least one NOP instruction into an instruction line before the destination instruction line according to the second preset rule; that is, the second preset rule determines how many NOP instructions are inserted into which of the preceding instruction lines. The second preset rule may be a comparison of the second parameter with the maximum number of instructions in one instruction line of the assembly code.
- The processing operation according to the second preset rule is as follows:
- Step 104: Store the instructions of the destination instruction line in the same row of the memory.
- After checking each instruction line and inserting the NOP instructions as described above, the instruction processing apparatus can ensure that all instructions of the destination instruction line are stored in the same row of the memory. The VLIW instruction set processor can then fetch all instructions of that line in one clock cycle and assemble them into a complete instruction, which avoids empty-issue clock cycles, improves the execution efficiency of the processor, and avoids wasted power consumption.
- FIG. 2 is a flowchart of Embodiment 2 of the instruction processing method for a VLIW instruction set according to the present invention; FIG. 3 and FIG. 4 are schematic diagrams of the assembly code instructions in the storage space before and after a NOP instruction is inserted, respectively. As shown in FIG. 2, the method includes:
- Step 201: Determine the destination instruction line of the non-contiguous address fetch operation.
- The assembly code given here represents a loop jump: when execution reaches "goto LABEL" on line 4, control jumps back to "LABEL" between line 1 and line 2, and line 2 is executed again. This is a non-contiguous address fetch operation, and line 2 is its destination instruction line. The instruction processing apparatus of the VLIW instruction set clears the instruction buffer and then fetches the line-2 instructions "Inst2_0" and "Inst2_1" from the memory again.
- At the compilation stage, the instruction processing apparatus uses the compiler to scan the assembly code, detects the "goto LABEL" instruction on line 4, and from it determines that line 2 is the destination instruction line of the non-contiguous address fetch operation and that this line contains two instructions.
- It can also be determined that line 1, which precedes line 2, contains three instructions, that the first instruction of line 1 is stored at position 0 of row 0x00 of the memory, that each row of the memory holds a fixed 4 instructions, and that the maximum number of instructions in one instruction line of the assembly code is 4.
- Step 202: Compare the first parameter with the number of instructions in line 2 to determine whether all instructions of line 2 are stored in different rows of the memory.
- If the first parameter is greater than or equal to 2, step 203 is performed; if the first parameter is less than 2, step 204 is performed.
- Here the first parameter is 4 - ((0 + 3) & (4 - 1)) = 1; since 1 is less than 2, step 204 is performed.
- Step 203: Determine that the two instructions of line 2 are stored in the same row of the memory, and end the processing flow.
- Step 204: Determine that the two instructions of line 2 are stored in different rows of the memory.
- Step 205: Compare the second parameter with the maximum number of instructions in one instruction line of the assembly code, and determine to insert at least one NOP instruction into line 2.
- The assembly code after inserting one NOP instruction "NOP" is as follows:
- Step 206: Store the instructions of line 2 in the same row of the memory.
- The VLIW instruction set processor can then fetch all instructions of line 2 in one clock cycle and assemble them into a complete instruction, which avoids empty-issue clock cycles, improves the execution efficiency of the processor, and avoids wasted power consumption.
- The instruction processing apparatus 05 for a VLIW instruction set may include a determining module 51, a first processing module 52 and a second processing module 53, wherein:
- the determining module 51 is configured to determine the destination instruction line of the non-contiguous address fetch operation;
- the first processing module 52 is configured to determine, according to the first preset rule, that the instructions of the destination instruction line are stored in different rows of the memory;
- the second processing module 53 is configured to insert at least one NOP instruction into an instruction line before the destination instruction line according to the second preset rule, so that the instructions of the destination instruction line are stored in the same row of the memory.
- The first processing module 52 is specifically configured to compare h with $A_n$ and, if h is smaller than $A_n$, determine that the $A_n$ instructions are stored in different rows of the memory;
- The first processing module 52 is further configured to, if h is greater than or equal to $A_n$, determine that the $A_n$ instructions are stored in the same row of the memory.
- The second processing module 53 is specifically configured to compare k with j and, if k is greater than j, insert the h NOP instructions into line n-1;
- k is a second parameter, $k = h + A_{n-1}$, and the maximum number of instructions in an instruction line is j.
- The second processing module 53 is further configured to, if k is less than or equal to j, insert l NOP instructions into line n-1 and insert the remaining y NOP instructions into the instruction lines before line n-1;
- The apparatus in this embodiment may be used to implement the technical solution of the foregoing method embodiments; its implementation principle and technical effect are similar and are not described again here.
- The determining module 51, the first processing module 52 and the second processing module 53 may be implemented by a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP) or a field-programmable gate array (FPGA) located on the terminal.
- CPU: central processing unit
- MPU: microprocessor
- DSP: digital signal processor
- FPGA: field-programmable gate array
- Embodiments of the present invention can be provided as a method, a system, or a computer program product. Accordingly, the present invention can take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
- The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction device, and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
- These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
- All the instructions of the destination instruction line can then be fetched in one clock cycle and assembled into a complete instruction, which avoids empty-issue clock cycles, improves the execution efficiency of the VLIW instruction set processor, and avoids wasted power consumption.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
- Executing Machine-Instructions (AREA)
Abstract
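An instruction processing method for a very long instruction word (VLIW) instruction set, comprising: determining a destination instruction line of a non-contiguous address fetch operation (101); determining, according to a first preset rule, that the instructions of the destination instruction line are stored in different rows of a memory (102); and inserting, according to a second preset rule, at least one no-operation (NOP) instruction into an instruction line preceding the destination instruction line (103), so that the instructions of the destination instruction line are stored in the same row of the memory (104). An instruction processing apparatus (05) for a VLIW instruction set is also disclosed.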
Description
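The present invention relates to the field of very long instruction word (VLIW) instruction sets, and in particular to an instruction processing method and apparatus for a VLIW instruction set.
A VLIW instruction set processor has multiple pipelines and issues a variable number of instructions per clock cycle. When prefetching instructions, the processor fetches one full row of instructions from the storage space of the memory into the instruction buffer each time; when a non-contiguous address fetch operation (a jump) occurs, the processor first clears the instruction buffer and then fetches instructions from the memory into the instruction buffer again.
Because the number of instructions issued per clock cycle is variable, the instructions of a single clock cycle may be stored across two adjacent rows of the memory. If the instructions in the instruction buffer can be assembled into the complete set of instructions to be issued in one clock cycle, they are issued to the corresponding pipelines for execution. If they cannot, the processor must wait until the next row of instructions has been fetched from the memory and stored in the instruction buffer. This produces a cycle in which no instruction is issued, i.e. an empty issue, and empty issues reduce the execution efficiency of the VLIW instruction set processor.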
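Summary of the Invention
In view of this, embodiments of the present invention aim to provide an instruction processing method and apparatus for a VLIW instruction set that avoid empty issues and improve the execution efficiency of the VLIW instruction set processor.
To achieve the above objective, the technical solution of the present invention is implemented as follows:
An embodiment of the present invention provides an instruction processing method for a VLIW instruction set, the method comprising:
determining a destination instruction line of a non-contiguous address fetch operation;
determining, according to a first preset rule, that the instructions of the destination instruction line are stored in different rows of a memory;
inserting, according to a second preset rule, at least one no-operation (NOP) instruction into an instruction line preceding the destination instruction line, so that the instructions of the destination instruction line are stored in the same row of the memory.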
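In the above method, determining according to the first preset rule that the instructions of the destination instruction line are stored in different rows of the memory comprises:
comparing h with $A_n$, and if h is less than $A_n$, determining that the $A_n$ instructions are stored in different rows of the memory;
where h is a first parameter, $h = i - ((addr_{n-1} + A_{n-1}) \,\&\, (i - 1))$; i is the fixed number of instructions stored in each row of the memory; the destination instruction line is line n, n being a positive integer greater than or equal to 2; $A_n$ is the number of instructions in line n; the instruction line preceding line n is line n-1; $A_{n-1}$ is the number of instructions in line n-1; $addr_{n-1}$ is the start address in the memory of the first instruction of line n-1; and & is the bitwise AND operation.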
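As a rough illustration of the first parameter, the sketch below computes h in Python. It assumes that addresses are counted in instruction slots and that i is a power of two, so that masking with i - 1 is the same as taking the remainder modulo i; the function name `first_parameter` is made up for this example.

```python
def first_parameter(addr_prev: int, a_prev: int, i: int) -> int:
    """Return h = i - ((addr_prev + a_prev) & (i - 1)).

    addr_prev: start address of line n-1, counted in instruction slots (assumption)
    a_prev:    number of instructions in line n-1
    i:         instructions per memory row; assumed to be a power of two
    """
    if i <= 0 or i & (i - 1):
        raise ValueError("i must be a positive power of two for the mask to work")
    return i - ((addr_prev + a_prev) & (i - 1))


# Values from Embodiment 2: line 1 starts at address 0 and holds 3 instructions,
# and each memory row holds 4 instructions.
h = first_parameter(addr_prev=0, a_prev=3, i=4)
print(h)  # 1
```

With these inputs the result is the same 4 - ((0 + 3) & (4 - 1)) = 1 that Embodiment 2 derives by hand.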
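In the above method, the method further comprises:
if h is greater than or equal to $A_n$, determining that the $A_n$ instructions are stored in the same row of the memory.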
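In the above method, inserting, according to the second preset rule, at least one NOP instruction into an instruction line preceding the destination instruction line comprises:
comparing k with j, and if k is greater than j, inserting the h NOP instructions into line n-1;
where k is a second parameter, $k = h + A_{n-1}$, and j is the maximum number of instructions in an instruction line.
In the above method, the method further comprises:
if k is less than or equal to j, inserting l NOP instructions into line n-1, and inserting the remaining y NOP instructions into the instruction lines preceding line n-1;
where k is the second parameter, $k = h + A_{n-1}$, $l = j - A_{n-1}$, and $y = h - l$.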
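The three quantities used by the second preset rule can be written down directly from their definitions. The sketch below only evaluates k, l and y and does not decide which branch of the rule applies; the type and function names are hypothetical.

```python
from typing import NamedTuple


class SecondRuleQuantities(NamedTuple):
    k: int  # second parameter, h + A_{n-1}
    l: int  # j - A_{n-1}: slots left in line n-1 before it reaches j instructions
    y: int  # h - l: NOPs still to place once l of them have gone into line n-1


def second_rule_quantities(h: int, a_prev: int, j: int) -> SecondRuleQuantities:
    """Evaluate k, l and y exactly as they are defined in the text."""
    l = j - a_prev
    return SecondRuleQuantities(k=h + a_prev, l=l, y=h - l)


# h = 1, A_{n-1} = 3 and j = 4 are the values used in Embodiment 2.
q = second_rule_quantities(h=1, a_prev=3, j=4)
print(q)  # SecondRuleQuantities(k=4, l=1, y=0)
```

Under this literal reading the Embodiment 2 values give k = 4; the worked example there quotes 1 + 4 = 5 for the second parameter, so the intended operands may differ from this reading. Either way, one NOP ends up in line 1.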
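An embodiment of the present invention further discloses an instruction processing apparatus for a VLIW instruction set, the apparatus comprising:
a determining module configured to determine a destination instruction line of a non-contiguous address fetch operation;
a first processing module configured to determine, according to a first preset rule, that the instructions of the destination instruction line are stored in different rows of a memory;
a second processing module configured to insert, according to a second preset rule, at least one NOP instruction into an instruction line preceding the destination instruction line, so that the instructions of the destination instruction line are stored in the same row of the memory.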
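In the above apparatus, the first processing module is specifically configured to compare h with $A_n$ and, if h is less than $A_n$, determine that the $A_n$ instructions are stored in different rows of the memory;
where h is a first parameter, $h = i - ((addr_{n-1} + A_{n-1}) \,\&\, (i - 1))$; i is the fixed number of instructions stored in each row of the memory; the destination instruction line is line n, n being a positive integer greater than or equal to 2; $A_n$ is the number of instructions in line n; the instruction line preceding line n is line n-1; $A_{n-1}$ is the number of instructions in line n-1; $addr_{n-1}$ is the start address in the memory of the first instruction of line n-1; and & is the bitwise AND operation.
In the above apparatus, the first processing module is further specifically configured to, if h is greater than or equal to $A_n$, determine that the $A_n$ instructions are stored in the same row of the memory.
In the above apparatus, the second processing module is specifically configured to compare k with j and, if k is greater than j, insert the h NOP instructions into line n-1;
where k is a second parameter, $k = h + A_{n-1}$, and j is the maximum number of instructions in an instruction line.
In the above apparatus, the second processing module is further specifically configured to, if k is less than or equal to j, insert l NOP instructions into line n-1 and insert the remaining y NOP instructions into the instruction lines preceding line n-1;
where k is the second parameter, $k = h + A_{n-1}$, $l = j - A_{n-1}$, and $y = h - l$.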
When performing processing, the determining module, the first processing module and the second processing module may be implemented by a central processing unit (CPU), a digital signal processor (DSP) or a field-programmable gate array (FPGA).
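In the instruction processing method and apparatus for a VLIW instruction set provided by the embodiments of the present invention, the instruction processing apparatus finds and determines, in the assembly code, the destination instruction line of a non-contiguous address fetch operation and the number of instructions in that line; it can further find and determine the instruction lines preceding the destination instruction line, the number of instructions in each of those lines, and the start address in the memory of each line's first instruction. The apparatus then determines, according to the first preset rule, whether all instructions of the destination instruction line are stored in the same row of the memory. If they are, the processing flow ends; if they are stored in different rows, the apparatus inserts, according to the second preset rule, at least one NOP instruction into an instruction line preceding the destination instruction line, so that the instructions of the destination instruction line are stored in the same row of the memory. As a result, the VLIW instruction set processor can fetch all instructions of the destination instruction line in a single clock cycle and assemble them into a complete instruction, which avoids empty-issue clock cycles, improves the execution efficiency of the processor, and avoids wasted power consumption.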
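FIG. 1 is a flowchart of Embodiment 1 of the instruction processing method for a VLIW instruction set according to the present invention;
FIG. 2 is a flowchart of Embodiment 2 of the instruction processing method for a VLIW instruction set according to the present invention;
FIG. 3 is a schematic diagram of the assembly code instructions in the storage space before a NOP instruction is inserted, according to Embodiment 2 of the instruction processing method;
FIG. 4 is a schematic diagram of the assembly code instructions in the storage space after a NOP instruction is inserted, according to Embodiment 2 of the instruction processing method;
FIG. 5 is a schematic structural diagram of an embodiment of the instruction processing apparatus for a VLIW instruction set according to the present invention.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.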
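FIG. 1 is a flowchart of Embodiment 1 of the instruction processing method for a VLIW instruction set according to the present invention. As shown in FIG. 1, the method may include:
Step 101: Determine the destination instruction line of the non-contiguous address fetch operation.
At the compilation stage, the instruction processing apparatus of the VLIW instruction set uses the compiler to scan the assembly code and finds and determines the destination instruction line of the non-contiguous address fetch operation and the number of instructions in that line. The destination instruction line can be defined as line n and the number of instructions in line n as $A_n$, where n is a positive integer greater than or equal to 2.
The assembly code given here represents a loop jump: when execution reaches "goto LABEL" on line 4, control jumps back to "LABEL" between line 1 and line 2, and line 2 is executed again. This is a non-contiguous address fetch operation, and line 2 is its destination instruction line. Inst denotes an instruction, and each instruction needs one unit in the memory.
In an implementation of the embodiments of the present invention, the instruction processing apparatus can also find and determine the instruction lines preceding the destination instruction line, the number of instructions in each of those lines, and the start address in the memory of each line's first instruction. The line immediately preceding the destination instruction line can be defined as line n-1, the number of instructions in it as $A_{n-1}$, and the start address in the memory of its first instruction as $addr_{n-1}$; the line two before the destination instruction line is defined as line n-2 with $A_{n-2}$ instructions, and so on.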
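The bookkeeping in step 101 can be sketched with a toy program model. Everything about the representation is an assumption made for illustration: the list-of-lines structure, the `label_before` map, the helper `jump_targets`, the placeholder names `Inst1_0` to `Inst1_2`, the content of line 3, and the idea that "goto LABEL" sits alone on line 4. Only the instruction counts of lines 1 and 2 and the label position come from the text.

```python
from itertools import accumulate

# Hypothetical program model: each entry is one instruction line (issued in one cycle).
# Line numbers are 1-based; label_before gives the line that a label precedes.
lines = [
    ["Inst1_0", "Inst1_1", "Inst1_2"],  # line 1 (instruction names are placeholders)
    ["Inst2_0", "Inst2_1"],             # line 2, preceded by LABEL
    ["Inst3_0"],                        # line 3 (content invented for the example)
    ["goto LABEL"],                     # line 4
]
label_before = {"LABEL": 2}


def jump_targets(lines, label_before):
    """Return the sorted line numbers that are destinations of a 'goto <label>'."""
    targets = set()
    for line in lines:
        for inst in line:
            if inst.startswith("goto "):
                targets.add(label_before[inst.split()[1]])
    return sorted(targets)


counts = [len(line) for line in lines]   # A_1, A_2, ...
starts = [0, *accumulate(counts)][:-1]   # addr_1, addr_2, ... in instruction slots

for n in jump_targets(lines, label_before):
    # n, A_n, A_{n-1}, addr_{n-1}
    print(n, counts[n - 1], counts[n - 2], starts[n - 2])  # 2 2 3 0
```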
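Step 102: Determine, according to the first preset rule, that the instructions of the destination instruction line are stored in different rows of the memory.
Here, the instruction processing apparatus of the VLIW instruction set may determine, according to the first preset rule, whether all instructions of the destination instruction line of the non-contiguous address fetch operation are stored in the same row of the memory; the first preset rule may be a rule that compares the first parameter with the number of instructions in the destination instruction line.
Specifically, the processing according to the first preset rule is as follows:
h is compared with $A_n$. If h is less than $A_n$, it is determined that the $A_n$ instructions are stored in different rows of the memory, and step 103 is performed; if h is greater than or equal to $A_n$, it is determined that the $A_n$ instructions are stored in the same row of the memory, and the processing flow ends. Here h is the first parameter, $h = i - ((addr_{n-1} + A_{n-1}) \,\&\, (i - 1))$, i is the fixed number of instructions stored in each row of the memory, and & is the bitwise AND operation.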
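One way to see why this comparison works is the short derivation below. It assumes that instruction lines are packed back to back, that addresses count instruction slots, and that i is a power of two; none of these is stated in the rule itself.

```latex
\begin{aligned}
\mathit{addr}_n &= \mathit{addr}_{n-1} + A_{n-1}
  && \text{(line $n$ starts right after line $n-1$)}\\
o &= \mathit{addr}_n \mathbin{\&} (i-1) = \mathit{addr}_n \bmod i
  && \text{(offset of line $n$ within its memory row)}\\
h &= i - o
  && \text{(slots from that offset to the end of the row)}\\
\text{line $n$ fits in one row} &\iff o + A_n \le i \iff A_n \le h
\end{aligned}
```

So h < A_n is exactly the case in which line n runs past the end of its row, and adding h slots of padding in front of line n moves its start to the beginning of the next row.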
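Step 103: Insert, according to the second preset rule, at least one NOP instruction into an instruction line preceding the destination instruction line.
In this step, after it has been determined that the $A_n$ instructions are stored in different rows of the memory, the instruction processing apparatus of the VLIW instruction set inserts, according to the second preset rule, at least one NOP instruction into an instruction line preceding the destination instruction line; that is, the second preset rule determines how many NOP instructions are inserted into which of the preceding instruction lines. The second preset rule may be a rule that compares the second parameter with the maximum number of instructions in one instruction line of the assembly code.
Specifically, the processing according to the second preset rule is as follows:
k is compared with j. If k is greater than j, the h NOP instructions are inserted into line n-1; if k is less than or equal to j, l NOP instructions are inserted into line n-1 and the remaining y NOP instructions are inserted into the instruction lines preceding line n-1. Note that after insertion the number of instructions in any instruction line must be less than or equal to the maximum number j: if z of the remaining y NOP instructions are still left over after insertion into line n-2, those z are inserted into line n-3, and so on, until all NOP instructions have been inserted into instruction lines preceding the destination instruction line n. Here k is the second parameter, $k = h + A_{n-1}$, $h = i - ((addr_{n-1} + A_{n-1}) \,\&\, (i - 1))$, $l = j - A_{n-1}$, j is the maximum number of instructions in one instruction line, and $y = h - l$.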
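The cascading placement can be sketched as a backward greedy fill. This is only one reading of the rule: it relies on the stated constraint that no instruction line may exceed j instructions after insertion, and the function name `distribute_nops` and the list-based representation are assumptions made for the example.

```python
def distribute_nops(counts: list[int], n: int, h: int, j: int) -> dict[int, int]:
    """Spread h NOPs over the lines before line n, nearest line first.

    counts: counts[m - 1] is the number of instructions in line m (1-based lines)
    n:      number of the destination instruction line (n >= 2)
    h:      number of NOPs needed to push line n to the start of the next row
    j:      maximum number of instructions allowed in one instruction line
    Returns {line number: NOPs inserted into that line}.
    """
    remaining = h
    placed: dict[int, int] = {}
    for m in range(n - 1, 0, -1):      # line n-1, then n-2, n-3, ...
        if remaining == 0:
            break
        room = j - counts[m - 1]       # free slots before line m reaches j
        take = min(room, remaining)
        if take > 0:
            placed[m] = take
            remaining -= take
    if remaining:
        raise ValueError("not enough room before the destination line")
    return placed


# Embodiment 2: line 1 has 3 instructions, line 2 (the jump target) has 2, h = 1, j = 4.
print(distribute_nops([3, 2], n=2, h=1, j=4))        # {1: 1}
# A made-up case where line n-1 is already full and the NOPs spill into earlier lines.
print(distribute_nops([2, 3, 4, 2], n=4, h=3, j=4))  # {2: 1, 1: 2}
```

Because the total number of NOPs placed in front of line n is always h, the destination line still moves to the start of the next memory row regardless of how the NOPs are split across the earlier lines.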
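Step 104: Store the instructions of the destination instruction line in the same row of the memory.
Here, after checking each instruction line and inserting the NOP instructions as described above, the instruction processing apparatus of the VLIW instruction set can ensure that all instructions of the destination instruction line of the non-contiguous address fetch operation are stored in the same row of the memory. The VLIW instruction set processor can then fetch all instructions of the destination instruction line in a single clock cycle and assemble them into a complete instruction, which avoids empty-issue clock cycles, improves the execution efficiency of the processor, and avoids wasted power consumption.
To better illustrate the purpose of the present invention, a further example is given on the basis of the above embodiment.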
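FIG. 2 is a flowchart of Embodiment 2 of the instruction processing method for a VLIW instruction set according to the present invention; FIG. 3 is a schematic diagram of the assembly code instructions in the storage space before a NOP instruction is inserted, according to Embodiment 2; and FIG. 4 is a schematic diagram of the assembly code instructions in the storage space after a NOP instruction is inserted, according to Embodiment 2. As shown in FIG. 2, the method includes:
Step 201: Determine the destination instruction line of the non-contiguous address fetch operation.
As shown in the assembly code below:
The assembly code given here represents a loop jump: when execution reaches "goto LABEL" on line 4, control jumps back to "LABEL" between line 1 and line 2, and line 2 is executed again. This is a non-contiguous address fetch operation, and line 2 is its destination instruction line. The instruction processing apparatus of the VLIW instruction set first clears the instruction buffer and then fetches the line-2 instructions "Inst2_0" and "Inst2_1" from the memory again.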
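As shown in FIG. 3, the line-2 instructions "Inst2_0" and "Inst2_1" are stored in different rows of the memory: "Inst2_0" is stored in row 0x00 and "Inst2_1" in row 0x04. In one clock cycle only "Inst2_0" in row 0x00 can be fetched, and "Inst2_1" in row 0x04 cannot be fetched until the next clock cycle. The complete instruction therefore cannot be assembled in the first clock cycle, and an empty-issue clock cycle occurs.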
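The situation in FIG. 3 can be reproduced with a small layout model. The sketch packs instruction lines back to back into rows of i instructions and reports the row address of every instruction; the names `Inst1_0` to `Inst1_2` are placeholders for the three line-1 instructions, which the text does not name, and `row_of_each` is a hypothetical helper.

```python
def row_of_each(lines: list[list[str]], i: int) -> dict[str, int]:
    """Map each instruction to the address of the memory row that holds it.

    Lines are stored contiguously starting at address 0, one slot per instruction,
    and a row covers i consecutive slots (rows start at 0x00, 0x04, ... for i = 4).
    """
    rows: dict[str, int] = {}
    addr = 0
    for line in lines:
        for inst in line:
            rows[inst] = (addr // i) * i
            addr += 1
    return rows


before = row_of_each([["Inst1_0", "Inst1_1", "Inst1_2"], ["Inst2_0", "Inst2_1"]], i=4)
print(hex(before["Inst2_0"]), hex(before["Inst2_1"]))  # 0x0 0x4
```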
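At the compilation stage, the instruction processing apparatus of the VLIW instruction set uses the compiler to scan the assembly code, detects the "goto LABEL" instruction on line 4, and from it determines that line 2 is the destination instruction line of the non-contiguous address fetch operation and that this line contains 2 instructions.
In addition, it can be determined that line 1, which precedes line 2, contains 3 instructions, that the first instruction of line 1 is stored at position 0 of row 0x00 of the memory, that each row of the memory holds a fixed 4 instructions, and that the maximum number of instructions in one instruction line of the assembly code is 4.
Step 202: Compare the first parameter with the number of instructions in line 2 to determine whether all instructions of line 2 are stored in different rows of the memory.
If the first parameter is greater than or equal to 2, step 203 is performed; if the first parameter is less than 2, step 204 is performed.
Specifically, the first parameter is calculated from the data obtained above, giving 4 - ((0 + 3) & (4 - 1)) = 1.
Comparing 1 with 2, 1 is less than 2, so step 204 is performed.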
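Step 203: Determine that the 2 instructions of line 2 are stored in the same row of the memory, and end the processing flow.
Step 204: Determine that the 2 instructions of line 2 are stored in different rows of the memory.
Step 205: Compare the second parameter with the maximum number of instructions in one instruction line of the assembly code, and determine to insert at least one NOP instruction into line 2.
Specifically, the second parameter and the number of NOP instructions to insert are calculated from the data obtained above: the second parameter is 1 + 4 = 5, and the number of NOP instructions to insert is 4 - ((0 + 3) & (4 - 1)) = 1;
5 is greater than 4, so 1 NOP instruction "NOP" is inserted into line 2.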
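The assembly code after inserting 1 NOP instruction "NOP" is as follows:
Step 206: Store the instructions of line 2 in the same row of the memory.
FIG. 4 shows the storage space of the instructions after 1 NOP instruction "NOP" has been inserted. The line-2 instructions "Inst2_0" and "Inst2_1" are now stored in the same row of the memory, row 0x04, so the VLIW instruction set processor can fetch all instructions of line 2 in a single clock cycle and assemble them into a complete instruction, which avoids empty-issue clock cycles, improves the execution efficiency of the processor, and avoids wasted power consumption.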
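To check the effect of the padding, one can count how many memory rows the destination line touches before and after the NOP is added. The sketch assumes that one row is fetched per clock cycle, as the background section describes, and that the NOP fills the last slot of row 0x00 so that line 2 starts at address 4; `rows_touched` is a hypothetical helper.

```python
def rows_touched(start_addr: int, count: int, i: int) -> list[int]:
    """Return the addresses of the memory rows occupied by `count` instructions
    stored contiguously from `start_addr`, with i instruction slots per row."""
    first_row = start_addr // i
    last_row = (start_addr + count - 1) // i
    return [row * i for row in range(first_row, last_row + 1)]


# Before padding: line 2 starts at address 3 (right after the 3 instructions of line 1).
print(rows_touched(3, 2, 4))  # [0, 4] -> two row fetches
# After one NOP: line 2 starts at address 4.
print(rows_touched(4, 2, 4))  # [4] -> one row fetch
```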
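FIG. 5 is a schematic structural diagram of an embodiment of the instruction processing apparatus for a VLIW instruction set according to the present invention. As shown in FIG. 5, the instruction processing apparatus 05 may include a determining module 51, a first processing module 52 and a second processing module 53, wherein:
the determining module 51 is configured to determine the destination instruction line of the non-contiguous address fetch operation;
the first processing module 52 is configured to determine, according to the first preset rule, that the instructions of the destination instruction line are stored in different rows of the memory;
the second processing module 53 is configured to insert, according to the second preset rule, at least one NOP instruction into an instruction line preceding the destination instruction line, so that the instructions of the destination instruction line are stored in the same row of the memory.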
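In an implementation of the embodiments of the present invention, the first processing module 52 is specifically configured to compare h with $A_n$ and, if h is less than $A_n$, determine that the $A_n$ instructions are stored in different rows of the memory;
where h is a first parameter, $h = i - ((addr_{n-1} + A_{n-1}) \,\&\, (i - 1))$; i is the fixed number of instructions stored in each row of the memory; the destination instruction line is line n, n being a positive integer greater than or equal to 2; $A_n$ is the number of instructions in line n; the instruction line preceding line n is line n-1; $A_{n-1}$ is the number of instructions in line n-1; $addr_{n-1}$ is the start address in the memory of the first instruction of line n-1; and & is the bitwise AND operation.
In an implementation of the embodiments of the present invention, the first processing module 52 is further specifically configured to, if h is greater than or equal to $A_n$, determine that the $A_n$ instructions are stored in the same row of the memory.
In an implementation of the embodiments of the present invention, the second processing module 53 is specifically configured to compare k with j and, if k is greater than j, insert the h NOP instructions into line n-1;
where k is a second parameter, $k = h + A_{n-1}$, and j is the maximum number of instructions in an instruction line.
In an implementation of the embodiments of the present invention, the second processing module 53 is further specifically configured to, if k is less than or equal to j, insert l NOP instructions into line n-1 and insert the remaining y NOP instructions into the instruction lines preceding line n-1;
where k is the second parameter, $k = h + A_{n-1}$, $l = j - A_{n-1}$, and $y = h - l$.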
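The apparatus of this embodiment may be used to carry out the technical solution of the method embodiments described above; its implementation principle and technical effect are similar and are not repeated here.
In practical applications, the determining module 51, the first processing module 52 and the second processing module 53 may be implemented by a device located on the terminal such as a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP) or a field-programmable gate array (FPGA).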
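Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.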
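The above are merely preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention.
In the instruction processing method and apparatus for a VLIW instruction set provided by the embodiments of the present invention, the instruction processing apparatus finds and determines, in the assembly code, the destination instruction line of a non-contiguous address fetch operation and the number of instructions in that line; it can further find and determine the instruction lines preceding the destination instruction line, the number of instructions in each of those lines, and the start address in the memory of each line's first instruction. The apparatus then determines, according to the first preset rule, whether all instructions of the destination instruction line are stored in the same row of the memory. If they are, the processing flow ends; if they are stored in different rows, the apparatus inserts, according to the second preset rule, at least one NOP instruction into an instruction line preceding the destination instruction line, so that the instructions of the destination instruction line are stored in the same row of the memory. As a result, the VLIW instruction set processor can fetch all instructions of the destination instruction line in a single clock cycle and assemble them into a complete instruction, which avoids empty-issue clock cycles, improves the execution efficiency of the processor, and avoids wasted power consumption.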
Claims (10)
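- 1. An instruction processing method for a very long instruction word (VLIW) instruction set, the method comprising: determining a destination instruction line of a non-contiguous address fetch operation; determining, according to a first preset rule, that the instructions of the destination instruction line are stored in different rows of a memory; and inserting, according to a second preset rule, at least one no-operation (NOP) instruction into an instruction line preceding the destination instruction line, so that the instructions of the destination instruction line are stored in the same row of the memory.
- 2. The method according to claim 1, wherein determining according to the first preset rule that the instructions of the destination instruction line are stored in different rows of the memory comprises: comparing h with $A_n$, and if h is less than $A_n$, determining that the $A_n$ instructions are stored in different rows of the memory; where h is a first parameter, $h = i - ((addr_{n-1} + A_{n-1}) \,\&\, (i - 1))$; i is the fixed number of instructions stored in each row of the memory; the destination instruction line is line n, n being a positive integer greater than or equal to 2; $A_n$ is the number of instructions in line n; the instruction line preceding line n is line n-1; $A_{n-1}$ is the number of instructions in line n-1; $addr_{n-1}$ is the start address in the memory of the first instruction of line n-1; and & is the bitwise AND operation.
- 3. The method according to claim 2, wherein the method further comprises: if h is greater than or equal to $A_n$, determining that the $A_n$ instructions are stored in the same row of the memory.
- 4. The method according to claim 2, wherein inserting, according to the second preset rule, at least one NOP instruction into an instruction line preceding the destination instruction line comprises: comparing k with j, and if k is greater than j, inserting the h NOP instructions into line n-1; where k is a second parameter, $k = h + A_{n-1}$, and j is the maximum number of instructions in an instruction line.
- 5. The method according to claim 4, wherein the method further comprises: if k is less than or equal to j, inserting l NOP instructions into line n-1, and inserting the remaining y NOP instructions into the instruction lines preceding line n-1; where k is the second parameter, $k = h + A_{n-1}$, $l = j - A_{n-1}$, and $y = h - l$.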
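- 6. An instruction processing apparatus for a very long instruction word (VLIW) instruction set, the apparatus comprising: a determining module configured to determine a destination instruction line of a non-contiguous address fetch operation; a first processing module configured to determine, according to a first preset rule, that the instructions of the destination instruction line are stored in different rows of a memory; and a second processing module configured to insert, according to a second preset rule, at least one NOP instruction into an instruction line preceding the destination instruction line, so that the instructions of the destination instruction line are stored in the same row of the memory.
- 7. The apparatus according to claim 6, wherein the first processing module is configured to compare h with $A_n$ and, if h is less than $A_n$, determine that the $A_n$ instructions are stored in different rows of the memory; where h is a first parameter, $h = i - ((addr_{n-1} + A_{n-1}) \,\&\, (i - 1))$; i is the fixed number of instructions stored in each row of the memory; the destination instruction line is line n, n being a positive integer greater than or equal to 2; $A_n$ is the number of instructions in line n; the instruction line preceding line n is line n-1; $A_{n-1}$ is the number of instructions in line n-1; $addr_{n-1}$ is the start address in the memory of the first instruction of line n-1; and & is the bitwise AND operation.
- 8. The apparatus according to claim 7, wherein the first processing module is further configured to, if h is greater than or equal to $A_n$, determine that the $A_n$ instructions are stored in the same row of the memory.
- 9. The apparatus according to claim 7, wherein the second processing module is configured to compare k with j and, if k is greater than j, insert the h NOP instructions into line n-1; where k is a second parameter, $k = h + A_{n-1}$, and j is the maximum number of instructions in an instruction line.
- 10. The apparatus according to claim 9, wherein the second processing module is further configured to, if k is less than or equal to j, insert l NOP instructions into line n-1 and insert the remaining y NOP instructions into the instruction lines preceding line n-1; where k is the second parameter, $k = h + A_{n-1}$, $l = j - A_{n-1}$, and $y = h - l$.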
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510496662.7 | 2015-08-13 | ||
CN201510496662.7A CN106445466B (zh) | 2015-08-13 | 2015-08-13 | Instruction processing method and apparatus for a very long instruction word instruction set |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017024798A1 true WO2017024798A1 (zh) | 2017-02-16 |
Family
ID=57982992
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/076933 WO2017024798A1 (zh) | 2015-08-13 | 2016-03-21 | Instruction processing method and apparatus for a very long instruction word instruction set |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN106445466B (zh) |
WO (1) | WO2017024798A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111124416A (zh) * | 2019-12-09 | 2020-05-08 | 龙芯中科(合肥)技术有限公司 | Method, apparatus, device and storage medium for passing parameters to inline assembly |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1675618A (zh) * | 2002-08-05 | 2005-09-28 | 皇家飞利浦电子股份有限公司 | Processor and method for processing VLIW instructions |
US20080256334A1 (en) * | 2005-11-15 | 2008-10-16 | Nxp B.V. | Processing System and Method for Executing Instructions |
CN102855120A (zh) * | 2012-09-14 | 2013-01-02 | 北京中科晶上科技有限公司 | Very long instruction word (VLIW) processor and processing method |
CN103116485A (zh) * | 2013-01-30 | 2013-05-22 | 西安电子科技大学 | Assembler design method based on a very long instruction word application-specific instruction set processor |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6112299A (en) * | 1997-12-31 | 2000-08-29 | International Business Machines Corporation | Method and apparatus to select the next instruction in a superscalar or a very long instruction word computer having N-way branching |
JP5168143B2 (ja) * | 2006-06-15 | 2013-03-21 | 日本電気株式会社 | Processor and instruction control method |
WO2015013895A1 (zh) * | 2013-07-30 | 2015-02-05 | 华为技术有限公司 | Instruction jump processing method and apparatus |
- 2015-08-13: CN application CN201510496662.7A, patent CN106445466B (zh), status: Active
- 2016-03-21: WO application PCT/CN2016/076933, publication WO2017024798A1 (zh), status: Application Filing
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
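CN111124416A (zh) * | 2019-12-09 | 2020-05-08 | 龙芯中科(合肥)技术有限公司 | Method, apparatus, device and storage medium for passing parameters to inline assembly |
CN111124416B (zh) * | 2019-12-09 | 2024-02-13 | 龙芯中科(合肥)技术有限公司 | Method, apparatus, device and storage medium for passing parameters to inline assembly |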
Also Published As
Publication number | Publication date |
---|---|
CN106445466B (zh) | 2019-07-09 |
CN106445466A (zh) | 2017-02-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 16834424; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: PCT application non-entry in European phase | Ref document number: 16834424; Country of ref document: EP; Kind code of ref document: A1 |