WO2024108836A1 - Computing device, operating method, and machine-readable storage medium - Google Patents

Info

Publication number
WO2024108836A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
descriptor
address
circuit
address information
Prior art date
Application number
PCT/CN2023/084091
Inventor
陈林
陈小强
梁伟
Original Assignee
上海壁仞科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海壁仞科技股份有限公司
Publication of WO2024108836A1

Landscapes

  • Executing Machine-Instructions (AREA)

Abstract

The present invention provides a computing device, an operating method therefor, and a machine-readable storage medium. The computing device comprises an execution unit (EU) circuit and an instruction scheduler (IS) circuit. The IS circuit may issue an instruction to the EU circuit for execution. The IS circuit comprises a descriptor address register (DAR), which stores address information of a resource descriptor. When a current instruction processed by the IS circuit uses the resource descriptor, the IS circuit obtains the address information from its local DAR, without triggering the EU circuit to read the address information of the resource descriptor from a scalar register file external to the IS circuit and return it to the IS circuit. After obtaining the address information of the resource descriptor, the IS circuit issues, on the basis of the address information, the current instruction using the resource descriptor to the EU circuit for execution.

Description

Computing device, operating method, and machine-readable storage medium
This application claims priority to Chinese Patent Application No. 202211486231.9, filed on November 24, 2022, the disclosure of which is incorporated herein by reference in its entirety as part of this application.
Technical Field
Embodiments of the present disclosure relate to a computing device, an operating method, and a machine-readable storage medium.
Background
In general, a resource descriptor describes information (e.g., size) about a resource (e.g., an image or a texture). An instruction can access a resource in memory according to a resource descriptor, and an instruction may use one or more resource descriptors. For example, a load instruction, a store instruction, or an atomic instruction (i.e., an instruction that performs an indivisible operation) uses one resource descriptor, while a texture instruction usually uses two. A bindless resource descriptor is usually not pre-bound to any on-chip addressable memory; instead, it is usually obtained dynamically by a resource access instruction. For example, a memory access instruction must obtain the resource descriptor before performing a memory access on the resource.
The scalar register file (SRF) holds in advance either the complete 64-bit memory address of a resource descriptor (the address of the target resource in memory) or the 32-bit offset of the resource descriptor within a descriptor set. When an instruction uses a bindless resource descriptor to access memory, the instruction scheduler (IS) uses the SRF address provided by the instruction to obtain the complete memory address of the bindless resource descriptor from the SRF. In the first step of obtaining the address or offset of the resource descriptor, the instruction scheduler issues an additional hidden instruction to the execution unit (EU) to read the value in the SRF (the address or offset of the resource descriptor). In the second step, the execution unit executes the hidden instruction issued by the instruction scheduler and reads the address or offset of the resource descriptor from the SRF. In the third step, the execution unit returns the value from the SRF (the address or offset of the resource descriptor) to the instruction scheduler. If the returned value is the offset of the resource descriptor within the descriptor set, the instruction scheduler can use the identifier (ID) of the descriptor set (or the base memory address of the descriptor set) together with the returned offset to calculate the complete memory address of the resource descriptor. Whether the instruction scheduler calculates the complete memory address or the execution unit returns it directly, once the complete memory address of the resource descriptor has been determined, the instruction, as executed by the execution unit, can use the resource descriptor to access memory.
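The three-step prior-art flow described above can be sketched as a small behavioral model. This is only an illustration under stated assumptions: the class names, the `hidden_reads` counter, and the dictionary-based SRF and descriptor-set table are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class ExecutionUnit:
    """Hypothetical model of the EU, which owns access to the SRF."""
    srf: dict              # SRF index -> stored address or offset
    hidden_reads: int = 0  # number of hidden instructions executed

    def run_hidden_read(self, srf_index):
        # Step 2: execute the hidden instruction and read the SRF.
        self.hidden_reads += 1
        # Step 3: return the value to the instruction scheduler.
        return self.srf[srf_index]


@dataclass
class PriorArtScheduler:
    """Hypothetical model of a scheduler with no local descriptor address register."""
    eu: ExecutionUnit
    set_base: dict  # descriptor-set ID -> base memory address

    def descriptor_address(self, srf_index, set_id=None):
        # Step 1: a hidden instruction is issued for every descriptor use.
        value = self.eu.run_hidden_read(srf_index)
        if set_id is None:
            return value                      # the SRF held a complete address
        return self.set_base[set_id] + value  # the SRF held an offset in the set


eu = ExecutionUnit(srf={3: 0x40})
sched = PriorArtScheduler(eu=eu, set_base={0: 0x1000_0000})
# Three instructions that all use the same descriptor still cost three hidden reads.
addresses = [sched.descriptor_address(3, set_id=0) for _ in range(3)]
```

In this model every use of a descriptor passes through the execution unit, even when the same descriptor is used repeatedly.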
The problem is that the instruction scheduler must issue an additional hidden instruction to the execution unit, and the hidden instruction adds pipeline latency in the execution unit. A hidden instruction is an extra instruction generated dynamically by the instruction scheduler; it is not an instruction of the instruction set. Because hidden instructions do not appear in the program source code, they cannot be optimized when the source code is compiled. Furthermore, a hidden instruction is forced for every descriptor, which forgoes any reuse of descriptors within the instruction scheduler and burdens the hardware.
Summary of the Invention
The present disclosure provides a computing device, an operating method thereof, and a machine-readable storage medium that obtain the address information of a resource descriptor more efficiently.
In an embodiment according to the present disclosure, the computing device includes an execution unit (EU) circuit and an instruction scheduler (IS) circuit. The instruction scheduler circuit is coupled to the execution unit circuit and issues instructions to the execution unit circuit for execution. The instruction scheduler circuit includes a descriptor address register (DAR), which stores first address information of a first resource descriptor. When the current instruction processed by the instruction scheduler circuit uses the first resource descriptor, the instruction scheduler circuit reads the first address information from the descriptor address register local to the instruction scheduler circuit, without triggering the execution unit circuit to read the first address information of the first resource descriptor from a scalar register file (SRF) outside the instruction scheduler circuit and return it to the instruction scheduler circuit. After obtaining the first address information of the first resource descriptor, the instruction scheduler circuit issues, based on the first address information, the current instruction that uses the first resource descriptor to the execution unit circuit for execution.
In an embodiment according to the present disclosure, the operating method includes: storing first address information of a first resource descriptor in a descriptor address register of an instruction scheduler circuit of a computing device; when the current instruction processed by the instruction scheduler circuit uses the first resource descriptor, reading the first address information from the descriptor address register local to the instruction scheduler circuit, without triggering an execution unit circuit of the computing device to read the first address information of the first resource descriptor from a scalar register file outside the instruction scheduler circuit and return it to the instruction scheduler circuit; and after obtaining the first address information of the first resource descriptor, issuing, by the instruction scheduler circuit and based on the first address information, the current instruction that uses the first resource descriptor to the execution unit circuit for execution.
In an embodiment according to the present disclosure, the machine-readable storage medium stores non-transitory machine-readable instructions. When executed by a computer, the non-transitory machine-readable instructions implement the operating method of the computing device.
Based on the above, the instruction scheduler circuit is provided with a local descriptor address register. The instruction scheduler circuit can read the address information stored in the local descriptor address register directly, and can therefore obtain the address information of a resource descriptor more efficiently. Depending on the actual design and application scenario, in some embodiments the address information may include the complete memory address of the resource descriptor or the offset of the resource descriptor within a descriptor set. The instruction scheduler therefore does not need to issue an additional hidden instruction to the execution unit, which avoids the execution unit pipeline latency caused by hidden instructions. When the address information of a second resource descriptor used by a subsequent instruction is the same as the address information of the resource descriptor used by the current instruction, the instruction scheduler circuit can reuse the address information stored in the descriptor address register. The instruction scheduler circuit can thus make more efficient use of the address information held in its local descriptor address register.
Brief Description of the Drawings
FIG. 1 is a schematic circuit block diagram of a computing device according to an embodiment of the present disclosure.
FIG. 2 is a schematic flowchart of an operating method of a computing device according to an embodiment of the present disclosure.
Description of Reference Numerals
100: Computing device
110: Instruction scheduler (IS) circuit
111: Descriptor address register (DAR)
112: Base address register (BAR)
120: Execution unit (EU) circuit
130: Scalar register file (SRF)
S210, S220, S230: Steps
Detailed Description
Reference will now be made in detail to exemplary embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numerals are used in the drawings and the description to refer to the same or like parts.
The term "coupled (or connected)" as used throughout this specification (including the claims) may refer to any direct or indirect means of connection. For example, if a first device is described as being coupled (or connected) to a second device, this should be interpreted to mean that the first device may be connected to the second device directly, or indirectly through other devices or some other means of connection. Terms such as "first" and "second" used throughout this specification (including the claims) serve to name elements; they do not limit the upper or lower bound on the number of elements, nor the order of the elements. In addition, wherever possible, elements, components, and steps bearing the same reference numerals in the drawings and embodiments represent the same or similar parts. Elements, components, and steps that bear the same reference numerals or use the same terms in different embodiments may be understood by cross-reference to the related descriptions.
FIG. 1 is a schematic circuit block diagram of a computing device 100 according to an embodiment of the present disclosure. The computing device 100 shown in FIG. 1 includes an instruction scheduler (IS) circuit 110, an execution unit (EU) circuit 120, and a scalar register file (SRF) 130. The EU circuit 120 is coupled to the IS circuit 110 and to the scalar register file 130. The IS circuit 110 can issue instructions to the EU circuit 120 for execution. Depending on design requirements, in some embodiments the IS circuit 110 and/or the EU circuit 120 may be implemented as hardware circuits. In other embodiments, the IS circuit 110 and/or the EU circuit 120 may be implemented as firmware, software (i.e., a program), or a combination of the two. In still other embodiments, the IS circuit 110 and/or the EU circuit 120 may be implemented as a combination of two or more of hardware, firmware, and software.
In hardware form, the IS circuit 110 and/or the EU circuit 120 may be implemented as logic circuits on an integrated circuit. For example, the related functions of the IS circuit 110 and/or the EU circuit 120 may be implemented in various logic blocks, modules, and circuits in one or more controllers, microcontrollers, microprocessors, application-specific integrated circuits (ASICs), digital signal processors (DSPs), field programmable gate arrays (FPGAs), and/or other processing units. The related functions of the IS circuit 110 and/or the EU circuit 120 may be implemented as hardware circuits, such as various logic blocks, modules, and circuits in an integrated circuit, using hardware description languages (e.g., Verilog HDL or VHDL) or other suitable programming languages.
In software and/or firmware form, the related functions of the IS circuit 110 and/or the EU circuit 120 may be implemented as program code. For example, the IS circuit 110 and/or the EU circuit 120 may be implemented using general-purpose programming languages (e.g., C, C++, or assembly language) or other suitable programming languages. The program code may be recorded or stored in a non-transitory machine-readable storage medium. In some embodiments, the machine-readable storage medium includes, for example, a semiconductor memory and/or a storage device. The semiconductor memory includes a memory card, a read-only memory (ROM), a flash memory, a programmable logic circuit, or another semiconductor memory. The storage device includes a tape, a disk, a hard disk drive (HDD), a solid-state drive (SSD), or another storage device. An electronic device (e.g., a central processing unit (CPU), a controller, a microcontroller, or a microprocessor) can read the program code from the machine-readable storage medium and execute it, thereby implementing the related functions of the IS circuit 110 and/or the EU circuit 120. Alternatively, the program code may be provided to the electronic device via any transmission medium (e.g., a communication network or broadcast waves). The communication network is, for example, the Internet, a wired communication network, a wireless communication network, or another communication medium.
FIG. 2 is a schematic flowchart of an operating method of a computing device according to an embodiment of the present disclosure. In some embodiments, the operating method shown in FIG. 2 may be implemented in firmware or software (i.e., a program). For example, the operations of the method shown in FIG. 2 may be implemented as non-transitory machine-readable instructions (program code or a program), which may be stored in a machine-readable storage medium. When executed by a computer, the non-transitory machine-readable instructions implement the operating method shown in FIG. 2. In other embodiments, the operating method shown in FIG. 2 may be implemented in hardware, for example in the computing device 100 shown in FIG. 1.
Please refer to FIG. 1 and FIG. 2. The IS circuit 110 includes a descriptor address register (DAR) 111 and a base address register (BAR) 112. The base address register 112 can store the base memory address of a descriptor set, which may be, for example, 64 bits wide. Each descriptor set corresponds to one base address register 112. For example, supporting 4 or 8 descriptor sets requires 4 or 8 base address registers 112, respectively. The specific number of base address registers 112 can be determined by the actual design. The descriptor set identifier (descriptor set ID) is an immediate value embedded in the current instruction. The descriptor set ID selects one of the base address registers 112 and thereby determines the base memory address of the descriptor set corresponding to that ID. The number of IDs is determined by the current instruction; for example, when two resource descriptors are needed, the current instruction provides two corresponding IDs.
The descriptor address register 111 can store the address information of a resource descriptor. Depending on the actual design and application scenario, in some embodiments the address information may include the complete memory address of the resource descriptor or the offset of the resource descriptor within a descriptor set. The complete memory address of the resource descriptor is the sum of the base memory address of the descriptor set and the offset of the resource descriptor within the descriptor set. The specific number of descriptor address registers 111 can be determined by the actual design. For example, the descriptor address register 111 may include one 32-bit register dar0 (not shown; registers of other widths are also possible) to store the offset of one resource descriptor within the descriptor set. Alternatively, the descriptor address register 111 may include two 32-bit registers dar0 and dar1 (not shown; registers of other widths are also possible). When the address information is an offset, the offset of the resource descriptor can be stored in register dar0. When the address information is a complete memory address, the complete memory address of the resource descriptor can be stored in registers dar0 and dar1. As a further alternative, the descriptor address register 111 may include four 32-bit registers dar0, dar1, dar2, and dar3 (not shown; registers of other widths are also possible). When an instruction uses one resource descriptor, the offset of the resource descriptor can be stored in register dar0, or the complete memory address of the resource descriptor can be stored in registers dar0 and dar1. When an instruction uses two resource descriptors, the complete memory address (or offset) of one resource descriptor can be stored in registers dar0 and dar1, and the complete memory address (or offset) of the other resource descriptor can be stored in registers dar2 and dar3.
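As a rough illustration of the four-register arrangement, the sketch below places two 64-bit descriptor addresses into dar0 to dar3. The application does not state which register of a pair holds the low word; placing the low 32 bits in the even-numbered register is an assumption made here, and the helper names and example addresses are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def split_address(addr64):
    """Split a 64-bit address into (low, high) 32-bit words."""
    return addr64 & MASK32, (addr64 >> 32) & MASK32


def join_address(low, high):
    """Rebuild the 64-bit address from its two 32-bit words."""
    return ((high & MASK32) << 32) | (low & MASK32)


# dar[0]..dar[3] stand in for the registers dar0..dar3.
dar = [0, 0, 0, 0]

# Example values only: two descriptors used by one instruction.
first_descriptor_addr = 0x0000_0012_3456_7890
second_descriptor_addr = 0x0000_0012_3456_8000

# Assumption: low word in dar0/dar2, high word in dar1/dar3.
dar[0], dar[1] = split_address(first_descriptor_addr)
dar[2], dar[3] = split_address(second_descriptor_addr)

recovered_first = join_address(dar[0], dar[1])
recovered_second = join_address(dar[2], dar[3])
```

When only an offset is stored, a single 32-bit register such as dar0 is enough and no splitting is needed.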
For example (but not limited to this), each warp can have four 32-bit registers dar0, dar1, dar2, and dar3, and the IS circuit 110 can read registers dar0 to dar3 of the descriptor address register 111 directly. Generally speaking, a thread group can include multiple warps (a warp is also referred to as a thread bundle). A load instruction or a store instruction can use registers dar0 and dar1 of the descriptor address register 111 to hold the offset or address of its descriptor. A texture instruction can use registers dar0 and dar1 to hold the offset or address of the texture descriptor, and registers dar2 and dar3 to hold the offset or address of the sampler descriptor.
Assume here that the descriptor address register 111 stores first address information of a first resource descriptor (step S210). When the current instruction processed by the IS circuit 110 uses the first resource descriptor, the IS circuit 110 can read the first address information from the descriptor address register 111 local to the IS circuit 110 (step S220), without triggering the EU circuit 120 to read the first address information of the first resource descriptor from the scalar register file 130 outside the IS circuit 110 and return it to the IS circuit 110. After obtaining the first address information of the first resource descriptor, the IS circuit 110 can issue, based on the first address information, the current instruction that uses the first resource descriptor to the EU circuit 120 for execution (step S230).
When the address information of a second resource descriptor used by a subsequent instruction is the same as the first address information of the first resource descriptor used by the current instruction, that is, when the subsequent instruction uses the same resource descriptor as the current instruction, the IS circuit 110 can reuse the first address information stored in the descriptor address register 111 to access the resource in memory (not shown), without triggering the EU circuit 120 to read the address information of the second resource descriptor from the scalar register file 130 and return it to the IS circuit 110. After obtaining the address information of the second resource descriptor (i.e., the first address information), the IS circuit 110 can issue, based on the first address information, the instruction that uses the second resource descriptor to the EU circuit 120 for execution (step S230).
Depending on the actual design and application scenario, in some embodiments the address information may include the complete memory address of the resource descriptor or the offset of the resource descriptor within a descriptor set. If the address information includes the offset of the resource descriptor within the descriptor set, the IS circuit 110 can select the corresponding base address register 112 according to the descriptor set ID carried in the current instruction, and then use the base memory address of the descriptor set stored in that base address register 112 and the offset stored in the descriptor address register 111 to calculate the complete memory address of the first resource descriptor. The IS circuit 110 can then issue the complete memory address, together with the current instruction that uses the first resource descriptor, to the EU circuit 120 for execution. If the address information includes the complete memory address of the first resource descriptor in memory, the IS circuit 110 can issue the complete memory address stored in the descriptor address register 111, together with the current instruction that uses the first resource descriptor, to the EU circuit 120 for execution.
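The two cases above (offset plus base address, or complete address used directly) can be sketched as follows. This is a minimal behavioral sketch, not the circuit itself: the `Instruction` type, the function name, and the use of `set_id=None` to mark the complete-address case are assumptions made for illustration, since the application does not specify how the two cases are distinguished.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instruction:
    """Hypothetical instruction record carrying a descriptor-set ID immediate."""
    opcode: str
    set_id: Optional[int]  # None is used here to mean the DAR holds a complete address


def resolve_descriptor_address(instr, dar_value, bar):
    """Return the complete memory address of the descriptor used by instr.

    dar_value is the content of the local descriptor address register;
    bar is the list of base address registers, indexed by descriptor-set ID.
    """
    if instr.set_id is None:
        # Complete-address case: the DAR content is used as is.
        return dar_value
    # Offset case: the set ID selects a BAR, and base plus offset gives the address.
    return bar[instr.set_id] + dar_value


# Four base address registers, i.e. four supported descriptor sets (example values).
bar = [0x1000_0000, 0x2000_0000, 0x3000_0000, 0x4000_0000]

addr_from_offset = resolve_descriptor_address(Instruction("load", 2), 0x80, bar)
addr_direct = resolve_descriptor_address(Instruction("load", None), 0x3000_0080, bar)
```

Both calls yield the same complete address, which would then accompany the instruction when it is issued to the EU circuit.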
In summary, the descriptor address register 111 can be added to the IS circuit 110. The IS circuit 110 can read the address information stored in the local descriptor address register 111 directly, and can therefore obtain the address information of a resource descriptor (e.g., the complete memory address of the resource descriptor or its offset within the descriptor set) more efficiently. The IS circuit 110 thus does not need to issue an additional hidden instruction to the EU circuit 120, which avoids the pipeline latency that hidden instructions cause in the EU circuit 120 and reduces the burden on the hardware. When the address information of a second resource descriptor used by a subsequent instruction is the same as the first address information of the first resource descriptor used by the current instruction, the IS circuit 110 can reuse the address information stored in the descriptor address register 111. The IS circuit 110 can therefore make more efficient use of the address information held in its local descriptor address register 111.
By contrast, a prior-art instruction scheduler must issue an additional hidden instruction to the execution unit for every bindless resource descriptor, to trigger the execution unit to read the value in the scalar register file (the address or offset of the resource descriptor). The problem is that these hidden instructions add pipeline latency in the execution unit. In addition, a hidden instruction is an extra instruction generated dynamically by the instruction scheduler, not an instruction of the instruction set. Because hidden instructions do not appear in the program source code, they cannot be optimized when the source code is compiled. Furthermore, the prior art forces a hidden instruction to be issued and executed for every descriptor, which forgoes any reuse of descriptors within the instruction scheduler and burdens the hardware.
Please refer to the embodiment of the present disclosure shown in FIG. 1. In this embodiment, a new move instruction SMOVD can be added to the instruction set to move a value (the complete memory address of a resource descriptor, or the offset of the resource descriptor within a descriptor set) from the scalar register file 130 to the descriptor address register 111. For example, depending on the actual operating scenario, the move instruction SMOVD may take the form of pseudocode [1] below. In pseudocode [1], dest denotes the descriptor address register 111, and source may denote the scalar register file 130, a constant random access memory (RAM), a constant register, or an immediate value. Programs or shaders can use the move instruction SMOVD to update the descriptor address register 111.
smovd dest,source;         Pseudocode [1]
The move instruction SMOVD is an instruction of the instruction set. Because the move instruction SMOVD appears in the program source code, it can be optimized when the program source code is compiled. When the second address information of the second resource descriptor used by the next instruction differs from the first address information of the first resource descriptor used by the current instruction, that is, when the next instruction does not use the same resource descriptor as the current instruction, the IS circuit 110 can issue the move instruction SMOVD of the instruction set to the EU circuit 120. Based on the move instruction SMOVD, the EU circuit 120 can move the second address information of the second resource descriptor from the scalar register file 130 outside the IS circuit 110 to the descriptor address register 111 local to the IS circuit 110. Consequently, when the IS circuit 110 processes the next instruction, which uses the second resource descriptor, the IS circuit 110 can fetch the second address information stored in the descriptor address register 111. After obtaining the second address information of the second resource descriptor, the IS circuit 110 can, based on the second address information, issue the next instruction using the second resource descriptor to the EU circuit 120 for execution.
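The data movement performed by the move instruction SMOVD can be illustrated with a small software model. This is only a sketch under stated assumptions: the class names, the `smovd` function signature, and the register counts (64 scalar registers, four descriptor address registers) are hypothetical choices made for illustration, and only two of the source kinds named above (a scalar register and an immediate value) are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ScalarRegisterFile:
    """Stand-in for the scalar register file 130 (outside the IS circuit)."""
    regs: list = field(default_factory=lambda: [0] * 64)  # size is an assumption


@dataclass
class InstructionScheduler:
    """Stand-in for the IS circuit 110 with its local descriptor address register 111."""
    dar: list = field(default_factory=lambda: [0] * 4)  # four entries assumed


def smovd(scheduler, dest, srf, source, immediate=False):
    """Model of `smovd dest, source`: copy address information into a DAR entry.

    `source` is a scalar register index, or a literal value when
    `immediate` is True. Returns the value that was written.
    """
    value = source if immediate else srf.regs[source]
    scheduler.dar[dest] = value
    return value


srf = ScalarRegisterFile()
sched = InstructionScheduler()

srf.regs[5] = 0x1F40                         # address info left by an earlier scalar operation
smovd(sched, 0, srf, 5)                      # register source into DAR entry 0
smovd(sched, 1, srf, 0x80, immediate=True)   # immediate source into DAR entry 1
```

After these two moves, later instructions that use either descriptor can take the address information from the scheduler-local entries without another read of the scalar register file.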
In contrast, a prior-art instruction scheduler issues an additional hidden instruction to the execution unit for every bindless resource descriptor, in order to trigger the execution unit to read the value (the address or offset of the resource descriptor) held in the scalar register file. A hidden instruction is an extra instruction generated dynamically by the instruction scheduler, not an instruction of the instruction set. Because hidden instructions do not appear in the program source code, they cannot be optimized when the program source code is compiled.
Please refer to the embodiment of the present disclosure shown in FIG. 1. In some operating scenarios, the address information of a resource descriptor generated by a scalar operation can be stored directly in the descriptor address register 111 local to the IS circuit 110. In other operating scenarios, the address information of a resource descriptor generated by a scalar operation can be stored in the scalar register file 130 outside the IS circuit 110, for later use by the IS circuit 110.
For example, before the IS circuit 110 processes the current instruction, the IS circuit 110 processes a scalar operation instruction that generates the first address information of the first resource descriptor. If the IS circuit 110 needs to reuse the first address information, the IS circuit 110 can store the first address information directly in its local descriptor address register 111, rather than storing it, via the EU circuit 120, in the scalar register file 130 outside the IS circuit 110. When the IS circuit 110 processes the current instruction, which uses the first resource descriptor, the IS circuit 110 can fetch the first address information stored in the descriptor address register 111. The IS circuit 110 can therefore, based on the first address information, issue the current instruction using the first resource descriptor to the EU circuit 120 for execution. If the IS circuit 110 does not need to reuse the first address information, the IS circuit 110 can store the first address information, via the EU circuit 120, in the scalar register file 130 outside the IS circuit 110.
As another example, when the second address information of the second resource descriptor used by the next instruction differs from the first address information of the first resource descriptor used by the current instruction, that is, when the next instruction does not use the same resource descriptor as the current instruction, the IS circuit 110 processes a scalar operation instruction that generates the second address information of the second resource descriptor. The IS circuit 110 can then store the second address information directly in its local descriptor address register 111, rather than storing it in the scalar register file 130 outside the IS circuit 110 (for example, when the IS circuit 110 does not need to reuse the second address information). Consequently, when the IS circuit 110 processes the next instruction, which uses the second resource descriptor, the IS circuit 110 can fetch the second address information stored in the descriptor address register 111. After obtaining the second address information of the second resource descriptor, the IS circuit 110 can, based on the second address information, issue the next instruction using the second resource descriptor to the EU circuit 120 for execution.
As a further example, after a scalar operation instruction processed by the IS circuit 110 generates the first address information of the first resource descriptor, the IS circuit 110 can store the first address information in the scalar register file 130 outside the IS circuit 110. Before the IS circuit 110 processes the current instruction, which uses the first resource descriptor, the IS circuit 110 can issue the move instruction SMOVD of the instruction set to the EU circuit 120, so as to move the first address information of the first resource descriptor from the scalar register file 130 to the descriptor address register 111 of the IS circuit 110. When the IS circuit 110 processes the current instruction using the first resource descriptor, the IS circuit 110 can fetch the first address information stored in the descriptor address register 111. The IS circuit 110 can therefore, based on the first address information, issue the current instruction using the first resource descriptor to the EU circuit 120 for execution.
Depending on the actual design, in some embodiments the descriptor address register 111 can be added to the address space of the scalar register file 130. Programs or shaders can then update the descriptor address register 111 by using a target address in the address space of the scalar register file 130. For example, a first interval of the address space can correspond to the scalar register file 130 outside the IS circuit 110, and a second interval of the address space can correspond to the descriptor address register 111 local to the IS circuit 110. After a scalar operation instruction processed by the IS circuit 110 generates the first address information of the first resource descriptor, the IS circuit 110 can use the address space to selectively store the first address information in either the scalar register file 130 or the descriptor address register 111. As a specific example of adding the descriptor address register 111 to the address space of the scalar register file 130, assume that the address space of the scalar register file 130 is a 7-bit address space, in which each of addresses 0 to 63 (the first interval of the address space) denotes one conventional scalar register in the scalar register file 130, and each of addresses 64 to 67 (the second interval of the address space) denotes one of the registers dar0 to dar3 of the descriptor address register 111. The IS circuit 110 can use the address space of the scalar register file 130 to specify that the result of a scalar operation is written to the descriptor address register 111.
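The 7-bit address-space example above (addresses 0 to 63 for conventional scalar registers, 64 to 67 for dar0 to dar3) can be sketched as a simple destination decoder. The function names and the treatment of addresses 68 to 127 are assumptions made for illustration; the text does not say how unassigned addresses behave, so this sketch simply rejects them.

```python
SRF_FIRST, SRF_LAST = 0, 63    # first interval: conventional scalar registers
DAR_FIRST, DAR_LAST = 64, 67   # second interval: dar0..dar3
ADDRESS_BITS = 7               # 7-bit address space, i.e. addresses 0..127


def decode_destination(addr):
    """Map a destination address to ("srf", index) or ("dar", index)."""
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError(f"address {addr} is outside the 7-bit address space")
    if SRF_FIRST <= addr <= SRF_LAST:
        return ("srf", addr - SRF_FIRST)
    if DAR_FIRST <= addr <= DAR_LAST:
        return ("dar", addr - DAR_FIRST)
    # Assumption: addresses 68..127 are unassigned and treated as an error here.
    raise ValueError(f"address {addr} is not assigned to any register")


def write_scalar_result(srf, dar, addr, value):
    """Write a scalar-operation result to whichever register `addr` selects."""
    target, index = decode_destination(addr)
    if target == "srf":
        srf[index] = value
    else:
        dar[index] = value
    return target, index


srf = [0] * 64   # models the scalar register file 130
dar = [0] * 4    # models dar0..dar3 of the descriptor address register 111

write_scalar_result(srf, dar, 3, 0x1234)    # lands in scalar register 3
write_scalar_result(srf, dar, 65, 0x0300)   # lands in dar1
```

With a shared address space of this kind, the same scalar operation encoding can target either storage location, and only the destination address decides where the result goes.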
In summary, the IS circuit 110 can directly access the descriptor address register 111 local to the IS circuit 110, so the IS circuit 110 can dispense with the hidden instructions that cause EU pipeline latency. Because the IS circuit 110 does not need to prepare and issue hidden instructions, the complexity of the IS circuit 110 is reduced. When the address information of the second resource descriptor used by the next instruction is the same as the address information of the resource descriptor used by the current instruction, the IS circuit 110 can reuse the address information stored in the descriptor address register 111, that is, reuse a resource descriptor with the same offset (or address), which is not possible in the prior art. The IS circuit 110 can therefore achieve better performance when a series of instructions uses the same resource descriptor.
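The benefit of reuse can be illustrated by counting register-file reads for a stream of descriptor-using instructions. This is a counting model only, not a cycle-accurate simulation: it assumes a single descriptor address register entry, assumes that the prior-art scheme issues exactly one hidden instruction per descriptor use, and uses made-up address values and hypothetical function names.

```python
def count_hidden_instructions(descriptor_addrs):
    """Prior-art model: one hidden instruction per descriptor use."""
    return len(descriptor_addrs)


def count_dar_updates(descriptor_addrs):
    """DAR model with one entry: update only when the address changes."""
    updates = 0
    current = None  # nothing is cached before the first instruction
    for addr in descriptor_addrs:
        if addr != current:
            updates += 1   # an SMOVD or a direct scalar write refreshes the DAR
            current = addr
        # otherwise the stored address information is reused as-is
    return updates


# Address information used by six consecutive instructions (illustrative values).
stream = [0x100, 0x100, 0x100, 0x240, 0x240, 0x100]

hidden = count_hidden_instructions(stream)
updates = count_dar_updates(stream)
print(f"hidden instructions (prior art): {hidden}")
print(f"DAR updates (this disclosure):   {updates}")
```

In this stream, three of the six instructions reuse address information that is already held locally. With several entries such as dar0 to dar3, a program could also keep more than one descriptor address resident at once, which this single-entry model does not capture.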
Finally, it should be noted that the above embodiments are intended only to illustrate the present disclosure, not to limit it. Although the present disclosure has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the foregoing embodiments may still be modified, and that some or all of their technical features may be replaced with equivalents, without such modifications or replacements causing the essence of the corresponding embodiments to depart from the scope of the embodiments of the present disclosure.

Claims (17)

  1. A computing device, comprising:
    an execution unit circuit; and
    an instruction scheduler circuit, coupled to the execution unit circuit and configured to issue instructions to the execution unit circuit for execution, wherein the instruction scheduler circuit comprises a descriptor address register, and the descriptor address register is configured to store first address information of a first resource descriptor,
    wherein, in a case where a current instruction processed by the instruction scheduler circuit uses the first resource descriptor, the instruction scheduler circuit fetches the first address information stored in the descriptor address register local to the instruction scheduler circuit, without triggering the execution unit circuit to read the first address information of the first resource descriptor from a scalar register file external to the instruction scheduler circuit and supply it to the instruction scheduler circuit, and
    after obtaining the first address information of the first resource descriptor, the instruction scheduler circuit issues, based on the first address information, the current instruction using the first resource descriptor to the execution unit circuit for execution.
  2. The computing device according to claim 1, wherein, in a case where address information of a second resource descriptor used by a next instruction is the same as the first address information of the first resource descriptor used by the current instruction, the instruction scheduler circuit reuses the first address information stored in the descriptor address register, without triggering the execution unit circuit to read the first address information from the scalar register file and supply it to the instruction scheduler circuit.
  3. The computing device according to claim 1 or 2, wherein:
    in a case where second address information of a second resource descriptor used by a next instruction is not the first address information of the first resource descriptor used by the current instruction: the instruction scheduler circuit issues a move instruction of an instruction set to the execution unit circuit, so as to move the second address information of the second resource descriptor from the scalar register file external to the instruction scheduler circuit to the descriptor address register local to the instruction scheduler circuit; or the instruction scheduler circuit processes a scalar operation instruction to generate the second address information of the second resource descriptor, and the instruction scheduler circuit stores the second address information directly in the descriptor address register local to the instruction scheduler circuit, without storing the second address information in the scalar register file external to the instruction scheduler circuit;
    when the instruction scheduler circuit processes the next instruction using the second resource descriptor, the instruction scheduler circuit fetches the second address information stored in the descriptor address register; and
    after obtaining the second address information of the second resource descriptor, the instruction scheduler circuit issues, based on the second address information, the next instruction using the second resource descriptor to the execution unit circuit for execution.
  4. The computing device according to any one of claims 1 to 3, wherein, before the instruction scheduler circuit processes the current instruction, the instruction scheduler circuit processes a scalar operation instruction to generate the first address information of the first resource descriptor, and the instruction scheduler circuit stores the first address information directly in the descriptor address register local to the instruction scheduler circuit, without storing the first address information in the scalar register file external to the instruction scheduler circuit.
  5. The computing device according to any one of claims 1 to 3, wherein:
    after a scalar operation instruction processed by the instruction scheduler circuit generates the first address information of the first resource descriptor, the instruction scheduler circuit stores the first address information in the scalar register file external to the instruction scheduler circuit; and
    before the instruction scheduler circuit processes the current instruction, the instruction scheduler circuit issues a move instruction of an instruction set to the execution unit circuit, so as to move the first address information of the first resource descriptor from the scalar register file to the descriptor address register of the instruction scheduler circuit.
  6. The computing device according to any one of claims 1 to 3, wherein the descriptor address register is added to an address space of the scalar register file, a first interval of the address space corresponds to the scalar register file external to the instruction scheduler circuit, a second interval of the address space corresponds to the descriptor address register local to the instruction scheduler circuit, and, after a scalar operation instruction processed by the instruction scheduler circuit generates the first address information of the first resource descriptor, the instruction scheduler circuit uses the address space to selectively store the first address information in one of the scalar register file and the descriptor address register.
  7. The computing device according to any one of claims 1 to 6, wherein the first address information comprises an offset of the first resource descriptor within a descriptor set, and the instruction scheduler circuit further comprises:
    a base address register, configured to store a base memory address of the descriptor set,
    wherein the instruction scheduler circuit uses the base memory address stored in the base address register and the offset stored in the descriptor address register to calculate a complete memory address of the first resource descriptor, and the instruction scheduler circuit issues the complete memory address and the current instruction using the first resource descriptor to the execution unit circuit for execution.
  8. The computing device according to any one of claims 1 to 6, wherein the first address information comprises a complete memory address of the first resource descriptor in a memory, and the instruction scheduler circuit issues the complete memory address stored in the descriptor address register and the current instruction using the first resource descriptor to the execution unit circuit for execution.
  9. A method for operating a computing device, comprising:
    storing, by a descriptor address register of an instruction scheduler circuit of the computing device, first address information of a first resource descriptor;
    in a case where a current instruction processed by the instruction scheduler circuit uses the first resource descriptor, fetching the first address information stored in the descriptor address register local to the instruction scheduler circuit, without triggering an execution unit circuit of the computing device to read the first address information of the first resource descriptor from a scalar register file external to the instruction scheduler circuit and supply it to the instruction scheduler circuit; and
    after obtaining the first address information of the first resource descriptor, issuing, by the instruction scheduler circuit and based on the first address information, the current instruction using the first resource descriptor to the execution unit circuit for execution.
  10. The operating method according to claim 9, further comprising:
    in a case where address information of a second resource descriptor used by a next instruction is the same as the first address information of the first resource descriptor used by the current instruction, reusing the first address information stored in the descriptor address register, without triggering the execution unit circuit to read the first address information from the scalar register file and supply it to the instruction scheduler circuit.
  11. The operating method according to claim 9 or 10, further comprising:
    in a case where second address information of a second resource descriptor used by a next instruction is not the first address information of the first resource descriptor used by the current instruction: issuing, by the instruction scheduler circuit, a move instruction of an instruction set to the execution unit circuit, so as to move the second address information of the second resource descriptor from the scalar register file external to the instruction scheduler circuit to the descriptor address register local to the instruction scheduler circuit; or processing, by the instruction scheduler circuit, a scalar operation instruction to generate the second address information of the second resource descriptor, and storing, by the instruction scheduler circuit, the second address information directly in the descriptor address register local to the instruction scheduler circuit, without storing the second address information in the scalar register file external to the instruction scheduler circuit;
    when the instruction scheduler circuit processes the next instruction using the second resource descriptor, fetching the second address information stored in the descriptor address register; and
    after obtaining the second address information of the second resource descriptor, issuing, by the instruction scheduler circuit and based on the second address information, the next instruction using the second resource descriptor to the execution unit circuit for execution.
  12. The operating method according to any one of claims 9 to 11, further comprising:
    before the instruction scheduler circuit processes the current instruction, processing, by the instruction scheduler circuit, a scalar operation instruction to generate the first address information of the first resource descriptor; and
    storing the first address information directly in the descriptor address register local to the instruction scheduler circuit, without storing the first address information in the scalar register file external to the instruction scheduler circuit.
  13. The operating method according to any one of claims 9 to 11, further comprising:
    after a scalar operation instruction processed by the instruction scheduler circuit generates the first address information of the first resource descriptor, storing the first address information in the scalar register file external to the instruction scheduler circuit; and
    before the instruction scheduler circuit processes the current instruction, issuing, by the instruction scheduler circuit, a move instruction of an instruction set to the execution unit circuit, so as to move the first address information of the first resource descriptor from the scalar register file to the descriptor address register of the instruction scheduler circuit.
  14. The operating method according to any one of claims 9 to 11, wherein the descriptor address register is added to an address space of the scalar register file, a first interval of the address space corresponds to the scalar register file external to the instruction scheduler circuit, a second interval of the address space corresponds to the descriptor address register local to the instruction scheduler circuit, and the operating method further comprises:
    after a scalar operation instruction processed by the instruction scheduler circuit generates the first address information of the first resource descriptor, using the address space to selectively store the first address information in one of the scalar register file and the descriptor address register.
  15. The operating method according to any one of claims 9 to 14, wherein the first address information comprises an offset of the first resource descriptor within a descriptor set, and the operating method further comprises:
    storing, by a base address register of the instruction scheduler circuit, a base memory address of the descriptor set;
    using the base memory address stored in the base address register and the offset stored in the descriptor address register to calculate a complete memory address of the first resource descriptor; and
    issuing, by the instruction scheduler circuit, the complete memory address and the current instruction using the first resource descriptor to the execution unit circuit for execution.
  16. The operating method according to any one of claims 9 to 14, wherein the first address information comprises a complete memory address of the first resource descriptor in a memory, and the operating method further comprises:
    issuing, by the instruction scheduler circuit, the complete memory address stored in the descriptor address register and the current instruction using the first resource descriptor to the execution unit circuit for execution.
  17. A machine-readable storage medium for storing non-transitory machine-readable instructions which, when executed by a computer, can implement the method for operating a computing device according to any one of claims 9 to 16.
PCT/CN2023/084091 2022-11-24 2023-03-27 Computing device, operating method, and machine-readable storage medium WO2024108836A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211486231.9 2022-11-24
CN202211486231.9A CN115756612A (en) 2022-11-24 2022-11-24 Computing device, operating method and machine-readable storage medium

Publications (1)

Publication Number Publication Date
WO2024108836A1 (en) 2024-05-30

Family

ID=85337586

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/084091 WO2024108836A1 (en) 2022-11-24 2023-03-27 Computing device, operating method, and machine-readable storage medium

Country Status (2)

Country Link
CN (1) CN115756612A (en)
WO (1) WO2024108836A1 (en)

Also Published As

Publication number Publication date
CN115756612A (en) 2023-03-07
