WO2019153684A1 - Automatic management method for a low-latency instruction scheduler - Google Patents

Automatic management method for a low-latency instruction scheduler

Info

Publication number
WO2019153684A1
Authority
WO
WIPO (PCT)
Prior art keywords
automatic
module
instruction
management module
instruction scheduler
Application number
PCT/CN2018/099753
Other languages
English (en)
French (fr)
Inventor
洪振洲
李庭育
陈育鸣
魏智汎
Original Assignee
江苏华存电子科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-02-06
Filing date
2018-08-09
Publication date
2019-08-15
Application filed by 江苏华存电子科技有限公司
Publication of WO2019153684A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution

Definitions

  • The invention relates to the technical field of instruction scheduler management, in particular to an automatic management method for a low-latency instruction scheduler.
  • Instruction scheduling is a technique for executing instructions in parallel.
  • The compiler or the machine hardware increases the number of instructions the machine executes per beat by adjusting the order of instructions.
  • A beat is the clock cycle of machine instruction execution that the compiler simulates when compiling the source program.
  • A list scheduling algorithm is usually used to implement instruction scheduling, typically with a candidate instruction queue.
  • The data dependency graph is composed of a plurality of nodes, each representing one instruction, and it can be used to represent the dependencies between instructions.
  • The priority of each instruction is then calculated, and the instructions in the data dependency graph are scheduled beat by beat.
  • Instruction scheduling is an effective means for a compiler to exploit a program's latent instruction-level parallelism. Without changing program semantics, and while satisfying the target machine's dependency and resource constraints, it reorders instructions to increase the number of instructions the target machine can execute per cycle. Instruction scheduling is a key technology of modern high-performance compilers: it determines the relative execution order of each operation, its exact execution time and which hardware resources it uses. In terms of code-block partitioning, instruction scheduling divides into local instruction scheduling, within a basic block, and global instruction scheduling, across basic blocks.
  • The existing system-on-chip architecture consists of a plurality of sub-modules, including a central processing unit, connected by an external bus. Where a central instruction dispatch controller is present, the party dispatching commands mostly performs two actions: 1. write the instructions; 2. tell the central instruction dispatch controller how many instructions were written, so that the hardware is notified to execute them. The party reading commands performs: 1. read the instructions; 2. tell the central instruction dispatch controller how many commands were read. The prior art has low performance and high power consumption and needs further improvement.
  • An automatic management method for a low-latency instruction scheduler, involving a central processing unit, an instruction scheduler and a plurality of hardware modules, wherein the central processing unit is connected to the instruction scheduler through a bus, the instruction scheduler is connected to each of the hardware modules and is provided with an automatic management module, the automatic management module comprises an automatic pointer management module and an automatic pointer maintenance module, and the automatic pointer management module is connected to the automatic pointer maintenance module.
  • The method comprises the following steps:
  • A. the central processing unit writes a command;
  • B. while the command is being written, the write action is also sent to the automatic pointer maintenance module;
  • C. when the command write is complete, the automatic pointer maintenance module automatically updates its internal registers.
  • The plurality of hardware modules comprise a first hardware module, a second hardware module, a third hardware module and an Nth hardware module, N being an integer greater than 3.
  • The present invention adds an automatic pointer management module, which monitors the amount of instruction data written to each instruction queue on every write; for example, an instruction queue may be configured with an instruction length of 16 bytes.
  • Once an instruction has been completely written, the automatic pointer maintenance module automatically updates the register.
  • On a system-on-chip, the microprocessor's notification to the central instruction dispatch controller of how many instructions were written costs at least 30 microprocessor cycles; the automatic management module omits this action.
  • Omitting it reduces the microprocessor's workload, frees computing capacity for other tasks, and reduces the traffic sent to the bus, thereby improving system performance.
  • Hardware sub-modules read instructions in the same mode, which saves steps, lets them process other data more quickly, avoids the latency of the central bus and reduces bus traffic.
  • Figure 1 is a schematic view of the structure of the present invention.
  • An automatic management method for a low-latency instruction scheduler, involving a central processing unit 1, an instruction scheduler 2 and a plurality of hardware modules, wherein: the central processing unit 1 is connected to the instruction scheduler 2 via a bus; the instruction scheduler 2 is connected to each of the hardware modules; the instruction scheduler 2 is provided with an automatic management module 3 comprising an automatic pointer management module 4 and an automatic pointer maintenance module 5; the automatic pointer management module 4 is connected to the automatic pointer maintenance module 5; and the plurality of hardware modules include the first hardware module 6, the second hardware module 7, the third hardware module 8 and the Nth hardware module, N being an integer greater than 3.
  • The management method of the present invention includes the following steps:
  • A. the central processing unit writes a command;
  • B. while the command is being written, the write action is also sent to the automatic pointer maintenance module;
  • C. when the command write is complete, the automatic pointer maintenance module automatically updates its internal registers.
  • The present invention adds the automatic pointer management module.
  • The automatic pointer management module monitors the amount of instruction data written to each instruction queue on every write; for example, if an instruction queue is configured with an instruction length of 16 bytes, the bytes written are accumulated each time instruction data enters memory.
  • Once the instruction has been completely written, the automatic pointer maintenance module automatically updates the instruction count in the register.
  • The microprocessor's notification to the central instruction dispatch controller of how many instructions were written costs at least 30 microprocessor cycles. The automatic management module omits this operation, reducing the microprocessor's workload.

Abstract

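The invention discloses an automatic management method for a low-latency instruction scheduler, involving a central processing unit, an instruction scheduler and a plurality of hardware modules. The central processing unit is connected to the instruction scheduler through a bus, and the instruction scheduler is connected to each of the hardware modules. The instruction scheduler contains an automatic management module comprising an automatic pointer management module and an automatic pointer maintenance module, the former connected to the latter. With the added automatic pointer management module, each time a command is written, the write action is also sent to the automatic pointer maintenance module; when the command write completes, the automatic pointer maintenance module automatically updates its internal registers. This eliminates the step of telling the central instruction dispatch controller how many instructions were written, which improves overall system performance and reduces the traffic sent to the bus.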

Description

Automatic management method for a low-latency instruction scheduler
Technical Field
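The invention relates to the technical field of instruction scheduler management, and in particular to an automatic management method for a low-latency instruction scheduler.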
Background
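Instruction scheduling is a technique for executing instructions in parallel: the compiler or the machine hardware reorders instructions to increase the number of instructions the machine executes per beat, where a beat is the clock cycle of machine instruction execution that the compiler simulates when compiling the source program. Existing compilers usually implement instruction scheduling with a list scheduling algorithm, typically using a candidate instruction queue. Specifically, a data dependency graph is first built for the instructions to be scheduled; it consists of a number of nodes, each representing one instruction, and it represents the dependencies between instructions. The priority of each instruction is then computed, and the instructions in the graph are scheduled beat by beat. Instruction scheduling is an effective means for a compiler to exploit a program's latent instruction-level parallelism: without changing program semantics, and while satisfying the target machine's dependency and resource constraints, it reorders instructions to increase the number of instructions the target machine can execute per cycle. It is a key technology of modern high-performance compilers, determining the relative execution order of operations, their exact execution times and which hardware resources they use. In terms of code-block partitioning, instruction scheduling divides into local instruction scheduling, within a basic block, and global instruction scheduling, across basic blocks.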
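The list-scheduling procedure described above (build a data dependency graph, compute priorities, then fill each beat from a candidate queue) can be sketched as follows. This is a minimal Python sketch of the background technique, not of the invention: the priority function (latency-weighted longest path to the end of the block), the per-instruction latencies and the fixed issue width are assumptions chosen for illustration, and the name `list_schedule` is hypothetical.

```python
from collections import defaultdict


def list_schedule(latency, deps, width):
    """Beat-by-beat list scheduling of one basic block (local scheduling).

    latency: dict mapping instruction -> beats until its result is available (>= 1)
    deps:    (producer, consumer) edges of the data dependency graph (must be acyclic)
    width:   maximum number of instructions issued per beat (assumed resource limit)
    Returns a dict mapping instruction -> issue beat.
    """
    succs, npreds = defaultdict(list), {n: 0 for n in latency}
    for p, c in deps:
        succs[p].append(c)
        npreds[c] += 1

    # Assumed priority: latency-weighted length of the longest path to the block's end.
    prio = {}

    def critical_path(n):
        if n not in prio:
            prio[n] = latency[n] + max((critical_path(s) for s in succs[n]), default=0)
        return prio[n]

    for n in latency:
        critical_path(n)

    earliest = {n: 0 for n in latency}              # first beat at which operands are ready
    ready = [n for n in latency if npreds[n] == 0]  # candidate instruction queue
    issued, beat = {}, 0
    while len(issued) < len(latency):
        # Highest priority first; ties broken by name for determinism.
        candidates = sorted((n for n in ready if earliest[n] <= beat),
                            key=lambda n: (-prio[n], n))
        for n in candidates[:width]:
            issued[n] = beat
            ready.remove(n)
            for s in succs[n]:
                earliest[s] = max(earliest[s], beat + latency[n])
                npreds[s] -= 1
                if npreds[s] == 0:
                    ready.append(s)
        beat += 1
    return issued
```

For example, with `latency = {"a": 2, "b": 1, "c": 1, "d": 1}`, edges a→c, b→c, c→d and an issue width of 2, a and b issue in beat 0, c waits for a's two-beat result and issues in beat 2, and d issues in beat 3.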
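The existing system-on-chip architecture consists of multiple sub-modules, including a central processing unit, connected by an external bus. Where a central instruction dispatch controller is present, the party dispatching commands mostly performs two actions: 1. write the instructions in; 2. tell the central instruction dispatch controller how many instructions were written, so that the hardware is notified to execute them. The party reading commands performs two actions: 1. read the instructions out; 2. tell the central instruction dispatch controller how many commands were read. The prior art therefore has low performance and high power consumption and needs further improvement.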
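To make the cost of this conventional two-action protocol concrete, the following Python sketch counts bus transactions on the write path and on the read path. It is a hypothetical model: the class name, the assumption of one bus transaction per instruction moved, and one extra transaction per count notification are illustrative choices, not details from the source.

```python
from dataclasses import dataclass


@dataclass
class ConventionalDispatchController:
    """Hypothetical central instruction dispatch controller that relies on
    software to report how many instructions were written and read."""
    write_count: int = 0       # updated only by the CPU's notification
    read_count: int = 0        # updated only by the hardware module's notification
    bus_transactions: int = 0  # every access crossing the bus

    def cpu_dispatch(self, n_instructions):
        # Action 1: write the instructions into the queue (assumed one transaction each).
        self.bus_transactions += n_instructions
        # Action 2: tell the controller how many were written (one extra transaction).
        self.write_count += n_instructions
        self.bus_transactions += 1

    def hw_consume(self, n_instructions):
        if n_instructions > self.pending:
            raise ValueError("cannot read more instructions than are pending")
        # Action 1: read the instructions out of the queue.
        self.bus_transactions += n_instructions
        # Action 2: tell the controller how many were read.
        self.read_count += n_instructions
        self.bus_transactions += 1

    @property
    def pending(self):
        return self.write_count - self.read_count
```

Dispatching three instructions and then consuming them costs eight bus transactions in this model, two of which are pure bookkeeping notifications.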
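Summary of the Invention
The object of the present invention is to provide an automatic management method for a low-latency instruction scheduler that solves the problems raised in the background above.
To achieve this object, the present invention provides the following technical solution: an automatic management method for a low-latency instruction scheduler, involving a central processing unit, an instruction scheduler and a plurality of hardware modules, characterized in that: the central processing unit is connected to the instruction scheduler through a bus; the instruction scheduler is connected to each of the plurality of hardware modules; the instruction scheduler is provided internally with an automatic management module comprising an automatic pointer management module and an automatic pointer maintenance module; and the automatic pointer management module is connected to the automatic pointer maintenance module.
Preferably, the method comprises the following steps:
A. The central processing unit writes a command;
B. While the command is being written, the write action is also sent to the automatic pointer maintenance module in the automatic management module;
C. When the command write is complete, the automatic pointer maintenance module automatically updates its internal registers.
Preferably, the plurality of hardware modules comprise a first hardware module, a second hardware module, a third hardware module and an Nth hardware module, N being an integer greater than 3.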
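Compared with the prior art, the beneficial effects of the present invention are as follows. The present invention adds an automatic pointer management module, which monitors the amount of instruction data written on each write to each of the different instruction queues. For example, if an instruction queue is configured with an instruction length of 16 bytes, the bytes written are accumulated as each piece of instruction data enters memory; when the instruction has been completely written to memory, i.e. all 16 bytes have been filled, the automatic pointer maintenance module automatically updates the instruction count in the register. On a system-on-chip, the microprocessor's action of telling the central instruction dispatch controller how many instructions were written costs at least 30 microprocessor cycles. The automatic management module eliminates this notification, which reduces the microprocessor's workload, frees its computing capacity for other work, and reduces the traffic sent to the bus, thereby improving system performance. Hardware sub-modules reading instructions work in the same mode, which saves steps, lets the hardware sub-modules move on to other data more quickly, avoids the latency of the central bus, and reduces bus traffic.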
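The byte-accumulation behaviour of steps A to C can be sketched as follows. In this hypothetical Python model, the management side (module 4) sees every write that lands in a queue's memory and accumulates the bytes, and the maintenance side (module 5) increments that queue's instruction-count register once per completed instruction, so the CPU never writes the count itself. The class and function names, the 32-byte queue used in the example, and the write granularity are assumptions; only the 16-byte instruction length and the lower bound of 30 cycles per notification come from the text.

```python
class AutoPointerMonitor:
    """Hypothetical model of the automatic pointer management module (4) and the
    automatic pointer maintenance module (5) serving several instruction queues."""

    def __init__(self, instr_len_by_queue):
        self.instr_len = dict(instr_len_by_queue)        # queue id -> instruction length (bytes)
        self.partial = {q: 0 for q in self.instr_len}    # bytes of the instruction being written
        self.count_reg = {q: 0 for q in self.instr_len}  # instruction-count register per queue

    def on_write(self, queue, nbytes):
        """Called for every write that enters a queue's memory (steps A and B)."""
        # Module 4: keep accumulating the bytes written to this queue.
        self.partial[queue] += nbytes
        # Module 5: each completed instruction bumps the count register (step C);
        # the remainder is carried over as a partially written instruction.
        done, self.partial[queue] = divmod(self.partial[queue], self.instr_len[queue])
        self.count_reg[queue] += done


MIN_CYCLES_PER_NOTIFICATION = 30  # lower bound stated in the text


def min_cycles_saved(omitted_notifications):
    """Lower bound on microprocessor cycles saved by omitting count notifications."""
    return MIN_CYCLES_PER_NOTIFICATION * omitted_notifications
```

Four 4-byte writes to a 16-byte queue produce exactly one count increment, on the fourth write, with no count write from the CPU; at the stated lower bound, omitting 1,000 notifications saves at least 30,000 microprocessor cycles.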
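Description of the Drawings
Figure 1 is a schematic structural diagram of the present invention.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Figure 1, the present invention provides a technical solution: an automatic management method for a low-latency instruction scheduler, involving a central processing unit 1, an instruction scheduler 2 and a plurality of hardware modules, characterized in that: the central processing unit 1 is connected to the instruction scheduler 2 through a bus; the instruction scheduler 2 is connected to each of the hardware modules; the instruction scheduler 2 is provided internally with an automatic management module 3 comprising an automatic pointer management module 4 and an automatic pointer maintenance module 5; the automatic pointer management module 4 is connected to the automatic pointer maintenance module 5; and the plurality of hardware modules include a first hardware module 6, a second hardware module 7, a third hardware module 8 and an Nth hardware module, N being an integer greater than 3.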
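The management method of the present invention comprises the following steps:
A. The central processing unit writes a command;
B. While the command is being written, the write action is also sent to the automatic pointer maintenance module in the automatic management module;
C. When the command write is complete, the automatic pointer maintenance module automatically updates its internal registers.
In summary, the present invention adds an automatic pointer management module, which monitors the amount of instruction data written on each write to each of the different instruction queues. For example, if an instruction queue is configured with an instruction length of 16 bytes, the bytes written are accumulated as each piece of instruction data enters memory; when the instruction has been completely written to memory, i.e. all 16 bytes have been filled, the automatic pointer maintenance module automatically updates the instruction count in the register. On a system-on-chip, the microprocessor's action of telling the central instruction dispatch controller how many instructions were written costs at least 30 microprocessor cycles. The automatic management module eliminates this notification, which reduces the microprocessor's workload, frees its computing capacity for other work, and reduces the traffic sent to the bus, thereby improving system performance. Hardware sub-modules reading instructions work in the same mode, which saves steps, lets the hardware sub-modules move on to other data more quickly, avoids the latency of the central bus, and reduces bus traffic.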
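The text states that hardware sub-modules reading instructions follow the same mode. One reading of that, sketched below in Python under stated assumptions, is that the scheduler derives both the written count and the read count from the data traffic it observes, so neither the CPU nor the hardware module issues a separate count notification, and the number of pending instructions is simply their difference. The class and its methods are hypothetical, and each data transfer is counted as a single bus transaction for illustration.

```python
class AutoQueuePointers:
    """Hypothetical single instruction queue whose write and read counts are both
    derived by the scheduler from observed data traffic (no count notifications)."""

    def __init__(self, instr_len=16):
        self.instr_len = instr_len  # bytes per instruction (16 in the text's example)
        self.wr_bytes = 0           # bytes written by the CPU so far
        self.rd_bytes = 0           # bytes read by the hardware module so far
        self.bus_transactions = 0   # data transfers only; assumed one per call

    def cpu_write(self, nbytes):
        self.wr_bytes += nbytes
        self.bus_transactions += 1

    def hw_read(self, nbytes):
        if self.rd_bytes + nbytes > self.wr_bytes:
            raise ValueError("read would overtake the write pointer")
        self.rd_bytes += nbytes
        self.bus_transactions += 1

    @property
    def written(self):
        return self.wr_bytes // self.instr_len  # completed instructions written

    @property
    def read(self):
        return self.rd_bytes // self.instr_len  # completed instructions read

    @property
    def pending(self):
        return self.written - self.read
```

Writing two 16-byte instructions and reading one takes three bus transactions in this model and leaves one instruction pending; no transaction is spent on reporting counts.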
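Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations may be made to these embodiments without departing from the principle and spirit of the invention; the scope of the invention is defined by the appended claims and their equivalents.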

Claims (3)

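  1. An automatic management method for a low-latency instruction scheduler, characterized by comprising a central processing unit (1), an instruction scheduler (2) and a plurality of hardware modules, wherein the central processing unit (1) is connected to the instruction scheduler (2) through a bus, the instruction scheduler (2) is connected to each of the plurality of hardware modules, the instruction scheduler (2) is provided internally with an automatic management module (3), the automatic management module (3) comprises an automatic pointer management module (4) and an automatic pointer maintenance module (5), and the automatic pointer management module (4) is connected to the automatic pointer maintenance module (5).
  2. The automatic management method for a low-latency instruction scheduler according to claim 1, characterized by comprising the following steps:
    A. the central processing unit writes a command;
    B. while the command is being written, the write action is also sent to the automatic pointer maintenance module in the automatic management module;
    C. when the command write is complete, the automatic pointer maintenance module automatically updates its internal registers.
  3. The automatic management method for a low-latency instruction scheduler according to claim 1, characterized in that the plurality of hardware modules comprise a first hardware module (6), a second hardware module (7), a third hardware module (8) and an Nth hardware module, N being an integer greater than 3.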
PCT/CN2018/099753 2018-02-06 2018-08-09 Automatic management method for a low-latency instruction scheduler WO2019153684A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810117641.3 2018-02-06
CN201810117641.3A CN108334326A (zh) 2018-02-06 2018-02-06 Automatic management method for a low-latency instruction scheduler

Publications (1)

Publication Number Publication Date
WO2019153684A1 (zh) 2019-08-15

Family

ID=62928428

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/099753 WO2019153684A1 (zh) 2018-02-06 2018-08-09 Automatic management method for a low-latency instruction scheduler

Country Status (2)

Country Link
CN (1) CN108334326A (zh)
WO (1) WO2019153684A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334326A (zh) * 2018-02-06 2018-07-27 江苏华存电子科技有限公司 Automatic management method for a low-latency instruction scheduler

Citations (5)

Publication number Priority date Publication date Assignee Title
WO1996008770A2 (en) * 1994-09-16 1996-03-21 Philips Electronics N.V. Register status protection during read-modify-write operation
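CN101211321A (zh) * 2006-12-28 2008-07-02 英特尔公司 Hierarchical memory read/write micro-instruction scheduler
CN101710272A (zh) * 2009-10-28 2010-05-19 北京龙芯中科技术服务中心有限公司 Instruction scheduling apparatus and method
CN101894013A (zh) * 2010-07-16 2010-11-24 中国科学院计算技术研究所 Method and system for instruction-level pipeline control within a processor
CN108334326A (zh) * 2018-02-06 2018-07-27 江苏华存电子科技有限公司 Automatic management method for a low-latency instruction scheduler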

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US6035389A (en) * 1998-08-11 2000-03-07 Intel Corporation Scheduling instructions with different latencies
CN101334766B (zh) * 2008-06-30 2011-05-11 东软飞利浦医疗设备系统有限责任公司 Parallel microprocessor and implementation method thereof

Also Published As

Publication number Publication date
CN108334326A (zh) 2018-07-27

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18905307; country of ref document: EP; kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: PCT application non-entry in European phase

Ref document number: 18905307; country of ref document: EP; kind code of ref document: A1