CN102063288A - DSP (Digital Signal Processing) chip-oriented instruction scheduling method - Google Patents

Info

Publication number
CN102063288A
Authority
CN
China
Application number
CN2011100024549A
Other languages
Chinese (zh)
Inventor
汤睿
范高生
Original Assignee
四川九洲电器集团有限责任公司
Application filed by 四川九洲电器集团有限责任公司 filed Critical 四川九洲电器集团有限责任公司
Priority to CN2011100024549A priority Critical patent/CN102063288A/en
Publication of CN102063288A publication Critical patent/CN102063288A/en

Abstract

The invention relates to a DSP (Digital Signal Processing) chip-oriented instruction scheduling method comprising the following steps: step A, constructing a topological ordering of the statements in a basic block of an assembly program; step B, calculating the delay of each instruction on the basis of the topological ordering obtained in step A; step C, traversing the data dependence graph from the root nodes to the leaf nodes; and step D, finding, in the candidate instruction set of step C, the instruction that has the maximum delay and whose execution time is less than or equal to the current time, and placing it in the current scheduling time slot. Because it works from a model, the method optimizes code for DSP chips in general and is not limited to any specific DSP chip on the market.

Description

A DSP chip-oriented instruction scheduling method

Technical Field

[0001] The present invention relates to compiler technology in computer science and technology, and in particular to an instruction scheduling method for DSP (Digital Signal Processing) chips.

Background

[0002] With the continuing development of computer and multimedia technology, audio and video technology has become a mainstream direction in computer science and technology research. Audio and video data are large in volume, however, and place high demands on the computing power of the chip, so realizing the many high-end applications of audio and video technology requires finding ways to raise the chip's processing capability. In the prior art, moreover, assembly code for DSP chips is generally limited to one specific function.

Summary of the Invention

[0003] Because DSP chips in the prior art are generally limited to one specific function, it is necessary to provide an instruction scheduling method for DSP chips.

[0004] The present invention overcomes the shortcomings of the prior art and provides a method for increasing program execution speed. The present invention provides an instruction scheduling method for DSP chips, comprising the following steps:

Step A. Construct a topological ordering of the statements in a basic block of the assembly program;

Step B. On the basis of the topological ordering obtained in step A, calculate the delay value delay of each instruction in the basic block of the assembly language program;

Step C. Traverse the data dependence graph from the root nodes to the leaf nodes, selecting instructions for scheduling during the traversal and generating a candidate instruction set; the candidate instruction set is divided into two sets: one set records the instructions having the maximum delay value from step B; the other set records the instructions whose earliest execution time is less than or equal to the current time;

Step D. Find, in the candidate instruction set of step C, the instruction that has the maximum delay value and whose execution time is less than or equal to the current time, place it in the current scheduling time slot, and then update the candidate instruction set according to the data dependence graph;

Step E. Output the scheduled assembly language code.

[0005] Preferably, the topological ordering of the statements in the basic block of the assembly program in step A is constructed on the basis of a linear scheduling algorithm.
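
As an illustration of step A, the following Python sketch builds a data dependence graph for one basic block and derives a topological order from it. The tuple-based instruction format (dest, srcs), the function names, and the use of Kahn's algorithm are assumptions made for this sketch; the text above only states that the ordering is based on a linear scheduling algorithm.

```python
from collections import deque


def build_dependence_graph(block):
    """Return successor sets for the instructions of one basic block.

    `block` is a list of (dest, srcs) tuples, a hypothetical instruction
    format: `dest` is the register written, `srcs` the registers read.
    An edge i -> j means instruction j must stay after instruction i.
    """
    succs = {i: set() for i in range(len(block))}
    for j, (dest_j, srcs_j) in enumerate(block):
        for i in range(j):
            dest_i, srcs_i = block[i]
            raw = dest_i in srcs_j   # read after write
            war = dest_j in srcs_i   # write after read
            waw = dest_i == dest_j   # write after write
            if raw or war or waw:
                succs[i].add(j)
    return succs


def topological_order(succs):
    """Kahn's algorithm over the dependence graph."""
    indegree = {n: 0 for n in succs}
    for n in succs:
        for m in succs[n]:
            indegree[m] += 1
    ready = deque(sorted(n for n in succs if indegree[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in sorted(succs[n]):
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    return order


block = [
    ("r1", ["r0"]),        # 0: r1 = load [r0]
    ("r2", ["r0"]),        # 1: r2 = load [r0 + 4]
    ("r3", ["r1", "r2"]),  # 2: r3 = r1 + r2
    ("r4", ["r3"]),        # 3: r4 = r3 * 2
]
graph = build_dependence_graph(block)
order = topological_order(graph)
```

The two loads have no dependence on each other, so either may come first; the add depends on both, and the multiply depends on the add.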

[0006] Preferably, in step B the delay value delay of each instruction in the basic block of the assembly language program is calculated by the following formula:

[Formula image: Figure CN102063288AD00031]

where exectime(n) is the number of cycles required to execute the n-th instruction, and late_delay(n, m) = latency(linst(n), linst(m)) + delay(m) + 1; latency(linst(n), linst(m)) gives the number of clock cycles that must be kept between the two instructions, that is, the time the machine must wait to avoid a data hazard when two adjacent statements have a data dependence.
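
The formula itself survives only as an image reference in this text. Given the definitions of exectime(n) and late_delay(n, m), one plausible reading, assumed here and matching the common list-scheduling formulation rather than confirmed by the source, is that a leaf node takes its own execution time and every other node takes the maximum late_delay over its successors. The notation succ(n), the set of successors of n in the data dependence graph, is introduced only for this sketch:

```latex
\[
\mathrm{delay}(n) =
\begin{cases}
\mathrm{exectime}(n), & \text{if } n \text{ is a leaf node},\\[4pt]
\max\limits_{m \in \mathrm{succ}(n)} \mathrm{late\_delay}(n, m), & \text{otherwise},
\end{cases}
\]
\[
\mathrm{late\_delay}(n, m) = \mathrm{latency}(\mathrm{linst}(n), \mathrm{linst}(m)) + \mathrm{delay}(m) + 1 .
\]
```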

Preferably, a check is made after step D completes: if all instructions have been scheduled, step E is executed; otherwise step C is repeated.

[0007] The beneficial effects of the present invention are as follows: the efficiency of an audio and video processing method is improved by changing the assembly code generated for it, which optimizes the program running on the DSP chip; by working from a model, the optimization applies to general-purpose DSP chips and is not limited to any specific DSP chip on the market. The invention makes full use of the very long instruction word (VLIW) capability that is present, in the same or a similar form, in all DSP chips.

Brief Description of the Drawings

[0008] FIG. 1 is a schematic diagram of the steps of the DSP chip-oriented instruction scheduling method.

Detailed Description

[0009] The present invention is further described below with reference to the accompanying drawing.

[0010] First, the GCC compiler (a compiler used for programming under Linux) is invoked to generate the assembly code file corresponding to the source code. Taking this assembly file as the basis, the DSP chip-oriented instruction scheduling method described below is then applied, producing an assembly code file of the same name, which is finally passed to the linker to generate the executable file.
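
The three-stage flow described here (compile to assembly, reschedule the assembly file in place, link) can be sketched as a list of commands. This is only an illustration: the scheduler executable name dsp-sched and the function build_commands are hypothetical, and a real DSP target would likely use a cross-compiler in place of plain gcc. The -S and -o options are standard GCC options.

```python
from pathlib import Path


def build_commands(source, cc="gcc", scheduler="dsp-sched"):
    """Return the command lines for the compile / schedule / link flow.

    `scheduler` is a hypothetical tool name standing in for the method of
    this document; it is assumed to rewrite the assembly file in place,
    so that the scheduled file keeps the same name.
    """
    src = Path(source)
    asm = src.with_suffix(".s")   # assembly file produced by the compiler
    exe = src.with_suffix("")     # executable named after the source file
    return [
        [cc, "-S", str(src), "-o", str(asm)],  # source -> assembly
        [scheduler, str(asm)],                 # reschedule the assembly
        [cc, str(asm), "-o", str(exe)],        # assemble and link
    ]


cmds = build_commands("demo.c")
```

Each inner list could be passed to subprocess.run once a real scheduler executable exists.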

[0011] As shown in FIG. 1, the schematic diagram of the steps, the DSP chip-oriented instruction scheduling method comprises the following steps:

Step A. Construct a topological ordering of the statements in a basic block of the assembly program;

Step B. On the basis of the topological ordering obtained in step A, calculate the delay value delay of each instruction in the basic block of the assembly language program;

Step C. Traverse the data dependence graph from the root nodes to the leaf nodes, selecting instructions for scheduling during the traversal and generating a candidate instruction set; the candidate instruction set cands is divided into two sets: one set, mcands, records the instructions having the maximum delay value from step B; the other set, ecands, records the instructions whose earliest execution time etime is less than or equal to the current time curtime;

Step D. Find, in the candidate instruction set of step C, the instruction that has the maximum delay value and whose execution time is less than or equal to the current time, place it in the current scheduling time slot, and then update the candidate instruction set according to the data dependence graph;

Step E. Output the scheduled assembly language code.
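
Steps C to E can be sketched as a greedy loop in Python. The names cands, mcands, ecands, etime and curtime follow the text; everything else is an assumption for illustration: one instruction is placed per time slot, mcands is computed among the ready candidates (one interpretation of step D), ties are broken by instruction index, and the latency table and function name are hypothetical.

```python
def list_schedule(succs, delay, latency):
    """Greedy scheduling of one basic block, one instruction per slot.

    succs   -- dict: node -> set of successors in the dependence graph
    delay   -- dict: node -> delay value from step B (assumed precomputed)
    latency -- dict: (n, m) -> cycles that must separate n and m
               (hypothetical table; missing pairs count as 0)
    Returns a list of (time_slot, node) pairs.
    """
    npreds = {n: 0 for n in succs}
    for n in succs:
        for m in succs[n]:
            npreds[m] += 1

    etime = {n: 0 for n in succs}                 # earliest execution time
    cands = {n for n in succs if npreds[n] == 0}  # start from the roots
    curtime = 0
    schedule = []

    while cands:
        # ecands: candidates whose earliest execution time has been reached.
        ecands = [n for n in cands if etime[n] <= curtime]
        if not ecands:
            curtime += 1          # nothing is ready; leave this slot idle
            continue
        # mcands: ready candidates with the largest delay value.
        top = max(delay[n] for n in ecands)
        mcands = [n for n in ecands if delay[n] == top]
        n = min(mcands)           # tie-break by instruction index
        schedule.append((curtime, n))
        cands.remove(n)
        # Step D, second half: update the candidate set from the graph.
        for m in succs[n]:
            etime[m] = max(etime[m], curtime + latency.get((n, m), 0) + 1)
            npreds[m] -= 1
            if npreds[m] == 0:
                cands.add(m)
        curtime += 1
    return schedule               # step E would emit code in this order


# Instruction 0 is independent; 1 -> 2 -> 3 form a chain, and 2 must wait
# two extra cycles after 1.
succs = {0: set(), 1: {2}, 2: {3}, 3: set()}
delay = {0: 1, 1: 5, 2: 2, 3: 1}
latency = {(1, 2): 2}
result = list_schedule(succs, delay, latency)
```

In this example instruction 1 is placed first because it has the largest delay value, and instruction 0 then fills a slot while instruction 2 is still waiting on instruction 1.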

[0012] Preferably, the topological ordering of the statements in the basic block of the assembly program in step A is constructed on the basis of a linear scheduling algorithm.

[0013] Preferably, in step B the delay value delay of each instruction in the basic block of the assembly language program is calculated by the following formula:

[Formula image: Figure CN102063288AD00041]

where exectime(n) is the number of cycles required to execute the n-th instruction, and late_delay(n, m) = latency(linst(n), linst(m)) + delay(m) + 1; latency(linst(n), linst(m)) gives the number of clock cycles that must be kept between the two instructions, that is, the time the machine must wait to avoid a data hazard when two adjacent statements have a data dependence;
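
The delay computation of step B can be sketched in Python as follows. The late_delay expression is taken from the text; the leaf-node case (delay equal to exectime) and the maximum over successors are assumptions, since the formula itself is only available as an image. The dictionary-based inputs and the function name compute_delays are likewise hypothetical.

```python
def compute_delays(succs, exectime, latency, topo_order):
    """Compute a delay value per instruction, visiting leaves first.

    succs      -- dict: node -> set of successors in the dependence graph
    exectime   -- dict: node -> cycles needed to execute the instruction
    latency    -- dict: (n, m) -> cycles that must separate n and m
                  (missing pairs count as 0)
    topo_order -- topological order from step A
    """
    delay = {}
    # Reverse topological order guarantees delay[m] is known for every
    # successor m before node n is processed.
    for n in reversed(topo_order):
        if not succs[n]:
            delay[n] = exectime[n]   # assumed leaf case
        else:
            # late_delay(n, m) = latency(n, m) + delay(m) + 1
            delay[n] = max(
                latency.get((n, m), 0) + delay[m] + 1 for m in succs[n]
            )
    return delay


succs = {0: set(), 1: {2}, 2: {3}, 3: set()}
exectime = {0: 1, 1: 1, 2: 1, 3: 1}
latency = {(1, 2): 2}
delays = compute_delays(succs, exectime, latency, [0, 1, 2, 3])
```

Under these assumptions the delay value grows along a dependence chain, so the head of the longest chain receives the highest scheduling priority.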

Preferably, a check is made after step D completes: if all instructions have been scheduled, step E is executed; otherwise step C is repeated.

[0014] The present invention is not limited to the specific embodiments described above. It extends to any new feature or any new combination of features disclosed in this specification, and to the steps of any new method or process disclosed, or any new combination thereof.

Claims (4)

1. A DSP chip-oriented instruction scheduling method, comprising the following steps: step A. constructing a topological ordering of the statements in a basic block of an assembly program; step B. on the basis of the topological ordering obtained in step A, calculating the delay value delay of each instruction in the basic block of the assembly language program; step C. traversing the data dependence graph from the root nodes to the leaf nodes, selecting instructions for scheduling during the traversal and generating a candidate instruction set, the candidate instruction set being divided into two sets: one set recording the instructions having the maximum delay value from step B, and the other set recording the instructions whose earliest execution time is less than or equal to the current time; step D. finding, in the candidate instruction set of step C, the instruction that has the maximum delay value and whose execution time is less than or equal to the current time, placing it in the current scheduling time slot, and then updating the candidate instruction set according to the data dependence graph; step E. outputting the scheduled assembly language code.
2. The DSP chip-oriented instruction scheduling method according to claim 1, characterized in that the topological ordering of the statements in the basic block of the assembly program in step A is constructed on the basis of a linear scheduling algorithm.
3. The DSP chip-oriented instruction scheduling method according to claim 2, characterized in that in step B the delay value delay of each instruction in the basic block of the assembly language program is calculated by the following formula:
[Formula image: Figure CN102063288AC00021]
where exectime(n) is the number of cycles required to execute the n-th instruction, and late_delay(n, m) = latency(linst(n), linst(m)) + delay(m) + 1; latency(linst(n), linst(m)) gives the number of clock cycles that must be kept between the two instructions.
4. The DSP chip-oriented instruction scheduling method according to claim 3, characterized in that a check is made after step D completes: if all instructions have been scheduled, step E is executed; otherwise step C is repeated.
CN2011100024549A 2011-01-07 2011-01-07 DSP (Digital Signal Processing) chip-oriented instruction scheduling method CN102063288A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011100024549A CN102063288A (en) 2011-01-07 2011-01-07 DSP (Digital Signal Processing) chip-oriented instruction scheduling method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011100024549A CN102063288A (en) 2011-01-07 2011-01-07 DSP (Digital Signal Processing) chip-oriented instruction scheduling method

Publications (1)

Publication Number Publication Date
CN102063288A true CN102063288A (en) 2011-05-18

Family

ID=43998579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011100024549A CN102063288A (en) 2011-01-07 2011-01-07 DSP (Digital Signal Processing) chip-oriented instruction scheduling method

Country Status (1)

Country Link
CN (1) CN102063288A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103108044A (en) * 2013-02-04 2013-05-15 南京大学 Web service combination method based on dependence graph reducing and quality of service (QoS) holding
CN105843660A (en) * 2016-03-21 2016-08-10 同济大学 Code optimization scheduling method for encoder
CN108107872A (en) * 2017-12-28 2018-06-01 北京翼辉信息技术有限公司 Network-based DSP (digital signal processor) application online debugging system and debugging method
CN105843660B (en) * 2016-03-21 2019-04-02 同济大学 A kind of code optimization dispatching method of compiler

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5202993A (en) * 1991-02-27 1993-04-13 Sun Microsystems, Inc. Method and apparatus for cost-based heuristic instruction scheduling
CN1485735A (en) * 2002-08-22 2004-03-31 松下电器产业株式会社 Display device and driving method thereof
CN1670699A (en) * 2004-03-19 2005-09-21 中国科学院计算技术研究所 A micro-dispatching method supporting directed cyclic graph

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103108044A (en) * 2013-02-04 2013-05-15 南京大学 Web service combination method based on dependence graph reducing and quality of service (QoS) holding
CN103108044B (en) * 2013-02-04 2015-08-19 南京大学 Reduction Based on the dependency graph and retaining the QoS Web services composition
CN105843660A (en) * 2016-03-21 2016-08-10 同济大学 Code optimization scheduling method for encoder
CN105843660B (en) * 2016-03-21 2019-04-02 同济大学 A kind of code optimization dispatching method of compiler
CN108107872A (en) * 2017-12-28 2018-06-01 北京翼辉信息技术有限公司 Network-based DSP (digital signal processor) application online debugging system and debugging method
CN108107872B (en) * 2017-12-28 2019-03-22 北京翼辉信息技术有限公司 A kind of network-based DSP application on-line debugging system and adjustment method

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
C12 Rejection of a patent application after its publication