JPH10207854A

JPH10207854A - Compiler instruction parallelizing system

Info

Publication number: JPH10207854A
Application number: JP2436297A
Authority: JP
Inventors: Takashi Miyamoto; 敬士宮本
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1997-01-23
Filing date: 1997-01-23
Publication date: 1998-08-07

Abstract

PROBLEM TO BE SOLVED: To generate fast object code for a computer having arithmetic units that can operate in parallel, by increasing the parallelism of the object code. SOLUTION: A dependency/parallel operation suppression graph generation unit 321 generates a dependency/parallel operation suppression graph, a directed graph that represents the dependencies and the parallel operation suppression relationships between instructions. A path latency/parallel operation suppression number calculation unit 322 calculates path latencies and parallel operation suppression numbers from that graph and adds the results to the graph. A scheduling unit 323 schedules the code in parallel according to the graph annotated with the path latencies and parallel operation suppression numbers.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a compiler instruction parallelization method, and more particularly to a compiler instruction parallelization method in a compiler that generates object code for a computer having a plurality of operation units that can operate in parallel.

[0002]

[Description of the Related Art] Compiler instruction parallelization methods of this kind have conventionally been used to increase the efficiency of object code for computers of the kind described above, that is, computers that can execute a plurality of instructions in parallel and have a plurality of operation units that can operate simultaneously.

[0003] Conventional compiler instruction parallelization methods of this kind include, for example, those described in the following three patent publications.

[0004] Japanese Patent Application Laid-Open No. Hei 6-250847 describes a technique for converting instructions of a parallel logic programming language into VLIW (Very Long Instruction Word) instructions that can be executed by a VLIW parallel computer. In the compiler instruction parallelization method described in this publication, after the compiler converts the parallel logic programming language into an abstract instruction sequence, the sequence is scheduled in parallel by exploiting the features of the parallel logic programming language, and is then further scheduled in parallel on the basis of the dependencies between instructions.

[0005] Japanese Patent Application Laid-Open No. Hei 6-295246 describes a technique that performs scheduling with a backward code compression phase, thereby reducing register usage, preventing delays in system processing, and improving processing efficiency. In the compiler instruction parallelization method described in this publication, forward list scheduling, a method that schedules in parallel on the basis of dependencies, is applied first; instructions are then moved in the backward code compression phase to reduce register usage.

[0006] Japanese Patent Application Laid-Open No. Hei 6-131195 describes a technique for reordering instructions on the basis of the results produced by an instruction analysis means, which analyzes the dependencies of each instruction in the instruction sequence represented by assembly text written for an existing machine. In the compiler instruction parallelization method described in this publication, an instruction dependency graph is built from the instruction sequence represented by the assembly text, the dependencies between instructions are thereby analyzed, and scheduling is performed in parallel on the basis of the result.

[0007]

[Problems to be Solved by the Invention] The conventional compiler instruction parallelization methods described above have the following problem. On a computer that can execute a plurality of instructions in parallel and has a plurality of operation units that can operate simultaneously (for example, certain microprocessors), instructions may be unable to operate in parallel for reasons other than dependencies (ordering constraints between instructions that arise, for example, when one instruction reads a register written by another) and resource conflicts (the fact that a plurality of instructions cannot use the same operation unit at once). When such a relationship arises, the object code cannot always be scheduled optimally.

[0008] This problem arises because the object code is scheduled on the basis of inter-instruction dependencies and resource conflicts alone.

[0009] A relationship in which instructions cannot operate in parallel for reasons other than dependencies and resource conflicts (hereinafter referred to as a "parallel operation suppression relationship") arises frequently in microprocessors within the technical field of the present invention. For example, the Pentium processor (manufactured by Intel Corporation of the United States), which has two operation units, has instructions that cannot be executed in parallel with any other instruction, and instructions that can be executed in parallel with another instruction only when they are executed in a particular operation unit. These restrictions cannot be expressed as inter-instruction dependencies or resource conflicts.

[0010] In view of the above, an object of the present invention is to provide a compiler instruction parallelization method that, in a compiler generating object code for a computer having a plurality of operation units that can operate in parallel, can generate fast object code by increasing the parallelism of the object code.

[0011]

[Means for Solving the Problems] The compiler instruction parallelization method of the present invention comprises: a dependency/parallel operation suppression graph generation unit that generates a dependency/parallel operation suppression graph, which is a directed graph representing the dependencies and the parallel operation suppression relationships between instructions; a path latency/parallel operation suppression number calculation unit that calculates path latencies and parallel operation suppression numbers on the basis of the graph generated by the graph generation unit and adds the results to the graph; and a scheduling unit that schedules the code in parallel on the basis of the dependency/parallel operation suppression graph to which the path latencies and parallel operation suppression numbers have been added.

[0012] In particular, the compiler instruction parallelization method of the present invention can be configured to include the following two components.

[0013] The path latency/parallel operation suppression number calculation unit, which, for the dependency/parallel operation suppression graph generated by the dependency/parallel operation suppression graph generation unit, determines the path latency of each instruction by following the directed edges from the root of the directed tree through the nodes, and determines the parallel operation suppression number by counting the undirected edges leaving each node.

[0014] The scheduling unit, which schedules instructions in descending order of path latency and, among instructions with the same path latency, in descending order of parallel operation suppression number.

[0015]

[Embodiments of the Invention] Next, embodiments of the present invention will be described in detail with reference to the drawings.

[0016]

[Examples]

(1) First Embodiment of the Present Invention

FIGS. 1(a) and 1(b) are block diagrams showing the configuration of a first embodiment of the compiler instruction parallelization method of the present invention.

[0017] As shown in FIG. 1(a), the compiler instruction parallelization method of this embodiment is implemented by a compiler 100 that includes a front end 2 and a back end 3. The compiler 100 takes input code 1 as input and produces object code 4 as output.

[0018] The front end 2 performs lexical analysis and syntactic/semantic analysis of the program supplied as input code 1, and outputs intermediate language code.

[0019] The back end 3 includes an intermediate language optimization unit 31, a code scheduling unit 32, and an object code generation unit 33.

[0020] The intermediate language optimization unit 31 performs global optimization and local optimization on the intermediate language code.

[0021] The code scheduling unit 32 reorders the instructions of the intermediate language code so that more of them can be executed in parallel. This code scheduling unit 32 is the principal part of the present invention.

[0022] The object code generation unit 33 converts the intermediate language code into object code 4.

[0023] Next, the detailed configuration of the code scheduling unit 32 will be described with reference to FIG. 1(b).

[0024] The code scheduling unit 32 includes a dependency/parallel operation suppression graph generation unit 321, a path latency/parallel operation suppression number calculation unit 322, and a scheduling unit 323.

[0025] The dependency/parallel operation suppression graph generation unit 321 analyzes the dependencies between the instructions that make up the intermediate language code and builds a directed graph from the results of this analysis. The unit 321 also analyzes the parallel operation suppression relationships and adds them to the directed graph it has just built. The directed graph constructed in this way is called a dependency/parallel operation suppression graph.

[0026] The path latency/parallel operation suppression number calculation unit 322 uses the dependency/parallel operation suppression graph to calculate path latencies and parallel operation suppression numbers.

[0027] The scheduling unit 323 reorders the instructions of the intermediate language code using the dependency/parallel operation suppression graph together with the path latencies and parallel operation suppression numbers.

[0028] FIG. 2 is a flowchart showing the processing of the code scheduling unit 32. This processing consists of a dependency directed-graph representation step 201, a parallel operation suppression relationship analysis step 202, a path latency calculation step 203, a parallel operation suppression number calculation step 204, and an intermediate language code instruction reordering step 205.

[0029] FIG. 3 is a diagram for explaining the directed graph (dependency/parallel operation suppression graph) generated by the dependency/parallel operation suppression graph generation unit 321.

[0030] FIG. 4 is a diagram showing an example of output code resulting from scheduling by the scheduling unit 323.

[0031] FIG. 5 is a diagram for explaining the operation of the compiler instruction parallelization method of this embodiment; it shows an example of the output code that scheduling would produce if the compiler instruction parallelization method of the present invention were not used.

[0032] Next, the operation of the compiler instruction parallelization method of this embodiment, configured as described above, will be described.

[0033] First, the overall operation of the compiler 100 shown in FIG. 1(a) will be described.

[0034] The front end 2 in the compiler 100 converts input code 1 into intermediate language code. This intermediate language code is passed to the back end 3.

[0035] In the back end 3, the intermediate language optimization unit 31 first performs local optimization and global optimization on the intermediate language code.

[0036] Next, the code scheduling unit 32 reorders the instructions to increase parallel executability.

[0037] The object code generation unit 33 then converts the intermediate language code into object code 4, producing object code 4 as the output code of the compiler 100.

[0038] Second, the operation of the code scheduling unit 32, which is central to the present invention, will be described with reference to FIGS. 2, 3, and 4 together with FIGS. 1(a) and 1(b).

[0039] In the code scheduling unit 32, the dependency/parallel operation suppression graph generation unit 321 first generates the dependency/parallel operation suppression graph.

[0040] That is, to generate the dependency/parallel operation suppression graph, the generation unit 321 first analyzes the dependencies between instructions and represents those dependencies as a directed graph (the initial dependency/parallel operation suppression graph) (step 201).

[0041] The dependencies represented by a dependency/parallel operation suppression graph will be explained using the example graph shown in FIG. 3. In the graph, each instruction of the intermediate language code is represented by a node of the directed graph; "a", "B", and so on in FIG. 3 are nodes representing instructions. Dependencies are represented by directed edges of the graph. In FIG. 3, for example, instruction B depends on instruction a, so there is a directed edge from node a to node B.

[0042] Next, to complete the dependency/parallel operation suppression graph, the generation unit 321 analyzes the parallel operation suppression relationships and adds the results of the analysis to the graph (step 202).

[0043] As described earlier, a parallel operation suppression relationship is one in which two instructions cannot operate in parallel for a reason other than a dependency or a resource conflict. One conceivable example is a relationship in which two instructions cannot be executed in parallel, even though they are executed in separate operation units, because they share part of a common circuit during pipeline operation. Parallel operation suppression relationships are represented by undirected edges in the dependency/parallel operation suppression graph. In FIG. 3, for example, instruction B and instruction e are in a parallel operation suppression relationship, so node B and node e are joined by an undirected edge.
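
As a rough illustration of steps 201 and 202, the graph can be held as two adjacency maps, one for the directed dependency edges and one for the undirected suppression edges. The Python sketch below is not taken from the publication: the class name `SuppressionGraph` and its method names are assumptions made for this example. Only the two relationships stated in the text are entered (B depends on a; B and e are in a parallel operation suppression relationship).

```python
from dataclasses import dataclass, field


@dataclass
class SuppressionGraph:
    """Illustrative dependency/parallel operation suppression graph."""
    # Directed edges: producer -> set of instructions that depend on it.
    succs: dict = field(default_factory=dict)
    # Undirected edges: instruction -> set of instructions it cannot pair with.
    suppressed: dict = field(default_factory=dict)

    def add_instruction(self, name):
        self.succs.setdefault(name, set())
        self.suppressed.setdefault(name, set())

    def add_dependency(self, producer, consumer):
        """Step 201: record a directed edge producer -> consumer."""
        self.add_instruction(producer)
        self.add_instruction(consumer)
        self.succs[producer].add(consumer)

    def add_suppression(self, x, y):
        """Step 202: record an undirected edge by storing it in both directions."""
        self.add_instruction(x)
        self.add_instruction(y)
        self.suppressed[x].add(y)
        self.suppressed[y].add(x)


g = SuppressionGraph()
g.add_dependency("a", "B")    # instruction B depends on instruction a
g.add_suppression("B", "e")   # B and e cannot operate in parallel
```

Storing each undirected edge in both directions keeps the later lookup "is this instruction suppressed by anything already placed in the cycle?" a simple set intersection.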

[0044] The path latency/parallel operation suppression number calculation unit 322 first calculates the path latencies on the basis of the dependency/parallel operation suppression graph and adds the results to the graph (step 203).

[0045] The path latency of an instruction is the number of cycles after which that instruction can first be executed in the dependency/parallel operation suppression graph (for each node, the minimum number of cycles, counted from the instruction at the root of the directed tree containing that node, before the node can execute). For convenience, the following description assumes that every instruction takes one cycle to execute, but it goes without saying that the compiler instruction parallelization method of the present invention is applicable in other cases as well.

[0046] Referring to FIG. 3, instruction B, for example, depends on instruction a, so instruction B cannot become executable until instruction a has completed. Instruction a depends on no other instruction, so it is the first instruction that can be executed in this graph, and it is given a path latency of 0. By the assumption above, instruction a takes one cycle to execute, so the path latency of instruction B is 1. In FIG. 3, the path latency is shown as the number without parentheses next to each node.
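
The calculation in step 203 can be sketched as follows in Python. The function name `path_latencies`, the `deps` mapping, and the edge from B to C are assumptions made for illustration; only the edge from a to B comes from the text. Where a node has more than one predecessor the sketch takes the maximum over them, which is also an assumption, since the text only describes following edges from the root of a directed tree. The graph is assumed to be acyclic.

```python
def path_latencies(deps, cycles=None):
    """Earliest cycle at which each instruction can start.

    deps maps each instruction to the list of instructions it depends on.
    cycles optionally maps an instruction to its execution time; the
    default of 1 cycle matches the simplifying assumption in the text.
    """
    cycles = cycles or {}
    memo = {}

    def latency(instr):
        if instr not in memo:
            # An instruction with no predecessors is a root and gets 0.
            memo[instr] = max(
                (latency(p) + cycles.get(p, 1) for p in deps[instr]),
                default=0,
            )
        return memo[instr]

    return {instr: latency(instr) for instr in deps}


# a -> B is from the text; B -> C is a hypothetical extra edge.
example = path_latencies({"a": [], "B": ["a"], "C": ["B"]})
print(example)  # {'a': 0, 'B': 1, 'C': 2}
```

Passing a `cycles` mapping covers the case the text mentions in passing, where instructions take more than one cycle.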

[0047] Next, the path latency/parallel operation suppression number calculation unit 322 calculates the parallel operation suppression numbers on the basis of the dependency/parallel operation suppression graph and adds the results to the graph (step 204).

[0048] The parallel operation suppression number of an instruction is the number of instructions that are in a parallel operation suppression relationship with it. In the dependency/parallel operation suppression graph, therefore, it equals the number of undirected edges leaving the node that corresponds to the instruction. Referring to FIG. 3, the parallel operation suppression number of instruction B, for example, is 1. In FIG. 3, the parallel operation suppression number is shown as the number in parentheses next to each node. An instruction (node) with no number in parentheses has a parallel operation suppression number of 0.
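
Step 204 amounts to counting undirected edges per node. A minimal Python sketch, assuming the suppression relationships are available as a list of instruction pairs (the function name `suppression_numbers` and the instruction list are hypothetical; only the pair B and e comes from the text):

```python
def suppression_numbers(instructions, suppression_edges):
    """Count, for each instruction, the undirected edges that touch it."""
    counts = {instr: 0 for instr in instructions}
    # frozenset makes ("B", "e") and ("e", "B") the same edge, so a pair
    # listed twice is only counted once.
    for edge in {frozenset(pair) for pair in suppression_edges}:
        for instr in edge:
            counts[instr] += 1
    return counts


counts = suppression_numbers(["a", "B", "e", "f"], [("B", "e")])
print(counts)  # {'a': 0, 'B': 1, 'e': 1, 'f': 0}
```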

[0049] The scheduling unit 323 reorders the instructions of the intermediate language code (performs the prescribed scheduling) on the basis of the dependency/parallel operation suppression graph finally constructed as described above (step 205).

[0050] The operation of the scheduling unit 323 will be described with reference to FIGS. 3 and 4. For the purposes of explanation, object code 4 is assumed to be object code for a computer with two operation units that operate in parallel (referred to here as the first operation unit and the second operation unit). It is also assumed that the operation unit on which an instruction executes is fixed for each instruction and that, among the instructions in the dependency/parallel operation suppression graph of FIG. 3, instructions a, e, f, and h execute on the first operation unit and instructions B, C, D, and G execute on the second operation unit.

[0051] As its method of parallel scheduling, the present invention is based on list scheduling, a conventional technique, but performs the scheduling taking into account the parallel operation suppression number, which is a feature of the present invention. In list scheduling, among the schedulable instructions in the dependency/parallel operation suppression graph (instructions for which no instruction that depends on them remains unscheduled), the instruction with the largest path latency becomes the scheduling candidate, and the instruction sequence is determined in reverse order (starting from the instructions that execute last).
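
The notion of a "schedulable" instruction in backward list scheduling can be sketched as a small Python helper. The function name `backward_candidates` and the toy graph are assumptions for illustration; only the edge from a to B is taken from the text.

```python
def backward_candidates(dependents, scheduled):
    """Unscheduled instructions whose dependents have all been scheduled."""
    return {
        instr
        for instr, users in dependents.items()
        if instr not in scheduled and users <= scheduled
    }


# dependents[x] is the set of instructions that depend on x.
# a -> B is from the text; a -> e and B -> C are hypothetical.
dependents = {"a": {"B", "e"}, "B": {"C"}, "C": set(), "e": set()}
latency = {"a": 0, "B": 1, "e": 1, "C": 2}  # path latencies of this toy graph

first = backward_candidates(dependents, set())        # nothing scheduled yet
pick = max(first, key=latency.get)                     # largest path latency
later = backward_candidates(dependents, {"C", "e"})   # after C and e are placed
```

Here `first` contains the instructions that nothing depends on (C and e), `pick` is C because its path latency is the largest, and once C and e are scheduled only B becomes a candidate, since a still has the unscheduled dependent B.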

[0052] FIG. 4 shows the instruction sequence scheduled by the compiler instruction parallelization method of this embodiment. The sequence in FIG. 4 is scheduled on the assumption that instructions a and D execute in the first clock cycle, instructions f and G in the next clock cycle, and so on for the remaining instructions. Following the list scheduling method described above, such a sequence is determined in reverse order, starting from the instructions that execute last, namely instructions e and C.

[0053] Returning to FIG. 3, the process by which the instruction sequence of FIG. 4 is scheduled is as follows. At the outset no instruction has been scheduled, so the instructions that can be scheduling candidates are instructions C, e, f, and h (the instructions that are not the start node of any directed edge). Of these, instruction C has the largest path latency, so instruction C is scheduled first, on the operation unit that executes it. The remaining three instructions e, f, and h all have a path latency of 1 and all execute on the other operation unit, so none of them has a resource conflict with instruction C; ordinary list scheduling would therefore select any one of them arbitrarily.

[0054] In the method of the present invention, when several instructions that have equal path latency and no resource conflict with the already scheduled instructions are scheduling candidates in this way, the instruction with the largest parallel operation suppression number is selected. Instructions that are in a parallel operation suppression relationship with an already scheduled instruction are excluded, however. In FIG. 3, instruction e has the largest parallel operation suppression number, so instruction e is scheduled for execution on its operation unit.
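
The selection rule can be put together into a small backward list scheduler. The Python sketch below is one possible reading of the description, not the publication's implementation. The function name `schedule_backward`, the unit labels 1 and 2, the name-based final tie-break, and every edge other than a to B and the suppression pair B and e are assumptions chosen so that the toy graph is consistent with the facts the text states about FIG. 3; the real FIG. 3 graph is not reproduced here. All instructions are assumed to take one cycle.

```python
def schedule_backward(deps, unit_of, suppression_edges):
    """Return the schedule as a list of cycles (first cycle first).

    Each cycle is a dict mapping an operation unit to one instruction.
    """
    dependents = {i: set() for i in deps}
    for instr, preds in deps.items():
        for p in preds:
            dependents[p].add(instr)

    latency = {}

    def lat(i):  # path latency, one cycle per instruction
        if i not in latency:
            latency[i] = max((lat(p) + 1 for p in deps[i]), default=0)
        return latency[i]

    for i in deps:
        lat(i)

    suppressed = {i: set() for i in deps}
    for x, y in suppression_edges:
        suppressed[x].add(y)
        suppressed[y].add(x)

    done, cycles_reversed = set(), []
    while len(done) < len(deps):
        # Candidates are fixed before the cycle is filled, so an instruction
        # never shares a cycle with an instruction that depends on it.
        ready = [i for i in deps if i not in done and dependents[i] <= done]
        # Largest path latency first, then largest suppression number;
        # the name is only a deterministic final tie-break (an assumption).
        ready.sort(key=lambda i: (latency[i], len(suppressed[i]), i), reverse=True)
        cycle = {}
        for i in ready:
            if unit_of[i] in cycle:                   # resource conflict
                continue
            if suppressed[i] & set(cycle.values()):   # suppression relationship
                continue
            cycle[unit_of[i]] = i
        done.update(cycle.values())
        cycles_reversed.append(cycle)
    return cycles_reversed[::-1]


# Hypothetical graph: only a -> B and the pair (B, e) come from the text.
deps = {"a": [], "D": [], "B": ["a"], "e": ["a"], "f": ["a"], "h": ["a"],
        "G": ["D"], "C": ["B", "G"]}
unit_of = {"a": 1, "e": 1, "f": 1, "h": 1, "B": 2, "C": 2, "D": 2, "G": 2}
schedule = schedule_backward(deps, unit_of, [("B", "e")])
for number, cycle in enumerate(schedule, start=1):
    print(number, sorted(cycle.values()))
```

With this made-up graph the scheduler fills the last cycle with C and e, as in the walk-through above: C is chosen for its path latency, and e wins the tie against f and h because its parallel operation suppression number is larger.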

[0055] Scheduling proceeds in the same way to obtain the scheduling result shown in FIG. 4.

[0056] FIG. 5 shows the remainder of the scheduling result that would be obtained if, without the method of the present invention, instruction h were selected at the point where one of the three instructions e, f, and h is chosen for their operation unit. Compared with the scheduling result of FIG. 4, which is produced by the method of the present invention, the number of execution cycles is clearly larger, and the performance of the generated object code is correspondingly lower.

[0057]

(2) Second Embodiment of the Present Invention

FIG. 6 is a block diagram showing the configuration of a second embodiment of the compiler instruction parallelization method of the present invention. The configuration of the code scheduling unit 7 is the same as that of the code scheduling unit 32 shown in FIG. 1(b).

[0058] The compiler instruction parallelization method of this embodiment is implemented by a compiler 200 that includes a front end 2, a back end 5, pre-schedule object code 6, and a code scheduling unit 7. The compiler 200 takes input code 1 as input and produces object code 4 as output.

[0059] The front end 2 performs lexical analysis and syntactic/semantic analysis of the program supplied as input code 1, and outputs intermediate language code.

[0060] The back end 5 includes the intermediate language optimization unit 31 and the object code generation unit 33, and outputs the pre-schedule object code 6 on the basis of the intermediate language code.

[0061] The code scheduling unit 7 reorders the instructions of the pre-schedule object code 6 to increase parallel executability.

[0062] As shown in FIG. 6, the code scheduling unit 7 is not contained in the back end (back end 3 in the first embodiment shown in FIG. 1; back end 5 in this embodiment) but is placed where it receives the pre-schedule object code 6 output by the back end 5. In this respect the configuration of the second embodiment differs from that of the first.

[0063] Next, the operation of the compiler instruction parallelization method of this embodiment, configured as described above, will be described.

[0064] In the first embodiment, the code scheduling unit 32 performs code scheduling (the code scheduling specific to the present invention described above) on the intermediate language code output by the intermediate language optimization unit 31. In the compiler instruction parallelization method of this (second) embodiment, by contrast, the code scheduling unit 7 performs the prescribed code scheduling on the pre-schedule object code 6 output by the back end 5. In all other respects the operation of the first and second embodiments is the same.
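
The difference between the two embodiments is only where the scheduling stage sits in the pipeline, which can be pictured with placeholder stages in Python. Everything in this sketch is hypothetical: `compose`, `make_stage`, and the stage functions stand in for the numbered units and do no real compilation; each stage merely appends its name so that the order of stages is visible.

```python
def compose(*stages):
    """Chain stages left to right into a single compile function."""
    def run(code):
        for stage in stages:
            code = stage(code)
        return code
    return run


def make_stage(name):
    """Placeholder stage: records its name instead of transforming code."""
    def stage(code):
        return code + [name]
    return stage


front_end_2 = make_stage("front end 2")
optimize_31 = make_stage("intermediate language optimization 31")
generate_33 = make_stage("object code generation 33")
schedule = make_stage("code scheduling")

# First embodiment: scheduling (unit 32) runs on intermediate language code.
compiler_100 = compose(front_end_2, optimize_31, schedule, generate_33)
# Second embodiment: scheduling (unit 7) runs on pre-schedule object code 6.
compiler_200 = compose(front_end_2, optimize_31, generate_33, schedule)

order_100 = compiler_100([])
order_200 = compiler_200([])
```

In `order_100` the scheduling stage appears before object code generation, and in `order_200` it appears after it; the scheduling logic itself is the same stage in both.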

【００６５】[0065]

[Effects of the Invention] As described above, according to the present invention, a dependency/parallel operation suppression graph reflecting the parallel operation suppression relationships is constructed, and the path latency and the parallel operation suppression number are calculated. This improves the efficiency of parallel code scheduling and, in turn, the execution performance of the object code.
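The two quantities attached to the graph can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the function name `graph_metrics` and its data layout are hypothetical, the path latency of a node is assumed to be the largest accumulated instruction latency along directed (dependency) edges from any root to that node, and the parallel operation suppression number is taken as the count of undirected (suppression) edges at the node, following claim 2.

```python
from collections import defaultdict, deque


def graph_metrics(latency, dep_edges, suppress_edges):
    """Return (path_latency, suppression_number) dictionaries.

    latency:        {instruction: latency in cycles}
    dep_edges:      directed pairs (u, v), meaning v depends on u
    suppress_edges: undirected pairs (a, b) that cannot run in parallel
    """
    succs = defaultdict(list)
    indegree = {n: 0 for n in latency}
    for u, v in dep_edges:
        succs[u].append(v)
        indegree[v] += 1

    # Assumption: a root's path latency is its own latency; each successor
    # takes the maximum over its predecessors (topological traversal).
    path_latency = dict(latency)
    ready = deque(n for n in latency if indegree[n] == 0)
    visited = 0
    while ready:
        u = ready.popleft()
        visited += 1
        for v in succs[u]:
            path_latency[v] = max(path_latency[v], path_latency[u] + latency[v])
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if visited != len(latency):
        raise ValueError("dependency edges contain a cycle")

    # Suppression number: undirected edges incident to each node.
    suppression = {n: 0 for n in latency}
    for a, b in suppress_edges:
        suppression[a] += 1
        suppression[b] += 1
    return path_latency, suppression
```

For example, with latencies `{"A": 1, "B": 2, "C": 1}` and dependency edges `A -> C` and `B -> C`, the sketch assigns `C` a path latency of 3, because the path through `B` dominates.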

[0066] This effect arises because instructions with a large parallel operation suppression number are scheduled first, based on the dependency/parallel operation suppression graph. Doing so prevents scheduling constraints caused by parallel operation suppression relationships from arising at later stages of the scheduling process.
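The reasoning above can be checked with a small greedy list scheduler. This is a simplified sketch, not the scheduler described in the embodiments: the function `list_schedule`, the fixed issue `width`, and the unit instruction latency are all assumptions made for illustration, and path latency is ignored so that only the effect of the suppression-number ordering is visible.

```python
def list_schedule(instrs, deps, suppress, width, prefer_high=True):
    """Greedy cycle-by-cycle scheduling (unit latency assumed).

    instrs:   list of instruction names
    deps:     directed pairs (u, v), meaning v must issue after u
    suppress: undirected pairs that may not share a cycle
    width:    number of instructions that can issue per cycle
    """
    preds = {i: set() for i in instrs}
    for u, v in deps:
        preds[v].add(u)
    conflicts = {i: set() for i in instrs}
    for a, b in suppress:
        conflicts[a].add(b)
        conflicts[b].add(a)

    sign = -1 if prefer_high else 1  # -1 sorts large suppression numbers first
    done, cycles = set(), []
    remaining = list(instrs)
    while remaining:
        # Ready: every predecessor was issued in an earlier cycle.
        ready = [i for i in remaining if preds[i] <= done]
        ready.sort(key=lambda i: (sign * len(conflicts[i]), i))
        issued = []
        for i in ready:
            if len(issued) < width and not (conflicts[i] & set(issued)):
                issued.append(i)
        if not issued:
            raise ValueError("cyclic dependencies: no instruction is ready")
        cycles.append(issued)
        done.update(issued)
        remaining = [i for i in remaining if i not in done]
    return cycles
```

In a hypothetical case with four independent instructions `A`, `B`, `C`, `D`, suppression pairs `A`-`B` and `A`-`C`, and `width=2`, placing the high-suppression instruction `A` first gives two cycles (`[A, D]`, `[B, C]`). Placing low-suppression instructions first leaves `A` stranded and takes three cycles (`[D, B]`, `[C]`, `[A]`).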

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a first embodiment of the compiler instruction parallelizing system of the present invention (in particular, (b) is a block diagram showing the detailed configuration of the code scheduling unit).

FIG. 2 is a flowchart showing the processing of the code scheduling unit shown in FIG. 1(b).

FIG. 3 is a diagram showing an example of the directed graph (dependency/parallel operation suppression graph) generated by the code scheduling unit shown in FIG. 1(b).

FIG. 4 is a diagram showing an example of the output code resulting from scheduling by the scheduling unit in FIG. 1(b).

FIG. 5 is a diagram for explaining the operation of the compiler instruction parallelizing system shown in FIG. 1(a). It shows an example of the output code that results from scheduling when the compiler instruction parallelizing system of the present invention is assumed not to be used.

FIG. 6 is a block diagram showing the configuration of a second embodiment of the compiler instruction parallelizing system of the present invention.

[Explanation of symbols]

1: Input code
2: Front end
3, 5: Back end
4: Object code
6: Pre-schedule object code
7, 32: Code scheduling unit
31: Intermediate language optimization unit
33: Object code generation unit
100, 200: Compiler
321: Dependency/parallel operation suppression graph generation unit
322: Path latency/parallel operation suppression number calculation unit
323: Scheduling unit

Claims

1. A compiler instruction parallelizing method comprising: a dependency/parallel operation suppression graph generation unit that generates a dependency/parallel operation suppression graph, which is a directed graph expressing dependency relationships and parallel operation suppression relationships between instructions; a path latency/parallel operation suppression number calculation unit that calculates a path latency and a parallel operation suppression number based on the dependency/parallel operation suppression graph generated by the dependency/parallel operation suppression graph generation unit, and adds the calculation results to the dependency/parallel operation suppression graph; and a scheduling unit that performs parallel scheduling of code based on the dependency/parallel operation suppression graph to which the path latency/parallel operation suppression number calculation unit has added the calculated path latency and parallel operation suppression number.

2. The compiler instruction parallelizing method according to claim 1, comprising: the path latency/parallel operation suppression number calculation unit, which, for the dependency/parallel operation suppression graph generated by the dependency/parallel operation suppression graph generation unit, determines the path latency of each instruction by tracing nodes from the root of the directed tree along the directed edges, and determines the parallel operation suppression number by counting the undirected edges leaving each node; and the scheduling unit, which schedules instructions having a path latency in order, starting from the instruction with the largest parallel operation suppression number.

3. The compiler instruction parallelizing method according to claim 1, comprising the scheduling unit, which performs scheduling on intermediate language code.

4. The compiler instruction parallelizing method according to claim 1, comprising the scheduling unit, which performs scheduling on pre-schedule object code.