JP2001075816A

JP2001075816A - Device and method for converting program, device for preparing program, and program recording medium

Info

Publication number: JP2001075816A
Application number: JP25395999A
Authority: JP
Inventors: Yorihisa Fujinami; 順久藤波
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1999-09-08
Filing date: 1999-09-08
Publication date: 2001-03-23

Abstract

PROBLEM TO BE SOLVED: To apply superoptimization to a program that includes loop processing, such as a loop that repeatedly performs floating-point operations on an array variable. SOLUTION: When a program is input, its command-line argument is first parsed and a DAG (directed acyclic graph) representing the assignment statement is generated, so that all operations for two iterations of the loop can be represented. Next, all instruction sequences corresponding to at most one loop iteration are generated as candidates for setup instruction sequences that can be placed outside the loop. All candidates for in-loop instruction sequences that implement the in-loop processing for each setup instruction sequence are then generated. Every combination of a setup instruction sequence and an in-loop instruction sequence candidate that can be executed in the minimum number of clocks is displayed as a solution of the program conversion. Only the in-loop instruction sequence is counted when determining the number of clocks; the setup instruction sequence is excluded from the evaluation.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a program conversion apparatus and method for converting a program into a machine instruction sequence that behaves the same as the original program, a program manufacturing apparatus, and a program recording medium.

[0002] More specifically, the present invention relates to a program conversion apparatus and method, a program manufacturing apparatus, and a program recording medium that optimize an instruction sequence by exhaustively searching every possible instruction sequence, i.e., that perform "superoptimization" of a program. In particular, it relates to a program conversion apparatus and method, a program manufacturing apparatus, and a program recording medium that superoptimize loop processing that repeatedly performs floating-point operations on array variables.

[0003]

2. Description of the Related Art

With the rapid progress of semiconductor design and manufacturing technology in recent years, microprocessors that provide powerful computing capability have appeared, including the "Pentium" series sold by Intel Corporation. Microprocessors are used as central processing units (CPUs) in computer systems such as workstations and personal computers (PCs).

[0004] A processor generally has a large number of instructions that it can process, i.e., an instruction set (although the types and number of instructions vary from one processor model to another). A software program executed on a computer system is actually expressed as a combination of one or more instructions from the instruction set.

[0005] Instructions arranged in execution order are also called an "instruction sequence". Program code as first written by a programmer generally contains redundancy or instruction orderings that cause delays, and executing it on a processor as-is is often inefficient. It is therefore common practice to remove waste such as redundant code from the program and generate an efficient instruction sequence suited to the processor; this process is called "optimization". Optimization has two meanings, shortening program execution time and reducing the required storage capacity; this specification deals specifically with the former.

[0006] The operation of recent microprocessors has become extremely complex because of pipelining (a scheme that subdivides processing into stages such as instruction fetch, decode, data fetch, and execution, and overlaps the processing of multiple instructions), dynamic scheduling, branch prediction (a scheme that predicts the branch direction in advance and feeds the instructions at the predicted address into the pipeline to suppress pipeline bubbles), and similar techniques. Accordingly, the algorithms used to optimize programs must also be more sophisticated.

[0007] Traditionally, optimization has relied on heuristics, i.e., on programmers' rules of thumb. Recently, several attempts have been made at "superoptimization", which optimizes an instruction sequence by exhaustive search. Superoptimization generates every possible combination of instructions and selects the optimal instruction sequence based on simulated execution of each candidate. Because the exhaustive search makes full use of computing resources, it can produce (surprising) instruction sequences that only a skilled assembly-language programmer would think of. This distinguishes it from conventional optimization, which generates only instruction sequences that are easily predictable from the original program code.

[0008] For example, the superoptimizer proposed by Henry Massalin in 1987 finds the program (machine instruction sequence) with the fewest instructions that computes a given function. It takes a program written in machine language as input, examines all possible programs, and finds one that has the same function as the original. However, the search space consists of sequences drawn from a subset of machine instructions that is selected by hand from the instruction set according to the problem, so the search is not truly exhaustive. The superoptimizer generates instruction sequences in order of increasing length: first all sequences of length 1, then all sequences of length 2, and so on. Whether an instruction sequence behaves the same as the original program is determined by actually executing it and testing whether it produces the same outputs for several hand-selected inputs.
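The shortest-first enumeration with test-input checking described above can be sketched as follows. This is a toy model, not Massalin's implementation. The five single-accumulator "instructions" are hypothetical, and, as in the original approach, equivalence is only tested on a few chosen inputs.

```python
from itertools import product

# Hypothetical toy instruction set: every instruction transforms one accumulator.
OPS = {
    "neg": lambda a: -a,
    "inc": lambda a: a + 1,
    "dec": lambda a: a - 1,
    "dbl": lambda a: 2 * a,
    "not": lambda a: ~a,  # bitwise NOT, i.e. -a - 1 on Python integers
}


def run(seq, x):
    """Execute an instruction sequence on the accumulator value x."""
    for op in seq:
        x = OPS[op](x)
    return x


def superoptimize(target, test_inputs, max_len=4):
    """Enumerate sequences shortest-first and return the first one that agrees
    with `target` on every test input (so the result is a shortest match)."""
    for length in range(1, max_len + 1):
        for seq in product(OPS, repeat=length):
            if all(run(seq, x) == target(x) for x in test_inputs):
                return list(seq)
    return None


tests = [0, 1, -7, 42]
print(superoptimize(lambda x: 2 * x + 2, tests))  # ['inc', 'dbl']
print(superoptimize(lambda x: -x - 1, tests))     # ['not']
```

The second call finds the one-instruction `not` for `-x - 1`, which is the kind of non-obvious sequence a superoptimizer turns up. A sequence that passes only the sampled inputs could still differ on other inputs, which is the weakness of testing by execution.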

[0009] Massalin's superoptimizer is described, for example, in H. Massalin, "A Look at the Smallest Program", Proceedings of the Second International Conference on Architectural Support for Programming Languages and Operating Systems, 1987, pp. 122-126 (also published in SIGPLAN Notices, Vol. 22, No. 10).

[0010] The GNU Superoptimizer, proposed by Torbjorn Granlund et al. in 1992, improved portability by taking a function written in C as input and simulating the generated instruction sequences rather than executing them directly. It also added refinements, such as a search rule that never generates instructions taking registers with undefined values as inputs.
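The following sketch shows how the "no reads of undefined registers" rule shrinks the search space. The three-register file and the `add`/`sub`/`and` instructions are hypothetical. Only the pruning idea comes from the text; the GNU Superoptimizer's actual data structures are not shown.

```python
from itertools import product

REGS = ("r0", "r1", "r2")       # hypothetical register file
NAMES = ("add", "sub", "and")   # hypothetical three-operand instructions


def candidates(defined, prune=True):
    """Yield (name, dst, src1, src2); with prune=True, skip any instruction
    that would read a register whose value is still undefined."""
    for name in NAMES:
        for dst, s1, s2 in product(REGS, repeat=3):
            if not prune or (s1 in defined and s2 in defined):
                yield (name, dst, s1, s2)


def sequences(length, defined, prune=True):
    """Enumerate all instruction sequences of the given length."""
    if length == 0:
        yield []
        return
    for insn in candidates(defined, prune):
        # After the instruction runs, its destination register is defined.
        for rest in sequences(length - 1, defined | {insn[1]}, prune):
            yield [insn] + rest


start = frozenset({"r0"})  # only the input register holds a value initially
print(sum(1 for _ in sequences(2, start)))               # 243
print(sum(1 for _ in sequences(2, start, prune=False)))  # 6561
```

Even at length 2 the rule removes over 96% of the candidates. The saving compounds as the sequence length grows.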

[0011] The GNU Superoptimizer is described, for example, in T. Granlund and R. Kenner, "Eliminating Branches using a Superoptimizer and the GNU C Compiler", Proceedings of the SIGPLAN '92 Conference on Programming Language Design and Implementation, 1992, pp. 341-352.

[0012] Japanese Patent Application No. Hei 10-253627, already assigned to the present applicant, proposes a system (hereinafter also called "SuperFLD") that superoptimizes floating-point instructions while taking the instruction execution pipeline into account. The superoptimizers proposed by H. Massalin and T. Granlund both search instruction sequences exhaustively on the premise that the sequence with the fewest instructions is optimal (i.e., has the shortest execution time). SuperFLD does not rely on that premise. Instead, it simulates the execution timing of the instruction execution pipeline for every candidate instruction sequence and thereby actually finds the sequence with the shortest execution time. SuperFLD also determines whether an instruction sequence is correct, i.e., whether it behaves the same as the original program, by symbolic execution.
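Why instruction count alone cannot identify the fastest sequence can be shown with a simplified timing model. The model below is an assumption for illustration, not SuperFLD's actual pipeline model. It has in-order single issue, fully pipelined units, and made-up latencies. Under it, two orderings of the same three instructions compute identical results but take different numbers of clocks.

```python
# Hypothetical latencies in clocks, chosen only for illustration.
LATENCY = {"fadd": 3, "fmul": 3}


def clocks(seq):
    """Clocks until the last result is ready on a simple in-order, single-issue
    pipeline: at most one instruction issues per clock, and an instruction
    cannot issue until all of its source registers are ready."""
    ready = {}      # register -> clock at which its value becomes available
    next_issue = 0  # earliest clock at which the next instruction may issue
    finish = 0
    for op, dst, *srcs in seq:
        issue = max([next_issue] + [ready.get(s, 0) for s in srcs])
        ready[dst] = issue + LATENCY[op]
        finish = max(finish, ready[dst])
        next_issue = issue + 1
    return finish


# Two orderings of the same three instructions (identical results):
in_order = [("fmul", "x", "a", "b"), ("fadd", "z", "x", "e"), ("fadd", "y", "c", "d")]
reordered = [("fmul", "x", "a", "b"), ("fadd", "y", "c", "d"), ("fadd", "z", "x", "e")]
print(clocks(in_order), clocks(reordered))  # 7 6
```

In the reordered version, the independent `fadd` fills the clocks spent waiting for `x`. A search that only counts instructions would treat both sequences as equal.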

[0013]

PROBLEMS TO BE SOLVED BY THE INVENTION

All of the superoptimizers described in the Description of the Related Art above assume that the instruction sequences they handle contain no conditional branches or loops.

[0014] However, when true optimization of program execution is considered, loops are the most critical part. A loop executes the same processing many times over, so the effect of any optimization is multiplied by the number of iterations, and its impact is very large.
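The argument about iteration counts can be made concrete with a simplified cost model. It assumes a fixed clock count per iteration and ignores branch and cache effects. For $N$ iterations:

```latex
T_{\mathrm{total}} = T_{\mathrm{setup}} + N \, T_{\mathrm{body}}
\quad\Rightarrow\quad
\Delta T_{\mathrm{total}} = \Delta T_{\mathrm{setup}} + N \, \Delta T_{\mathrm{body}}
```

For example, with $N = 1000$, saving one clock in the loop body saves 1000 clocks in total, while saving one clock in the setup saves only one. This is also why it is reasonable to time only the in-loop instruction sequence when comparing candidates.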

[0015] In particular, on recent pipelined processors, the execution time of a loop can be reduced by "software pipelining". This optimization arranges the instruction sequence so that the end of one iteration of the loop executes overlapped with the beginning of the next iteration.
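The benefit of software pipelining can be illustrated with an idealized schedule. The sketch assumes a loop body of three one-clock stages (for example load, multiply, store), no loop-carried dependences, and no resource conflicts between stages. These assumptions are for illustration only.

```python
def schedule(n_iters, n_stages, overlap):
    """Map each clock to the (iteration, stage) pairs executing in it.

    Every stage takes one clock. Without overlap, an iteration starts only after
    the previous one has finished; with overlap (software pipelining with an
    initiation interval of one clock), a new iteration starts every clock."""
    interval = 1 if overlap else n_stages
    table = {}
    for i in range(n_iters):
        for s in range(n_stages):
            table.setdefault(i * interval + s, []).append((i, s))
    return table


def total_clocks(n_iters, n_stages, overlap):
    return max(schedule(n_iters, n_stages, overlap)) + 1


print(total_clocks(100, 3, overlap=False))  # 300 = N * S
print(total_clocks(100, 3, overlap=True))   # 102 = N + S - 1
print(schedule(4, 3, overlap=True)[2])      # [(0, 2), (1, 1), (2, 0)]
```

The last line shows the steady state. The last stage of iteration 0 runs in the same clock as the middle stage of iteration 1 and the first stage of iteration 2.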

[0016] The present invention has been made in view of the above technical problems. Its object is to provide an excellent program conversion apparatus and method, program manufacturing apparatus, and program recording medium capable of converting a program into an optimized machine instruction sequence that behaves the same as the original program.

[0017] A further object of the present invention is to provide an excellent program conversion apparatus and method, program manufacturing apparatus, and program recording medium capable of optimizing an instruction sequence by exhaustively searching every possible instruction sequence, i.e., performing "superoptimization".

[0018] A further object of the present invention is to provide an excellent program conversion apparatus and method, program manufacturing apparatus, and program recording medium capable of superoptimizing loop processing that repeatedly performs floating-point operations on array variables.

[0019]

SUMMARY OF THE INVENTION

The present invention has been made in consideration of the above problems. A first aspect of the present invention is a program conversion method or apparatus for converting a program that includes loop processing into an instruction sequence that behaves the same as the original program, comprising:
- a step or means for generating candidates for a setup instruction sequence that can be placed outside the loop;
- a step or means for generating candidates for an in-loop instruction sequence that implements the processing inside the loop;
- a step or means for determining whether a candidate in-loop instruction sequence could be executed within a predetermined time; and
- a step or means for outputting, in response to a positive result of the determining step, the combination of the setup instruction sequence candidate and the in-loop instruction sequence candidate as a solution of the program conversion.
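The first aspect pairs setup candidates with in-loop candidates and keeps the pairs whose loop body fits a time limit. A minimal sketch of that pairing follows. The candidate generators, the clock function, and the `is_correct` check are hypothetical callables supplied by the caller; the correctness check in particular is an assumption, since the first aspect states only the time test. The toy data and clock counts are invented for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Seq = Tuple[str, ...]


def find_solutions(setups: Iterable[Seq],
                   bodies_for: Callable[[Seq], Iterable[Seq]],
                   body_clocks: Callable[[Seq], int],
                   is_correct: Callable[[Seq, Seq], bool],
                   limit: int) -> Iterator[Tuple[Seq, Seq]]:
    """Yield every (setup, body) pair whose in-loop body fits within `limit`
    clocks and passes the correctness check. The setup is not timed."""
    for setup in setups:
        for body in bodies_for(setup):
            if body_clocks(body) <= limit and is_correct(setup, body):
                yield setup, body


# Toy data: the candidates and clock counts are made up for illustration.
bodies: Dict[Seq, List[Seq]] = {
    (): [("ld", "mul", "add")],
    ("ld",): [("mul", "add", "ld"), ("mul", "ld", "add")],
}
clock_table = {("ld", "mul", "add"): 7, ("mul", "add", "ld"): 6, ("mul", "ld", "add"): 4}

for pair in find_solutions(bodies, bodies.__getitem__, clock_table.__getitem__,
                           lambda setup, body: True, limit=6):
    print(pair)
```

Passing the generators in as functions keeps the driver independent of how candidates are produced. This lets the same loop serve the DAG-based generation described later.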

[0020] A second aspect of the present invention is a program conversion method or apparatus for converting a program that includes loop processing into an instruction sequence that behaves the same as the original program, comprising a step or means for repeatedly executing:
(a) a step or means for setting an upper limit on the loop execution time;
(b) a step or means for generating candidates for a setup instruction sequence that can be placed outside the loop;
(c) a step or means for generating candidates for an in-loop instruction sequence that implements the processing inside the loop;
(d) a step or means for determining whether a candidate in-loop instruction sequence could be executed within the upper limit of the loop execution time; and
(e) a step or means for outputting, in response to a positive result in step or means (d), the combination of the setup instruction sequence candidate and the in-loop instruction sequence candidate as a solution of the program conversion.

[0021] The program conversion method or apparatus according to the second aspect of the present invention may further comprise (f) a step or means that responds when no program conversion solution is obtained within the upper limit of the loop execution time. It adds a predetermined unit time to that upper limit and repeats the processing from step or means (b) onward.

[0022] A third aspect of the present invention is a program conversion method or apparatus for converting a program that includes loop processing executed by a processor into an instruction sequence that behaves the same as the original program, comprising:
(A) a step or means for setting an upper limit on the number of loop execution clocks;
(B) a step or means for generating candidates for a setup instruction sequence that can be placed outside the loop;
(C) a step or means for obtaining and storing the processor register state expected after the loop is executed;
(D) a step or means for generating candidates for an in-loop instruction sequence that implements the processing inside the loop;
(E) a step or means for comparing the result of executing a candidate in-loop instruction sequence with the register state stored in step or means (C), to determine whether the candidate could be executed within the upper limit of the number of loop execution clocks;
(F) a step or means for outputting, in response to a positive result in step or means (E), the combination of the setup instruction sequence candidate and the in-loop instruction sequence candidate as a solution of the program conversion; and
(G) a step or means that, when no program conversion solution is obtained at the current upper limit of the number of loop execution clocks, increments that upper limit by one and repeats the processing from step or means (B) onward.
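Steps (C) and (E) can be illustrated with concrete register values. The sketch obtains the expected state by running the original loop body on sample inputs and compares only the registers assumed to be live after the loop. Both choices are assumptions for illustration: SuperFLD, for example, checks equivalence by symbolic execution, and the choice of live-out registers is not specified in the text. The three-address instruction format and register names are hypothetical.

```python
OPS = {"fadd": lambda a, b: a + b, "fmul": lambda a, b: a * b}


def run(insns, regs):
    """Execute three-address instructions (op, dst, src1, src2) in place."""
    for op, dst, a, b in insns:
        regs[dst] = OPS[op](regs[a], regs[b])
    return regs


def expected_state(original_body, init, n):
    """Step (C): run the original loop body n times and keep the final registers."""
    regs = dict(init)
    for _ in range(n):
        run(original_body, regs)
    return regs


def matches(setup, body, init, n, expected, live_out):
    """Step (E), correctness part: run setup once and body n times, then compare
    only the registers that are live after the loop."""
    regs = run(setup, dict(init))
    for _ in range(n):
        run(body, regs)
    return all(regs[r] == expected[r] for r in live_out)


# Loop: t = x * k; s = s + t; x = x + 1   (x plays the role of the index)
original = [("fmul", "t", "x", "k"), ("fadd", "s", "s", "t"), ("fadd", "x", "x", "one")]
init = {"x": 1.0, "k": 3.0, "s": 0.0, "t": 0.0, "one": 1.0}
expected = expected_state(original, init, n=4)

# Candidate: compute t for the first iteration before the loop, and inside the
# loop compute t one iteration ahead.
setup = [("fmul", "t", "x", "k")]
body = [("fadd", "s", "s", "t"), ("fadd", "x", "x", "one"), ("fmul", "t", "x", "k")]
print(matches(setup, body, init, 4, expected, live_out=("s", "x")))       # True
print(matches(setup, body, init, 4, expected, live_out=("s", "x", "t")))  # False
print(matches([], body, init, 4, expected, live_out=("s", "x")))          # False
```

The second result shows why the comparison should cover only live-out registers. The pipelined candidate leaves the temporary `t` one iteration ahead, which is harmless if `t` is dead after the loop. The third result shows that the same body without its setup is rejected.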

[0023] A fourth aspect of the present invention is a program conversion method or apparatus for converting a program that includes loop processing into an instruction sequence that behaves the same as the original program, comprising:
- a step or means for parsing the command-line argument of the original program and generating a DAG (Directed Acyclic Graph: a graph whose edges have direction but which contains no cycles) representing the assignment statement, so that all operations for two loop iterations can be represented;
- a step or means for generating all instruction sequences corresponding to at most one loop iteration as candidates for a setup instruction sequence that can be placed outside the loop;
- a step or means for generating all candidates for an in-loop instruction sequence that implements the in-loop processing for each setup instruction sequence; and
- a step or means for taking, as solutions of the program conversion, all combinations of setup instruction sequence candidates and in-loop instruction sequence candidates that can be executed within a predetermined upper limit on the number of clocks.

[0024] A fifth aspect of the present invention is a program manufacturing apparatus for manufacturing, from a program that includes loop processing, a program consisting of an instruction sequence that behaves the same, comprising:
- means for generating candidates for a setup instruction sequence that can be placed outside the loop;
- means for generating candidates for an in-loop instruction sequence that implements the processing inside the loop;
- means for determining whether a candidate in-loop instruction sequence could be executed within a predetermined time; and
- means for outputting, in response to a positive result of the determination, the combination of the setup instruction sequence candidate and the in-loop instruction sequence candidate as a solution of the program conversion.

[0025] A sixth aspect of the present invention is a program manufacturing apparatus for manufacturing, from a program that includes loop processing, a program consisting of an instruction sequence that behaves the same, comprising:
(a) means for setting an upper limit on the loop execution time;
(b) means for generating candidates for a setup instruction sequence that can be placed outside the loop;
(c) means for generating candidates for an in-loop instruction sequence that implements the processing inside the loop;
(d) means for determining whether a candidate in-loop instruction sequence could be executed within the upper limit of the loop execution time; and
(e) means for outputting, in response to a positive result from means (d), the combination of the setup instruction sequence candidate and the in-loop instruction sequence candidate as the manufactured program.

[0026] The program manufacturing apparatus according to the sixth aspect of the present invention may further comprise (f) means that responds when no program conversion solution is obtained within the upper limit of the loop execution time. It adds a predetermined unit time to that upper limit and feeds the processing back into means (b).

[0027] A seventh aspect of the present invention is a program manufacturing apparatus for manufacturing, from a program that includes loop processing executed by a processor, a program consisting of an instruction sequence that behaves the same, comprising:
(A) means for setting an upper limit on the number of loop execution clocks;
(B) means for generating candidates for a setup instruction sequence that can be placed outside the loop;
(C) means for obtaining and storing the processor register state expected after the loop is executed;
(D) means for generating candidates for an in-loop instruction sequence that implements the processing inside the loop;
(E) means for comparing the result of executing a candidate in-loop instruction sequence with the register state stored by means (C), to determine whether the candidate could be executed within the upper limit of the number of loop execution clocks;
(F) means for outputting, in response to a positive result from means (E), the combination of the setup instruction sequence candidate and the in-loop instruction sequence candidate as the manufactured program; and
(G) means that, when no program conversion solution is obtained at the current upper limit of the number of loop execution clocks, increments that upper limit by one and feeds the processing back into means (B).

[0028] An eighth aspect of the present invention is a program manufacturing apparatus for manufacturing, from a program that includes loop processing, a program consisting of an instruction sequence that behaves the same, comprising:
- means for parsing the command-line argument of the original program and generating a DAG (Directed Acyclic Graph: a graph whose edges have direction but which contains no cycles) representing the assignment statement, so that all operations for two loop iterations can be represented;
- means for generating all instruction sequences corresponding to at most one loop iteration as candidates for a setup instruction sequence that can be placed outside the loop;
- means for generating all candidates for an in-loop instruction sequence that implements the in-loop processing for each setup instruction sequence; and
- means for outputting, as the manufactured program, every combination of setup instruction sequence candidate and in-loop instruction sequence candidate that can be executed within a predetermined upper limit on the number of clocks.

[0029] A ninth aspect of the present invention is a program recording medium. It records, in a tangible and computer-readable form, a computer program that causes a computer system to convert a program that includes loop processing executed by a processor into an instruction sequence that behaves the same as the original program. The computer program comprises:
(A) a step of setting an upper limit on the number of loop execution clocks;
(B) a step of generating candidates for a setup instruction sequence that can be placed outside the loop;
(C) a step of obtaining and storing the processor register state expected after the loop is executed;
(D) a step of generating candidates for an in-loop instruction sequence that implements the processing inside the loop;
(E) a step of comparing the result of executing a candidate in-loop instruction sequence with the register state stored in step (C), to determine whether the candidate could be executed within the upper limit of the number of loop execution clocks;
(F) a step of outputting, in response to a positive result in step (E), the combination of the setup instruction sequence candidate and the in-loop instruction sequence candidate as a solution of the program conversion; and
(G) a step that, when no program conversion solution is obtained at the current upper limit of the number of loop execution clocks, increments that upper limit by one and repeats the processing from step (B) onward.

[0030] A tenth aspect of the present invention is a program recording medium. It records, in a tangible and computer-readable form, a computer program that causes a computer system to convert a program that includes loop processing into an instruction sequence that behaves the same as the original program. The computer program comprises:
- a step of parsing the command-line argument of the original program and generating a DAG (Directed Acyclic Graph: a graph whose edges have direction but which contains no cycles) representing the assignment statement, so that all operations for two loop iterations can be represented;
- a step of generating all instruction sequences corresponding to at most one loop iteration as candidates for a setup instruction sequence that can be placed outside the loop;
- a step of generating all candidates for an in-loop instruction sequence that implements the in-loop processing for each setup instruction sequence; and
- a step of taking, as solutions of the program conversion, all combinations of setup instruction sequence candidates and in-loop instruction sequence candidates that can be executed within a predetermined upper limit on the number of clocks.

[0031]

OPERATION

The program conversion according to the present invention relates to superoptimization, which optimizes a program by exhaustively searching for instruction sequences that behave the same as the original program. It is designed in particular to be able to optimize programs that include loop processing.

[0032] The program conversion process according to the present invention is launched with the contents of a loop given as a command-line argument. It then outputs the instruction sequence that implements the loop with the shortest execution time, i.e., the fewest clocks.

[0033] When a program is input to the program conversion process according to the present invention, the command-line argument of the program is first parsed. A DAG (Directed Acyclic Graph: a graph whose edges have direction but which contains no cycles) representing the assignment statement is then generated, so that all operations for two loop iterations can be represented.
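A DAG covering two iterations of a loop-body assignment can be sketched as follows. This is not the patent's parser. It uses Python's standard `ast` module as a stand-in, because an assignment such as `s = s + x[i] * y[i]` happens to be valid Python syntax. Further assumptions: every subscript is the loop index `[i]`, the assigned variable is the only loop-carried value, and the node tags (`"var"`, `"load"`, `"Add"`, ...) are hypothetical. Nodes are hash-consed, so an identical subexpression becomes a single shared node.

```python
import ast
import sys


def build_dag(assignment, iterations=2):
    """Unroll a loop-body assignment such as "s = s + x[i] * y[i]" and return
    (nodes, result_id). Nodes are hash-consed tuples, so identical
    subexpressions share a single node id."""
    stmt = ast.parse(assignment).body[0]
    if not isinstance(stmt, ast.Assign):
        raise ValueError("expected a single assignment statement")
    target = stmt.targets[0].id          # the loop-carried variable, e.g. "s"
    nodes, ids = [], {}

    def intern(node):
        if node not in ids:
            ids[node] = len(nodes)
            nodes.append(node)
        return ids[node]

    def walk(e, k, env):
        if isinstance(e, ast.Name):      # value from a previous iteration, or an input
            return env[e.id] if e.id in env else intern(("var", e.id))
        if isinstance(e, ast.Constant):
            return intern(("const", e.value))
        if isinstance(e, ast.Subscript): # x[i] in iteration k loads element i+k
            return intern(("load", e.value.id, k))
        if isinstance(e, ast.BinOp):
            return intern((type(e.op).__name__, walk(e.left, k, env), walk(e.right, k, env)))
        raise ValueError("unsupported expression: " + ast.dump(e))

    env = {}
    for k in range(iterations):
        env[target] = walk(stmt.value, k, env)
    return nodes, env[target]


if __name__ == "__main__" and len(sys.argv) > 1:
    dag, root = build_dag(sys.argv[1])
    for node_id, node in enumerate(dag):
        print(node_id, node)
    print("result:", root)
```

Run as a script with the loop body as its argument (for example `'s = s + x[i] * y[i]'`), it prints nine nodes. The second iteration's `Add` consumes the first iteration's `Add`, which is the cross-iteration edge a software-pipelined schedule has to respect.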

[0034] Next, all instruction sequences corresponding to at most one loop iteration are generated and taken as candidates for the setup instruction sequence that can be placed outside the loop.
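One way to read "all instruction sequences corresponding to at most one loop iteration" is sketched below. The sketch takes every subset of one iteration's operations that is closed under dependencies, i.e., every portion of an iteration that could be computed before the loop on its own. This is a simplification: it enumerates sets of operations, and the orderings of each set would be enumerated separately. The operation names for `s = s + x[i] * y[i]` are hypothetical.

```python
from itertools import combinations

# Hypothetical operations of one iteration of  s = s + x[i] * y[i],
# each mapped to the same-iteration operations it depends on.
DEPS = {"ld_x": [], "ld_y": [], "mul": ["ld_x", "ld_y"], "add": ["mul"]}


def setup_candidates(deps):
    """Every subset of one iteration's operations that is closed under
    dependencies, i.e. could be computed before the loop on its own."""
    ops = list(deps)
    result = []
    for r in range(len(ops) + 1):
        for subset in combinations(ops, r):
            chosen = set(subset)
            if all(d in chosen for op in chosen for d in deps[op]):
                result.append(subset)
    return result


for cand in setup_candidates(DEPS):
    print(cand)
# (), ('ld_x',), ('ld_y',), ('ld_x', 'ld_y'), ('ld_x', 'ld_y', 'mul'),
# ('ld_x', 'ld_y', 'mul', 'add')
```

Of the 16 subsets, only 6 are dependency-closed. For example, `mul` cannot be hoisted without both loads.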

[0035] Next, all candidate in-loop instruction sequences that realize the in-loop processing for each setup instruction sequence are generated.

[0036] Then, every combination of a setup instruction sequence and an in-loop instruction sequence candidate that can be executed within a predetermined upper limit on the number of clocks is displayed as a solution of the program conversion.

[0037] If no solution is found within the upper limit on the number of clocks, the upper limit is incremented by one, new candidates for the setup instruction sequence and the in-loop instruction sequence are generated, and the same procedure is repeated.

[0038] According to the present invention, a program containing loop processing that repeatedly performs floating-point operations on array variables can be superoptimized by an exhaustive search over instruction sequences. Because superoptimization can be applied to loops, which are critical to a program's execution efficiency, the usefulness of superoptimization is further increased.

[0039] The instruction sequence output as a solution by the program conversion according to the present invention contains, in addition to the loop body, a setup instruction sequence placed outside the loop processing (preferably before it) and an instruction that updates the control variable.

[0040] The program conversion according to the present invention does not assume that "the instruction sequence with the fewest instructions is optimal". Instead, each generated instruction sequence is executed in an "instruction execution simulation", and the sequence with the shortest execution time, that is, the smallest number of execution clocks, is actually found. As a result, more accurate program optimization can be achieved.

[0041] However, the clock-count judgment is applied only to the in-loop instruction sequence; the setup instruction sequence is excluded from evaluation. Because the setup instruction sequence is executed only once, before the loop processing, this is considered not to affect the accuracy of the judgment.

[0042] Whether an instruction sequence is correct, that is, whether it performs the same function as the original program, is judged as follows: when a setup instruction sequence candidate is generated, the processor register state expected after executing the loop is computed and saved, and this register state is then compared with the result of the execution simulation of each generated in-loop instruction sequence.

[0043] The program recording media according to the ninth and tenth aspects of the present invention are media that provide a computer program in a tangible, computer-readable form to, for example, a general-purpose computer system capable of executing various program codes. The form of such a medium is not particularly limited; it may be, for example, a removable, portable storage medium such as a CD (Compact Disc), an FD (Floppy Disc), or an MO (Magneto-Optical disc).

[0044] Such a program providing medium defines a structural or functional cooperative relationship between the computer program and the providing medium for realizing the functions of a predetermined computer program on a computer system. In other words, by installing a predetermined computer program into a computer system via the program recording medium according to the ninth or tenth aspect of the present invention, a cooperative action is exerted on the computer system, and the same operation and effects as those of the program conversion method and apparatus according to the third and fourth aspects of the present invention can be obtained.

[0045] Still other objects, features, and advantages of the present invention will become apparent from the more detailed description given below, based on the embodiments of the present invention and the accompanying drawings.

[0046]

DESCRIPTION OF THE PREFERRED EMBODIMENTS
Embodiments of the present invention are described below in detail with reference to the drawings.

[0047] "Program conversion" as used in the present invention refers to a method of applying superoptimization to an input consisting of program code, that is, an instruction sequence, written in, for example, machine language or assembly language. Optimizing a program means removing redundant parts from an instruction sequence and replacing it with an instruction sequence that performs the same function as the original but executes in less time; "superoptimization" in particular means optimizing a program by exhaustively searching every instruction sequence that can be composed.
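As a concrete illustration of exhaustive search, here is a minimal Python sketch of a superoptimizer for a hypothetical one-accumulator machine. The instruction names (`INC`, `DBL`, `NEG`, `ADDX`), the test inputs, and the helper names are assumptions made for this example, not part of the patent. Unlike the embodiment, it ranks sequences by length rather than by simulated clocks, and it checks equivalence only on a fixed set of test inputs.

```python
from itertools import product

# Hypothetical one-accumulator machine: every op maps (acc, x) -> new acc,
# and the accumulator starts out holding the input x.
OPS = {
    "INC": lambda acc, x: acc + 1,
    "DBL": lambda acc, x: acc * 2,
    "NEG": lambda acc, x: -acc,
    "ADDX": lambda acc, x: acc + x,
}

# Equivalence is only tested on these inputs (a cheap filter, not a proof).
TEST_INPUTS = [-7, -1, 0, 1, 2, 5, 13]


def run(seq, x):
    """Execute an instruction sequence on input x and return the accumulator."""
    acc = x
    for op in seq:
        acc = OPS[op](acc, x)
    return acc


def superoptimize(target, max_len=4):
    """Enumerate every sequence of length 0, 1, 2, ... and return the first one
    that agrees with `target` on all test inputs (None if none up to max_len)."""
    for length in range(max_len + 1):
        for seq in product(OPS, repeat=length):  # all sequences of this length
            if all(run(seq, x) == target(x) for x in TEST_INPUTS):
                return list(seq)
    return None
```

For the target `4 * x + 2` the search returns a three-instruction sequence; no sequence of two or fewer of these operations computes it, so that is the shortest possible on this toy machine.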

[0048] Of course, the program conversion dealt with in this specification is applicable not only to machine language and assembly language but also to "compilation", which converts source program code written in a high-level programming language such as C into object program code that a computer system can decode.

[0049] The "program conversion apparatus" or "program manufacturing apparatus" referred to in the present invention can be designed as a dedicated hardware device, but it can also be embodied by installing and running a software program for program superoptimization on a general-purpose computer system.

[0050] FIG. 1 schematically shows the hardware configuration of a computer system 100 to which the present invention can be applied. Each block in the figure is described below.

[0051] The processor 101 is the main controller that centrally controls the overall operation of the computer system 100, and it executes various application programs under the control of an operating system (OS).

[0052] The RAM (Random Access Memory) 102 is a writable memory used by the processor 101 to load executable program code and to store work data temporarily. The RAM 102 normally consists of a plurality of DRAM (Dynamic RAM) chips. The ROM (Read Only Memory) 103 is a read-only nonvolatile memory into which stored data is permanently written at the time of manufacture. The ROM 103 stores, for example, a power-on self-test program (POST) executed when the system 100 is powered on, and a group of code (BIOS) for performing hardware input/output operations.

[0053] The input unit 104 is a device that accepts command input and the like from the user. It includes devices such as a keyboard for character-based command input and a mouse or touch panel for command input by pointing at coordinates.

[0054] The display unit 105 is a device that presents work screens, such as processed images and menus for command input, to the user; a CRT (Cathode Ray Tube) display or an LCD (Liquid Crystal Display) serves this purpose.

[0055] The external storage device 106 is a relatively large-capacity, rewritable, nonvolatile storage device such as a hard disk drive (HDD), and it is used to store data files and to install program files into the system 100. Examples of program files include the program superoptimization program executed by the processor 101, machine-language or assembly-language code to which superoptimization is applied, and machine-language code generated by superoptimization.

[0056] The media drive 107 is a device into which cartridge-type media are loaded in order to read and write the data carried on the media surface. Such media include removable, portable media such as an MO (Magneto-Optical disc), CD-ROM, DVD (Digital Versatile Disc), or Memory Stick. Program files containing the software program that realizes the superoptimization according to the present embodiment, or machine-language code to which superoptimization is to be applied, can be moved between computer systems via these media and installed into the external storage device 106 by the media drive 107. Alternatively, machine-language code generated by applying the superoptimization according to the present embodiment can be written to these portable media and distributed to any number of computer systems.

[0057] The network interface 108 is a device for connecting the computer system 100 to a network in accordance with a predetermined communication protocol (for example, TCP/IP (Transmission Control Protocol/Internet Protocol)). A plurality of other computer systems (hereinafter also called "remote systems"; not shown) exist on the network. The computer system 100 of the present embodiment can receive from a remote system, via the network, the software program that realizes superoptimization, machine-language or assembly-language code to which superoptimization is to be applied, and so on. Alternatively, machine-language code generated by applying the superoptimization according to the present embodiment can be distributed to a plurality of computer systems via the network.

[0058] Many hardware components other than those shown in FIG. 1 are required to build the computer system 100. However, they are well known to those skilled in the art and do not form the gist of the present invention, so they are omitted from this specification. Note also that the connections between the hardware blocks are drawn abstractly to avoid cluttering the drawing. (For example, the CPU serving as the processor 101 is generally not connected locally to each peripheral device through its own external pins, but is I/O-connected via a bus.)

[0059] FIG. 2 is a high-level functional block diagram illustrating the configuration of the program conversion apparatus 10 used to implement the present invention. As shown in FIG. 2, the program conversion apparatus 10 consists of four functional modules: a syntax analysis unit 1, a search execution unit 2, an instruction generation unit 3, and an instruction execution simulator 4. In practice, each of these functional modules is realized by the cooperation of the superoptimization program executed by the processor 101 with other hardware components. Each functional module is described below.

[0060] The syntax analysis unit 1 is a functional module that analyzes an input program according to the context-free grammar of the programming language (that is, derives the terminal-symbol string of the program and represents it as a syntax tree). In particular, the syntax analysis unit 1 of the present embodiment parses the target function specified by a command-line argument and generates a DAG (Directed Acyclic Graph: a graph whose edges have direction but which contains no cycles) representing the input assignment statements.

[0061] The search execution unit 2 is the functional module that exhaustively searches every instruction sequence that can be composed in order to find the optimal machine instruction sequence performing the same function as the original program; it plays the central role in superoptimization. More specifically, the search execution unit 2 recursively calls the instruction generation unit 3 (described later) according to the DAG generated by the syntax analysis unit 1 to have it generate instruction sequences.

[0062] The search execution unit 2 also performs a depth-first search while successively raising the upper limit on the number of clocks to 1, 2, 3, ..., based on the clock counts supplied by the instruction execution simulator 4 (described later). At each level of the recursion it generates one instruction: either an instruction needed to execute a node of the DAG, or an instruction that has no corresponding node (for example, an FXCH instruction).

[0063] When instructions for all nodes have been generated and an instruction sequence performing the same function is found within the upper limit on the number of clocks, the search execution unit 2 outputs that instruction sequence as the superoptimized instruction sequence. If no solution, that is, no instruction sequence performing the same function, is found within the given upper limit, the upper limit is incremented by one and the search continues.

[0064] To speed up the search, pruning based on the number of clocks, the critical path, and the code size can be performed so that useless instructions are not generated.

[0065] The instruction generation unit 3 is called from the search execution unit 2, generates the instruction needed to execute each node of the DAG, and performs recursive processing. If an instruction cannot be executed, for example because its operands are not yet available, the instruction generation unit 3 does nothing. If there are several candidate instructions, they are generated in turn. Pruning based on reference counts may also be performed.
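The search described in [0062] to [0065] can be sketched as an iterative-deepening depth-first search. The following Python sketch is illustrative only: it assumes a single-issue, in-order machine in which an instruction issues once its operands are ready, uses hypothetical names (`DAG`, `dfs`, `search`), and omits node-less instructions such as FXCH as well as pruning by critical path, code size, or reference count. It does prune on the clock bound, and it generates nothing for a node whose operands have not been computed yet.

```python
# Hypothetical DAG for A := B + C + D and E := F + G + H (left-associative sums).
# Each node maps to (operand names, result latency in clocks); the latency of 3
# mirrors the FADD latency mentioned later in the text. Leaves such as B or C
# are not DAG nodes here and are treated as available at clock 0.
DAG = {
    "t1": (("B", "C"), 3),   # t1 = B + C
    "A": (("t1", "D"), 3),   # A  = t1 + D
    "t2": (("F", "G"), 3),   # t2 = F + G
    "E": (("t2", "H"), 3),   # E  = t2 + H
}


def operands_available(dag, srcs, ready):
    """A leaf is always available; an inner node must already have been generated."""
    return all(s not in dag or s in ready for s in srcs)


def dfs(dag, bound, ready, next_issue, order):
    """Generate one instruction per recursion level (single-issue, in-order model).

    ready      -- node -> clock at which its result becomes available
    next_issue -- earliest clock at which the next instruction may issue
    """
    finish = max([next_issue] + list(ready.values()))
    if finish > bound:                 # prune on the number of clocks
        return None
    if len(order) == len(dag):         # an instruction exists for every node
        return list(order)
    for node, (srcs, latency) in dag.items():
        if node in ready or not operands_available(dag, srcs, ready):
            continue                   # already generated, or operands not ready
        issue = max([next_issue] + [ready.get(s, 0) for s in srcs])
        ready[node] = issue + latency
        order.append(node)
        found = dfs(dag, bound, ready, issue + 1, order)
        order.pop()                    # backtrack: undo the simulated instruction
        del ready[node]
        if found is not None:
            return found
    return None


def search(dag, max_bound=100):
    """Raise the clock bound 1, 2, 3, ... until a schedule fits (iterative deepening)."""
    for bound in range(1, max_bound + 1):
        found = dfs(dag, bound, {}, 0, [])
        if found is not None:
            return bound, found
    return None
```

Interleaving the two independent chains hides part of the 3-clock latency, so the first bound that succeeds is 7 clocks, whereas computing `A` completely before starting on `E` would take 10.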

[0066] The instruction execution simulator 4 holds the processor state, the elapsed time, the generated instruction sequence, and so on. For example, when the processor targeted by superoptimization (that is, the processor that executes the superoptimized program) is Intel's "Pentium", the simulator holds, for each register used for operations, a pointer to the DAG node corresponding to the value held in that register, the clock at which its computation completes, and so on.

[0067] The instruction execution simulator 4 also executes the instructions generated by the instruction generation unit 3: it updates its state through procedures called from the instruction generation unit 3 and computes the elapsed time. Because the instruction execution simulator 4 reports the number of clocks required to execute the instructions to the search execution unit 2, the search execution unit 2 can search for the instruction sequence with the minimum number of clocks. When an instruction is removed in order to backtrack, for example as a result of pruning, the state of the instruction execution simulator 4 is restored to what it was before that instruction.
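The state handling in [0066] and [0067] can be sketched as a small simulator that records, per register, a pointer to the DAG node it holds and the clock at which that value is ready, and that keeps an undo log so that backtracking restores the previous state. This Python class is a hypothetical, greatly simplified timing model (single issue, no pairing, no FPU stack), not a model of the Pentium.

```python
class Simulator:
    """Minimal instruction-execution simulator sketch (hypothetical model).

    Each register holds (dag_node, ready_clock): the DAG node whose value it
    contains and the clock at which that value becomes available.
    """

    def __init__(self):
        self.regs = {}      # register name -> (dag_node, ready_clock)
        self.clock = 0      # next clock at which an instruction may issue
        self.program = []   # generated instruction sequence
        self._undo = []     # one saved entry per executed instruction

    def execute(self, dest, node, srcs, latency):
        """Issue one instruction computing `node` into `dest` from registers `srcs`.

        Returns the clock at which the instruction issued.
        """
        self._undo.append((dest, self.regs.get(dest), self.clock))
        issue = max([self.clock] + [self.regs[s][1] for s in srcs])
        self.regs[dest] = (node, issue + latency)
        self.clock = issue + 1
        self.program.append((dest, node, tuple(srcs)))
        return issue

    def undo(self):
        """Backtrack: drop the last instruction and restore the previous state."""
        dest, old, old_clock = self._undo.pop()
        if old is None:
            del self.regs[dest]
        else:
            self.regs[dest] = old
        self.clock = old_clock
        self.program.pop()

    def elapsed(self):
        """Clocks until every issued instruction has completed."""
        return max([self.clock] + [ready for _, ready in self.regs.values()])
```

Keeping an undo log rather than copying the whole state at every recursion level keeps backtracking cheap, which matters because the exhaustive search backtracks constantly.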

[0068] Next, the operating procedure of the program conversion apparatus 10 according to the present embodiment is described.

[0069] The syntax analysis unit 1 parses the target function specified by a command-line argument and generates a DAG (Directed Acyclic Graph: a graph whose edges have direction but which contains no cycles) representing the input assignment statements. For example, assume that the following expressions are input as assignment statements.

[0070]

[Equation 1]
$$A := B + C + D$$
$$E := F + G + H$$

[0071] DAGs such as those shown in FIG. 3 and FIG. 4 are generated from the respective assignment statements. In each figure, ○ marks a node.
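A minimal sketch of how such a DAG can be built: identical subexpressions are mapped to a single node (hash-consing), which is what turns the expression trees into a DAG. This Python version is an assumption for illustration; the function name `build_dag` is hypothetical, and it only accepts statements of the form `name := a + b + ...`, parsed left-associatively, whereas the syntax analysis unit 1 parses a complete target function.

```python
def build_dag(statements):
    """Build a DAG for simple `lhs := a + b + ...` assignments.

    Identical subexpressions share a single node (hash-consing). Sums are
    parsed left-associatively, so B + C + D becomes (B + C) + D.
    Returns (nodes, roots): nodes maps each node key to an integer id, and
    roots maps each assigned variable to the id of its expression node.
    """
    nodes = {}   # key -> id; key is ("leaf", name) or ("+", left_id, right_id)
    roots = {}

    def node(key):
        if key not in nodes:          # reuse an existing node when possible
            nodes[key] = len(nodes)
        return nodes[key]

    for stmt in statements:
        lhs, rhs = (part.strip() for part in stmt.split(":="))
        terms = [t.strip() for t in rhs.split("+")]
        current = node(("leaf", terms[0]))
        for term in terms[1:]:
            current = node(("+", current, node(("leaf", term))))
        roots[lhs] = current
    return nodes, roots
```

For the two statements of Equation 1 the function creates 10 nodes (six leaves and four additions); two statements that both begin with `B + C` would share that addition node.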

[0072] The search execution unit 2 performs superoptimization based on such a DAG in cooperation with the instruction generation unit 3 and the instruction execution simulator 4.

[0073] The superoptimization according to the present embodiment is designed in particular to optimize programs that contain loop processing: when it is started with the body of a loop given as a command-line argument, it outputs the instruction sequence that realizes the loop in the shortest execution time. The output instruction sequence contains, in addition to the loop body, a setup instruction sequence placed outside the loop processing (preferably before it) and an instruction that updates the control variable. The loop superoptimization according to the present embodiment, however, assumes the following preconditions:

[0074] (1) The loop iterates over a single control variable.
(2) Array variables can be used in addition to simple variables. Array variables are limited to those subscripted by the control variable of the loop.

[0075] FIG. 5 shows, in flowchart form, the procedure of program superoptimization according to the present embodiment. Each step of the flowchart is described below.

[0076] First, in step S11, the search execution unit 2 initializes the upper limit on the number of clocks to 1.

[0077] Next, in step S12, the search execution unit 2 calls the instruction generation unit 3 to generate setup instruction sequence candidates. A "setup instruction sequence" here is an instruction sequence that can be placed outside the loop; in other words, it is the part that need not be executed repeatedly at run time. The instruction generation unit 3 generates, as setup instruction sequences, every instruction sequence corresponding to at most one loop iteration.

[0078] Next, in step S13, the search execution unit 2 calls the instruction execution simulator 4 to obtain the processor register state expected after the loop is executed. The obtained register state is saved for subsequent processing.

[0079] Next, in step S14, the search execution unit 2 calls the instruction generation unit 3 again to have it generate all in-loop instruction sequence candidates that realize the in-loop processing for the setup instruction sequence.

[0080] Next, in step S15, the search execution unit 2 calls the instruction execution simulator 4 again, runs an instruction execution simulation of the generated instruction sequence, and judges both whether the instruction sequence could be executed within the currently set upper limit on the number of clocks and whether it performs the same function as the original program. The latter judgment is made by comparing the result of the instruction execution simulation with the register state saved in step S13.

[0081] If both judgments are affirmative, the instruction sequence is output as a solution of the superoptimization, that is, as an instruction sequence with the minimum number of execution clocks (step S16). The form of output is not particularly limited; it may be, for example, a screen display to the user, replacement of the instruction sequence, or generation of a new program. If either judgment is negative, step S16 is skipped.

[0082] Next, in step S17, it is determined whether all the in-loop instruction sequence candidates generated in step S14 have been evaluated (that is, simulated). If unprocessed candidates remain, the process returns to step S15 and repeats the same processing as above.

[0083] Next, in step S18, it is determined whether all the setup instruction sequence candidates generated in step S12 have been evaluated (that is, simulated). If unprocessed candidates remain, the process returns to step S13 and repeats the same processing as above.

[0084] Next, in step S19, it is determined whether a solution of the program conversion has already been obtained.

[0085] If the result is negative, that is, if no instruction sequence with the minimum number of execution clocks has been found yet, the process proceeds to step S20, where the upper limit on the number of clocks is incremented by one. The process then returns to step S12, generates new setup instruction sequence and in-loop instruction sequence candidates, and repeats the same processing as above in an attempt to find the instruction sequence with the minimum number of execution clocks.

[0086] When the judgment in step S19 becomes affirmative, the entire processing routine ends.

[0087] The following is pseudocode for a program that realizes the superoptimization according to the present embodiment.

[0088]

[Equation 2]
Repeat while raising the upper limit on the number of clocks to 1, 2, 3, ...:
    Repeat for each setup instruction sequence candidate:
        Repeat for each in-loop instruction sequence candidate:
            If the loop can be executed within the upper limit on the number of clocks, output it as a solution.
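The pseudocode above maps onto three nested loops. The Python sketch below is one possible rendering under stated assumptions: the candidate generators, `expected_state`, and `simulate` are hypothetical hooks supplied by the caller, and, as in [0041], only the clocks of the in-loop sequence are compared with the bound. The step numbers in the comments refer to FIG. 5.

```python
def superoptimize_loop(setup_candidates, loop_candidates, expected_state,
                       simulate, max_bound=64):
    """Iterative deepening over the clock bound (hypothetical hooks).

    setup_candidates()     -> iterable of setup sequences             (S12)
    expected_state(setup)  -> register state expected after the loop  (S13)
    loop_candidates(setup) -> iterable of in-loop sequences           (S14)
    simulate(setup, body)  -> (state_after_body, clocks_of_body)      (S15)
    Returns (bound, [(setup, body), ...]) or None if max_bound is exceeded.
    """
    for bound in range(1, max_bound + 1):            # S11, S20
        solutions = []
        for setup in setup_candidates():             # S12, loop closed by S18
            expected = expected_state(setup)         # S13
            for body in loop_candidates(setup):      # S14, loop closed by S17
                state, clocks = simulate(setup, body)
                # S15: only the in-loop clocks are checked against the bound.
                if clocks <= bound and state == expected:
                    solutions.append((setup, body))  # S16
        if solutions:                                # S19
            return bound, solutions
    return None
```

Because every combination within the current bound is collected before the bound is raised, all equally fast solutions are reported together, matching "displayed as solutions" in [0036].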

[0089] In software pipelining, two iterations are executed overlapping each other, so part of the first iteration must be executed before entering the loop. The operations of the next iteration that correspond to the operations already executed are then executed inside the loop.
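For the loop discussed later in this description (load `x[i]`, add `y[i]`, store into `y[i]`), a one-stage software pipeline can be sketched in Python as follows. The split point (hoisting the load of `x[0]` into the setup step) and the explicit epilogue that finishes the last iteration are assumptions for illustration; the patent text does not describe an epilogue.

```python
def add_arrays_naive(x, y):
    """Reference semantics of the loop: y[i] = x[i] + y[i] for every i."""
    for i in range(len(y)):
        y[i] = x[i] + y[i]
    return y


def add_arrays_pipelined(x, y):
    """Same computation, software-pipelined by one stage.

    The load of x[i + 1] (next iteration) overlaps the add/store of
    iteration i, so the load of x[0] is hoisted into a setup step.
    """
    n = len(y)
    if n == 0:
        return y
    loaded = x[0]                    # setup: first part of iteration 0
    for i in range(n - 1):
        y[i] = loaded + y[i]         # second part of iteration i
        loaded = x[i + 1]            # first part of iteration i + 1
    y[n - 1] = loaded + y[n - 1]     # epilogue: finish the last iteration
    return y
```

Inside the loop, each pass now mixes work from two consecutive iterations, which is why the search must consider a setup sequence together with the loop body rather than the body alone.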

[0090] In the superoptimization according to the present embodiment, the DAG representing the assignment statements, obtained by parsing the command-line arguments, is duplicated so that all operations for two iterations can be represented. Then every instruction sequence corresponding to at most one iteration is generated and made a setup instruction sequence candidate. In addition, all in-loop instruction sequences for each setup instruction sequence candidate are generated. An instruction execution simulation is performed for each candidate instruction sequence, and those that can be executed within the upper limit on the number of clocks are displayed as solutions of the exhaustive search.

[0091] To optimize a program, it must be guaranteed that the generated instruction sequence performs the same function as the original program. For example, to guarantee that the loop iterates correctly, it is necessary to confirm that the state at the beginning of the loop and the state at the end of the loop are the same, except that each value held in a register has advanced to the next element of its array. For example, when the addition instruction FADD on the floating-point array variables X[i] and Y[i] is executed repeatedly, it must be confirmed that the following two equations hold for the register ST[0] into which the sum is loaded.

[0092]

[Equation 3]
(A) $\mathrm{ST}[0] = X[i] + Y[i]$ : at the beginning of the loop
(B) $\mathrm{ST}[0] = X[i+1] + Y[i+1]$ : at the end of the loop

[0093] In the present embodiment, for each setup instruction sequence candidate, the state after its execution is taken, each register value is replaced by the corresponding value for the next array element, and the result is saved (corresponding to step S13 in FIG. 5). Then, when an in-loop instruction sequence is generated, its correctness is judged by comparing the state after executing it with the saved state (corresponding to step S15 in FIG. 5). If the two differ, the search backtracks immediately.
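The check of [0091] to [0093] can be sketched with symbolic register contents. In this hypothetical Python encoding, an array element is written as `(array_name, offset_from_i)`, a simple variable as `("var", name)`, and an operation as `(op, operand, ...)`; the set of array names is an assumption of the example. The expected state is the setup state with every array reference advanced by one element, as in Equation 3.

```python
ARRAYS = {"X", "Y"}   # hypothetical: names of the array variables in the loop


def shift(value):
    """Advance every array reference in a symbolic value to the next element."""
    if value[0] in ARRAYS:              # array element: (name, offset from i)
        return (value[0], value[1] + 1)
    if value[0] == "var":               # simple variable: unchanged
        return value
    op, *args = value                   # operation node: shift its operands
    return (op, *(shift(a) for a in args))


def expected_after_loop(setup_state):
    """State saved in step S13: same registers, next array element."""
    return {reg: shift(v) for reg, v in setup_state.items()}


def body_is_correct(setup_state, state_after_body):
    """Step S15: compare the simulated state with the saved expectation."""
    return state_after_body == expected_after_loop(setup_state)
```

With `ST0` holding `("+", ("X", 0), ("Y", 0))` after setup, a loop body is accepted only if it leaves `("+", ("X", 1), ("Y", 1))` in `ST0`, which is exactly equations (A) and (B).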

[0094] The execution time (number of clocks) may differ between the iteration immediately following the setup instruction sequence and a later iteration of the loop. Depending on the processor, the state may oscillate, so that the number of clocks each iteration requires does not settle from one iteration to the next. The Intel Pentium processor is designed so that the state settles when the loop is executed once more; therefore, by re-running only the timing calculation on the generated in-loop instruction sequence, the number of clocks required to execute the loop can be obtained relatively accurately.
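The following Python sketch shows why re-running the timing calculation helps, using a hypothetical single-issue, in-order timing model with one loop-carried dependence. It is not a model of the Pentium pipeline; it only illustrates that the first pass after the setup sequence can report fewer clocks than the steady state, and that a second pass reports the steady-state value for this body.

```python
def run_iteration(body, ready, clock):
    """Simulate one pass over the loop body (hypothetical timing model).

    body  -- list of (dest, srcs, latency)
    ready -- register -> clock at which its value is available (updated in place)
    clock -- clock at which the first instruction may issue
    Returns the clock at which the next instruction may issue.
    """
    for dest, srcs, latency in body:
        issue = max([clock] + [ready.get(s, 0) for s in srcs])
        ready[dest] = issue + latency
        clock = issue + 1
    return clock


def loop_clocks(body, setup_ready):
    """Return (clocks of the first pass, clocks of the second pass).

    The second value is the one used as the loop's cost, as in [0094].
    """
    ready = dict(setup_ready)
    first_end = run_iteration(body, ready, 0)
    second_end = run_iteration(body, ready, first_end)
    return first_end, second_end - first_end
```

For a body consisting of a single 3-clock add that depends on its own previous result, `loop_clocks([("st0", ["st0"], 3)], {"st0": 0})` gives 1 clock for the first pass but 3 clocks for the second, and 3 is the per-iteration cost in steady state.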

[0095] Next, the superoptimization according to the present embodiment is described concretely, using as an example a program containing repeated operations that use floating-point instructions of the Intel Pentium processor.

[0096] First, consider a simple loop written in the C language, as shown below, where x and y are arrays of float and size is the number of elements in each array.

[0097]

(Equation 4)

[0098] Assume that ESI, one of the processor's 32-bit general-purpose registers, holds x + size, that EDI holds y + size, and that ECX holds -size. A naive translation of the above C program into Pentium assembly language is then as follows.

[0099]

(Equation 5)

[0100] In the above, line 1 means "load x[i]", line 2 "add y[i] to x[i]", line 3 "store the sum to y[i]", line 4 "update the counter", and line 5 "exit the loop when the counter reaches 0".
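[Equation 4] itself is not reproduced here. Going by [0098] to [0100] (load x[i], add y[i], store to y[i]), the sketch below assumes the loop is y[i] = x[i] + y[i]. It also renders the register setup of [Equation 5] in C: pointers just past the end of each array play ESI and EDI, and a counter running from -size up to 0 plays ECX. The function names are hypothetical.

```c
#include <stddef.h>

/* Assumed form of [Equation 4]: add x into y element by element. */
void add_arrays(const float *x, float *y, ptrdiff_t size) {
    for (ptrdiff_t i = 0; i < size; i++)
        y[i] = x[i] + y[i];
}

/* Same loop shaped like [Equation 5]:
   xe ~ ESI = x + size, ye ~ EDI = y + size, i ~ ECX = -size.
   Counting up toward 0 mirrors INC ECX / JNZ, where the increment itself
   sets the zero flag that the closing branch tests. */
void add_arrays_negidx(const float *x, float *y, ptrdiff_t size) {
    const float *xe = x + size;   /* one past the end: a valid pointer */
    float *ye = y + size;
    for (ptrdiff_t i = -size; i != 0; i++)
        ye[i] = xe[i] + ye[i];    /* ye[-size] is y[0], ye[-1] is y[size - 1] */
}
```

Both functions compute the same result. The negative-index form is what lets the assembly version get by with one counter register and no separate compare instruction.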

[0101] The number at the right end of each line is the number of clocks needed to execute that line, assuming no cache misses or branch mispredictions. A value added with + is the additional number of clocks caused by a stall (likewise below).

[0102] One clock is added to line 1 of the loop because of an address generation interlock (AGI): the Pentium processor stalls for one clock when a register modified in the immediately preceding clock cycle is used for addressing.
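The AGI rule of [0102] can be written as a small check over a loop body. The sketch assumes a toy model in which each entry occupies its own clock, so a paired INC/JNZ is modelled as one entry, and it assumes the memory operands of [Equation 5] are addressed through ECX, as the register setup in [0098] suggests. The struct, the register numbering and `count_agi_stalls` are hypothetical. The `loop` flag models the fact that, in a loop, the first instruction follows the last instruction of the previous iteration.

```c
#include <stddef.h>

#define NO_REG (-1)

/* Hypothetical toy entry: one entry issues per clock. */
typedef struct {
    int dst;       /* integer register written in this clock, or NO_REG */
    int addr_reg;  /* register used for address generation, or NO_REG */
} AgiInsn;

/* Count one-clock AGI stalls: an entry stalls when its address register was
   written by the entry issued in the immediately preceding clock. If `loop`
   is nonzero, entry 0 is checked against the last entry (previous iteration). */
int count_agi_stalls(const AgiInsn *body, size_t n, int loop) {
    int stalls = 0;
    for (size_t i = 0; i < n; i++) {
        const AgiInsn *prev = NULL;
        if (i > 0)
            prev = &body[i - 1];
        else if (loop)
            prev = &body[n - 1];
        if (prev != NULL && body[i].addr_reg != NO_REG &&
            body[i].addr_reg == prev->dst)
            stalls++;
    }
    return stalls;
}
```

For a body of FLD, FADD and FSTP addressed through ECX followed by the INC ECX/JNZ pair, the check reports exactly one stall, on the load at the top of the loop, and none when the body is treated as straight-line code.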

[0103] On line 3, the FSTP (store floating-point value and pop) instruction that follows the FADD (floating-point addition) instruction stalls for three clocks while waiting for the sum. Issuing an FADD instruction takes only one clock, but its result is not available until three clocks later, and FSTP is a special case that needs its operand one clock earlier than other instructions.

[0104] The JNZ instruction on line 5 can be executed simultaneously with the INC instruction on line 4. The loop therefore takes 9 clocks to execute. FIG. 6 illustrates how the instruction sequence of [Equation 5] is fed into the pipeline and executed; in the figure, the horizontal axis corresponds to the number of clocks.

[0105] For example, Intel's "Intel Architecture Optimization Manual, 1997" (document number 24281GJ) recommends improving, that is, optimizing, the assembly code of [Equation 5] into the instruction sequence shown below. This improvement is a conventional optimization based on programmers' rules of thumb, and differs from superoptimization, which exhaustively searches all instruction sequences.

[0106]

(Equation 6)

[0107] In short, this optimization executes first the instructions that do not have to wait for the sum, namely INC and JNZ. By overlapping the stall spent waiting for the addition with the counter update, the branch, and the AGI stall caused by the counter update, the loop execution time drops from 9 clocks to 7. FIG. 7 illustrates how the instruction sequence of [Equation 6] is fed into the pipeline and executed; in the figure, the horizontal axis corresponds to elapsed time, that is, the number of clocks.
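Since [Equation 6] is not reproduced, the following is only a C-level analogue of the reordering described in [0107], under the assumption that hoisting the counter update above the store means the store must address its element through the already-incremented counter with a -1 offset. The function name is hypothetical, and a C compiler will schedule instructions on its own, so this mirrors the instruction order conceptually rather than controlling it.

```c
#include <stddef.h>

/* Assumed loop y[i] = x[i] + y[i], written with the counter update moved ahead
   of the store. xe/ye point one past the ends of x/y, and i runs -size..-1. */
void add_arrays_reordered(const float *x, float *y, ptrdiff_t size) {
    const float *xe = x + size;
    float *ye = y + size;
    for (ptrdiff_t i = -size; i != 0; ) {
        float sum = xe[i] + ye[i];  /* load and add (FLD, FADD) */
        i++;                        /* counter update hoisted above the store (INC) */
        ye[i - 1] = sum;            /* store through the updated counter, -1 offset (FSTP) */
    }
}
```

The point of the reorder is that the counter update no longer sits between the store and the next iteration's load, so its AGI penalty and the wait for the sum can overlap.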

[0108] However, the instruction sequence in [Equation 6] was not obtained by exhaustively searching all possible instruction sequences, and it is not necessarily optimal.

[0109] By contrast, the instruction sequence obtained by feeding the assembly language program of [Equation 5] into the program conversion device 10 of this embodiment is shown below.

[0110]

(Equation 7)

[0111] The first four lines of this instruction sequence form the setup instruction sequence, which is placed outside the loop and excluded from the clock count. This is because the setup sequence is executed only once and is not critical to program optimization.

[0112] This instruction sequence eliminates the stall spent waiting for the sum by starting the next addition before the result of the previous addition is stored. As a result, the loop takes only 6 clocks, which is better still than the sequence of [Equation 6].
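The scheduling pattern in [0112] is software pipelining. [Equation 7] is not reproduced, so the sketch below is a C-level illustration under stated assumptions rather than the device's output: it assumes the loop is y[i] = x[i] + y[i] and that x and y do not overlap, and the final drain store is part of this sketch, not taken from the patent. The first addition runs once outside the loop, like a setup sequence; each pass then starts addition i before storing result i - 1.

```c
#include <stddef.h>

/* Software-pipelined form of the assumed loop y[i] = x[i] + y[i].
   Assumes x and y do not overlap. */
void add_arrays_pipelined(const float *x, float *y, ptrdiff_t size) {
    if (size <= 0) return;
    float pending = x[0] + y[0];          /* setup: first sum, executed once */
    for (ptrdiff_t i = 1; i < size; i++) {
        float next = x[i] + y[i];         /* start the next addition ... */
        y[i - 1] = pending;               /* ... before storing the previous sum */
        pending = next;
    }
    y[size - 1] = pending;                /* drain: store the final sum */
}
```

Because each store uses a sum computed one pass earlier, the store never waits on an addition still in flight, which is the stall [Equation 6] could only partially hide.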

[0113] FIG. 8 illustrates how the instruction sequence of [Equation 7] is fed into the pipeline and executed; in the figure, the horizontal axis corresponds to elapsed time, that is, the number of clocks. The figure shows that the causes of stalls in the pipeline have been eliminated as far as possible.

[0114] Next, as another example, consider the loop written in C shown below. Loops of this kind are frequently used when solving linear equations. Here, da is a variable of type float.

[0115]

(Equation 8)
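[Equation 8] is not reproduced here. Given that its translation in [0118] uses FMUL followed by FSUBR and FSTP, one plausible reading, assumed below, is the elimination-style row update y[i] = y[i] - da * x[i]. The array names x and y, the parameter order and the function name `row_update` are our assumptions.

```c
#include <stddef.h>

/* Assumed form of [Equation 8]: subtract da * x[i] from y[i], the row update
   used in Gaussian elimination. FSUBR with a memory operand computes
   memory - ST(0), which fits y[i] - (da * x[i]). */
void row_update(float da, const float *x, float *y, ptrdiff_t size) {
    for (ptrdiff_t i = 0; i < size; i++)
        y[i] = y[i] - da * x[i];
}
```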

[0116] A naive translation of this program into Pentium assembly language is as follows.

[0117]

(Equation 9)

[0118] In the above, the FMUL instruction multiplies floating-point values, and the FSUBR instruction subtracts floating-point values with the operands in reverse order. On line 3 of the loop, issuing FSUBR itself takes only one clock, but it stalls for two clocks waiting for the product. On line 4, besides stalling three clocks waiting for the difference, the FSTP instruction, as a special case, needs its operand one clock early. The whole loop therefore takes 11 clocks to execute.

[0119] FIG. 9 illustrates how the instruction sequence of [Equation 9] is fed into the pipeline and executed; in the figure, the horizontal axis corresponds to elapsed time, that is, the number of clocks.

[0120] By contrast, the instruction sequence obtained by feeding the assembly language program of [Equation 9] into the program conversion device 10 of this embodiment is shown below.

[0121]

(Equation 10)

[0122] The first four lines of this instruction sequence form the setup instruction sequence, which is placed outside the loop and excluded from the clock count. With this sequence the loop consumes only 6 clocks, a dramatic improvement over the 11 clocks of the naive assembly language program.
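As with the first example, the improvement in [0122] comes from overlapping consecutive iterations. Because neither [Equation 8] nor [Equation 10] is reproduced, the sketch below assumes the loop is y[i] = y[i] - da * x[i], and it is a hand-written C analogue of software pipelining, not the device's output. The product for element i + 1 is started before element i is finished and stored, so the multiply latency overlaps the subtract and store of the previous element. The function name and the drain step are our assumptions; x and y are assumed not to overlap.

```c
#include <stddef.h>

/* Software-pipelined sketch of the assumed loop y[i] = y[i] - da * x[i]. */
void row_update_pipelined(float da, const float *x, float *y, ptrdiff_t size) {
    if (size <= 0) return;
    float prod = da * x[0];                 /* setup: first product, executed once */
    for (ptrdiff_t i = 0; i + 1 < size; i++) {
        float next_prod = da * x[i + 1];    /* start the next multiplication */
        y[i] = y[i] - prod;                 /* finish and store element i */
        prod = next_prod;
    }
    y[size - 1] = y[size - 1] - prod;       /* drain: last element */
}
```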

[0123] FIG. 10 illustrates how the instruction sequence of [Equation 10] is fed into the pipeline and executed; in the figure, the horizontal axis corresponds to elapsed time, that is, the number of clocks.

[0124] By feeding a program into the program conversion device 10 of this embodiment, pipeline-aware superoptimization can be applied to loop processing that repeatedly performs floating-point operations on array variables. As a result, a high-performance instruction sequence from which the causes of stalls have been eliminated as far as possible can be output.

[0125] [Supplement] The present invention has been described in detail with reference to a specific embodiment. It is obvious, however, that those skilled in the art can modify or substitute the embodiment without departing from the gist of the present invention.

[0126] The above embodiment used the floating-point instructions of Intel's "Pentium" microprocessor as an example, but the present invention naturally provides the same functions and effects for other microprocessors and other instructions.

[0127] In short, the present invention has been disclosed by way of example and should not be construed restrictively. To determine the gist of the present invention, the claims set out at the beginning should be consulted.

[0128]

[Effects of the Invention] As described in detail above, the present invention provides an excellent program conversion apparatus and method, program manufacturing apparatus, and program recording medium capable of converting a program into an optimized machine instruction sequence that has the same function as the original program.

[0129] The present invention also provides an excellent program conversion apparatus and method, program manufacturing apparatus, and program recording medium capable of superoptimization, that is, of optimizing an instruction sequence by exhaustively searching every instruction sequence that can be combined.

[0130] The present invention further provides an excellent program conversion apparatus and method, program manufacturing apparatus, and program recording medium capable of superoptimizing loop processing that repeatedly performs floating-point operations on array variables. Because superoptimization can thus be applied to loops, which are critical to a program's execution efficiency, the usefulness of superoptimization is further increased.

[0131] According to the present invention, pipeline-aware superoptimization can be performed on loop processing that repeatedly performs floating-point operations on array variables. As a result, a high-performance instruction sequence from which the causes of stalls have been eliminated as far as possible can be output.

[Brief description of the drawings]

FIG. 1 is a diagram schematically showing the hardware configuration of a computer system 100 to which the present invention can be applied.

FIG. 2 is a high-level functional block diagram illustrating the configuration of the program conversion device 10 used to implement the present invention.

FIG. 3 is a diagram showing an example of a DAG (Directed Acyclic Graph) generated by the syntax analysis unit 1.

FIG. 4 is a diagram showing an example of a DAG (Directed Acyclic Graph) generated by the syntax analysis unit 1.

FIG. 5 is a flowchart showing the superoptimization procedure according to this embodiment.

FIG. 6 illustrates how the instruction sequence of [Equation 5] is fed into the pipeline and executed.

FIG. 7 illustrates how the instruction sequence of [Equation 6] is fed into the pipeline and executed.

FIG. 8 illustrates how the instruction sequence of [Equation 7] is fed into the pipeline and executed.

FIG. 9 illustrates how the instruction sequence of [Equation 9] is fed into the pipeline and executed.

FIG. 10 illustrates how the instruction sequence of [Equation 10] is fed into the pipeline and executed.

[Explanation of symbols]

1: syntax analysis unit; 2: search execution unit; 3: instruction generation unit; 4: instruction execution simulator; 10: program conversion device; 100: computer system; 101: processor; 102: RAM; 103: ROM; 104: input unit; 105: display unit; 106: external storage device; 107: media drive; 108: network interface

Claims

[Claims]

1. A program conversion method for converting a program including loop processing into an instruction sequence having the same function as the original program, comprising: a step of generating candidates for a setup instruction sequence that can be placed outside the loop; a step of generating candidates for an in-loop instruction sequence that implements the processing in the loop; a step of determining whether a candidate for the in-loop instruction sequence has executed within a predetermined time; and a step of outputting, in response to a positive result of the determination, a combination of the setup instruction sequence candidate and the in-loop instruction sequence candidate as a solution of the program conversion.

2. A program conversion method for converting a program including loop processing into an instruction sequence having the same function as the original program, comprising: (a) a step of setting an upper limit of the loop execution time; (b) a step of generating candidates for a setup instruction sequence that can be placed outside the loop; (c) a step of generating candidates for an in-loop instruction sequence that implements the processing in the loop; (d) a step of determining whether a candidate for the in-loop instruction sequence has executed within the upper limit of the loop execution time; and (e) a step of outputting, in response to a positive result in step (d), a combination of the setup instruction sequence candidate and the in-loop instruction sequence candidate as a solution of the program conversion.

3. The program conversion method according to claim 2, further comprising (f) a step of, in response to no solution of the program conversion being obtained within the upper limit of the loop execution time, adding a predetermined unit time to the upper limit of the loop execution time and repeatedly executing step (b) and the subsequent steps.

4. A program conversion method for converting a program including loop processing executed on a processor into an instruction sequence having the same function as the original program, comprising: (A) a step of setting an upper limit of the number of loop execution clocks; (B) a step of generating candidates for a setup instruction sequence that can be placed outside the loop; (C) a step of obtaining and storing the state of the processor's registers expected after execution of the loop; (D) a step of generating candidates for an in-loop instruction sequence that implements the processing in the loop; (E) a step of comparing the result of executing a candidate for the in-loop instruction sequence with the register state stored in step (C) to determine whether the candidate has executed within the upper limit of the number of loop execution clocks; (F) a step of outputting, in response to a positive result in step (E), a combination of the setup instruction sequence candidate and the in-loop instruction sequence candidate as a solution of the program conversion; and (G) a step of, in response to no solution of the program conversion being obtained within the upper limit of the number of loop execution clocks, incrementing that upper limit by one and repeatedly executing step (B) and the subsequent steps.

5. A program conversion method for converting a program including loop processing into an instruction sequence having the same function as the original program, comprising: a step of parsing the command line argument of the original program and generating a DAG (Directed Acyclic Graph: a graph whose edges are directed but which contains no cycles) representing an assignment statement, so that all operations for two iterations of the loop can be represented; a step of generating all instruction sequences corresponding to at most one iteration of the loop and making them candidates for a setup instruction sequence that can be placed outside the loop; a step of generating all candidates for an in-loop instruction sequence that implements the in-loop processing for the setup instruction sequence; and a step of taking, as solutions of the program conversion, every combination of a setup instruction sequence and an in-loop instruction sequence candidate that can be executed within a predetermined upper limit of the number of clocks.

6. A program conversion device for converting a program including loop processing into an instruction sequence having the same function as the original program, comprising: means for generating candidates for a setup instruction sequence that can be placed outside the loop; means for generating candidates for an in-loop instruction sequence that implements the processing in the loop; means for determining whether a candidate for the in-loop instruction sequence has executed within a predetermined time; and means for outputting, in response to a positive result of the determination, a combination of the setup instruction sequence candidate and the in-loop instruction sequence candidate as a solution of the program conversion.

7. A program conversion apparatus for converting a program including loop processing into an instruction sequence having the same function as the original program, comprising: (a) means for setting an upper limit of the loop execution time; (b) means for generating candidates for a setup instruction sequence that can be placed outside the loop; (c) means for generating candidates for an in-loop instruction sequence that implements the processing in the loop; (d) means for determining whether a candidate for the in-loop instruction sequence has executed within the upper limit of the loop execution time; and (e) means for outputting, in response to a positive result in means (d), a combination of the setup instruction sequence candidate and the in-loop instruction sequence candidate as a solution of the program conversion.

8. The program conversion device according to claim 7, further comprising (f) means for, in response to no solution of the program conversion being obtained within the upper limit of the loop execution time, adding a predetermined unit time to the upper limit of the loop execution time and re-entering means (b).

9. A program conversion apparatus for converting a program including loop processing executed on a processor into an instruction sequence having the same function as the original program, comprising: (A) means for setting an upper limit of the number of loop execution clocks; (B) means for generating candidates for a setup instruction sequence that can be placed outside the loop; (C) means for obtaining and storing the state of the processor's registers expected after execution of the loop; (D) means for generating candidates for an in-loop instruction sequence that implements the processing in the loop; (E) means for comparing the result of executing a candidate for the in-loop instruction sequence with the register state stored by means (C) to determine whether the candidate has executed within the upper limit of the number of loop execution clocks; (F) means for outputting, in response to a positive result in means (E), a combination of the setup instruction sequence candidate and the in-loop instruction sequence candidate as a solution of the program conversion; and (G) means for, in response to no solution of the program conversion being obtained within the upper limit of the number of loop execution clocks, incrementing that upper limit by one and re-entering means (B).

10. A program conversion device for converting a program including loop processing into an instruction sequence having the same function as the original program, comprising: means for parsing the command line argument of the original program and generating a DAG (Directed Acyclic Graph: a graph whose edges are directed but which contains no cycles) representing an assignment statement, so that all operations for two iterations of the loop can be represented; means for generating all instruction sequences corresponding to at most one iteration of the loop and making them candidates for a setup instruction sequence that can be placed outside the loop; means for generating all candidates for an in-loop instruction sequence that implements the in-loop processing for the setup instruction sequence; and means for taking, as solutions of the program conversion, every combination of a setup instruction sequence and an in-loop instruction sequence candidate that can be executed within a predetermined upper limit of the number of clocks.

11. A program manufacturing apparatus for manufacturing, based on a program including loop processing, a program consisting of an instruction sequence having the same function, comprising: means for generating candidates for a setup instruction sequence that can be placed outside the loop; means for generating candidates for an in-loop instruction sequence that implements the processing in the loop; means for determining whether a candidate for the in-loop instruction sequence has executed within a predetermined time; and means for outputting, in response to a positive result of the determination, a combination of the setup instruction sequence candidate and the in-loop instruction sequence candidate as a solution of the program conversion.

12. A program manufacturing apparatus for manufacturing, based on a program including loop processing, a program consisting of an instruction sequence having the same function, comprising: (a) means for setting an upper limit of the loop execution time; (b) means for generating candidates for a setup instruction sequence that can be placed outside the loop; (c) means for generating candidates for an in-loop instruction sequence that implements the processing in the loop; (d) means for determining whether a candidate for the in-loop instruction sequence has executed within the upper limit of the loop execution time; (e) means for outputting, in response to a positive result in means (d), a combination of the setup instruction sequence candidate and the in-loop instruction sequence candidate as the manufactured program; and (f) means for, in response to a negative result in means (d), adding a predetermined value to the upper limit of the loop execution time and re-entering means (b).

13. A program manufacturing apparatus for manufacturing, based on a program including loop processing, a program consisting of an instruction sequence having the same function, comprising: (a) means for setting an upper limit of the number of loop execution clocks; (b) means for generating candidates for a setup instruction sequence that can be placed outside the loop; (c) means for generating candidates for an in-loop instruction sequence that implements the processing in the loop; (d) means for determining whether a candidate for the in-loop instruction sequence has executed within the upper limit of the number of loop execution clocks; (e) means for outputting, in response to a positive result in means (d), a combination of the setup instruction sequence candidate and the in-loop instruction sequence candidate as the manufactured program; and (f) means for, in response to a negative result in means (d), incrementing the upper limit of the number of loop execution clocks by one and re-entering means (b).

14. A program manufacturing apparatus for manufacturing, based on a program including loop processing executed on a processor, a program consisting of an instruction sequence having the same function, comprising: (A) means for setting an upper limit of the number of loop execution clocks; (B) means for generating candidates for a setup instruction sequence that can be placed outside the loop; (C) means for obtaining and storing the state of the processor's registers expected after execution of the loop; (D) means for generating candidates for an in-loop instruction sequence that implements the processing in the loop; (E) means for comparing the result of executing a candidate for the in-loop instruction sequence with the register state stored by means (C) to determine whether the candidate has executed within the upper limit of the number of loop execution clocks; (F) means for outputting, in response to a positive result in means (E), a combination of the setup instruction sequence candidate and the in-loop instruction sequence candidate as the manufactured program; and (G) means for, in response to no solution being obtained within the upper limit of the number of loop execution clocks, incrementing that upper limit by one and re-entering means (B).

15. A program manufacturing apparatus for manufacturing, based on a program including loop processing, a program consisting of an instruction sequence having the same function, comprising: means for parsing the command line argument of the original program and generating a DAG (Directed Acyclic Graph: a graph whose edges are directed but which contains no cycles) representing an assignment statement, so that all operations for two iterations of the loop can be represented; means for generating all instruction sequences corresponding to at most one iteration of the loop and making them candidates for a setup instruction sequence that can be placed outside the loop; means for generating all candidates for an in-loop instruction sequence that implements the in-loop processing for the setup instruction sequence; and means for outputting, as the manufactured program, every combination of a setup instruction sequence and an in-loop instruction sequence candidate that can be executed within a predetermined upper limit of the number of clocks.

16. A program recording medium that records, in a tangible and computer-readable format, a computer program causing a computer system to execute processing for converting a program including loop processing executed on a processor into an instruction sequence having the same function as the original program, the computer program comprising: (A) a step of setting an upper limit of the number of loop execution clocks; (B) a step of generating candidates for a setup instruction sequence that can be placed outside the loop; (C) a step of obtaining and storing the state of the processor's registers expected after execution of the loop; (D) a step of generating candidates for an in-loop instruction sequence that implements the processing in the loop; (E) a step of comparing the result of executing a candidate for the in-loop instruction sequence with the register state stored in step (C) to determine whether the candidate has executed within the upper limit of the number of loop execution clocks; (F) a step of outputting, in response to a positive result in step (E), a combination of the setup instruction sequence candidate and the in-loop instruction sequence candidate as a solution of the program conversion; and (G) a step of, in response to a negative result in step (E), incrementing the upper limit of the number of loop execution clocks by one and repeatedly executing step (B) and the subsequent steps.

17. A program recording medium that records, in a tangible and computer-readable format, a computer program causing a computer system to execute processing for converting a program including loop processing into an instruction sequence having the same function as the original program, the computer program comprising: a step of parsing the command line argument of the original program and generating a DAG (Directed Acyclic Graph: a graph whose edges are directed but which contains no cycles) representing an assignment statement, so that all operations for two iterations of the loop can be represented; a step of generating all instruction sequences corresponding to at most one iteration of the loop and making them candidates for a setup instruction sequence that can be placed outside the loop; a step of generating all candidates for an in-loop instruction sequence that implements the in-loop processing for the setup instruction sequence; and a step of taking, as solutions of the program conversion, every combination of a setup instruction sequence and an in-loop instruction sequence candidate that can be executed within a predetermined upper limit of the number of clocks.