JP2006004123A

JP2006004123A - Optimization device, optimization method, and program

Info

Publication number: JP2006004123A
Application number: JP2004179072A
Authority: JP
Inventors: Toru Murayama; 亨村山
Original assignee: NEC Electronics Corp
Current assignee: NEC Electronics Corp
Priority date: 2004-06-17
Filing date: 2004-06-17
Publication date: 2006-01-05

Abstract

PROBLEM TO BE SOLVED: To provide an optimization device, an optimization method, and a program capable of reducing the power consumption of a computer system by suppressing bit transitions of data output to a bus. SOLUTION: The program inputs the immediately preceding instruction code string and the instruction code string to be generated, calculates the number of bit transitions between the two strings, rearranges the instruction codes of the instruction code string to be generated according to the calculated number of bit transitions, and outputs the immediately preceding instruction code string and the instruction code string to be generated. COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to an optimization device, an optimization method, and a program, and more particularly to an optimization device, an optimization method, and a program for optimizing instruction codes executed in a computer system that can process a plurality of instructions in parallel.

In recent years, smaller computer systems, such as portable notebook personal computers and personal digital assistants, and computer systems with higher processing speeds have become widespread. This miniaturization and speedup have been achieved through finer semiconductor processes, higher circuit operating frequencies, and the like, but the power consumption of computer systems has increased accordingly, and a reduction in power consumption is strongly desired.

FIG. 8 shows the configuration of a general computer system. As shown in the figure, the computer system 800 includes an instruction memory 801, an instruction decoder 802, a data memory 803, and an arithmetic unit 804. The instruction memory 801 and the instruction decoder 802 are connected via an instruction bus 805, and the data memory 803 and the arithmetic unit 804 are connected via a data bus 806.

The instruction memory 801 stores execution code in advance. The execution code is code that can be executed by the computer system 800, and its instruction codes are arranged in execution order.

For example, when the execution code is run, the instruction codes included in the execution code are first transferred in order from the instruction memory 801 to the instruction decoder 802 via the instruction bus 805. Next, the instruction decoder 802 decodes the instruction code transferred from the instruction memory 801 and outputs control signals to the data memory 803 and the arithmetic unit 804, together with the access address for the data memory 803. The arithmetic unit 804 then reads data from the data memory 803 via the data bus 806 in accordance with the control signals and access address from the instruction decoder 802, performs a predetermined operation, and writes the operation result to the data memory 803 via the data bus 806. In this way, the instruction codes of the execution code are executed.

In such a computer system, an instruction code is transferred from the instruction memory 801 to the instruction decoder 802 every clock cycle, and during this transfer power (switching power) is consumed in the switching circuit that outputs signals to the instruction bus 805. In general, the power consumption of a switching circuit that outputs signals to a bus is proportional to the number of switching events caused by bit transitions (bit inversions) in the transferred data.
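The relationship between bus activity and bit transitions can be illustrated with a short sketch. It counts how many bus lines change state when a sequence of words is driven onto one bus, one word per clock cycle. The function names and the 8-bit sample values are illustrative assumptions, and the count is only a proxy for switching power, not a power model.

```python
def hamming_distance(a, b):
    """Number of bit positions in which the integers a and b differ."""
    return bin(a ^ b).count("1")


def count_bus_toggles(words):
    """Total bit transitions when `words` are driven onto a bus in order."""
    return sum(hamming_distance(prev, cur) for prev, cur in zip(words, words[1:]))


# Hypothetical 8-bit instruction codes transferred on consecutive cycles.
stream = [0b00010011, 0b00011011, 0b00110000]
print(count_bus_toggles(stream))  # 1 + 4 = 5 transitions
```

Under this proxy, any reordering of the transferred codes that lowers the count would be expected to lower the switching activity on the bus.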

Patent Documents 1 and 2, for example, are known as methods for suppressing bit transitions of transferred data and thereby reducing power consumption. In Patent Document 1, data previously output to the bus is stored inside the computer system, and for an instruction that does not use the bus, such as a NOP instruction, the stored previous data is output to the bus so that the bus state is maintained, which reduces power consumption.

In Patent Document 2, an apparatus that generates the execution code of a computer system assigns a plurality of instruction codes to one instruction. When generating an instruction code, it selects, from among the assigned instruction codes, the one that gives fewer bit transitions (inverted bits) relative to the immediately preceding instruction code, thereby reducing the power consumed when the computer system executes that instruction code.

However, the method of Patent Document 1 requires additional circuits in the computer system, such as a circuit for storing data and a circuit for switching the output to the bus. In the method of Patent Document 2, because a plurality of codes are assigned to one instruction, the instruction bit width increases, so that the width of the instruction bus and the number of bus switching circuits grow and the conversion table of the instruction decoder becomes larger. In other words, these methods make the hardware configuration redundant and, as a result, lead to increased power consumption.

Meanwhile, multiprocessor systems are known that use a plurality of processors (arithmetic units) simultaneously to improve the processing capability of a computer system. The processing scheme mainly adopted for multiprocessor systems is MIMD (Multiple Instruction/Multiple Data), in which a plurality of processors process a plurality of different data in parallel.

A compiler that generates object code for a multiprocessor system performs scheduling so that different instruction codes are processed in parallel on the respective processors. The compiler also performs optimizations such as inserting NOP instructions to synchronize instructions and threads, and swapping, according to data dependencies, the instruction slots of instructions processed in parallel within the same cycle.

In such a multiprocessor system, instruction codes are transferred from the instruction memory to an instruction decoder via an instruction bus for each of the plurality of processors, so the increase in bus power consumption described above becomes an even greater problem.

Japanese Patent Laid-Open No. 4-151755; Japanese Patent Laid-Open No. 9-62422

As described above, conventional computer systems such as multiprocessor systems have the problem that reducing the bit transitions of data output to a bus requires making the hardware configuration redundant, which increases power consumption.

The present invention has been made to solve this problem, and its object is to suppress bit transitions of data output to a bus without making the hardware configuration redundant, and thereby to reduce the power consumption of a computer system.

A program according to the present invention causes a computer to execute processing for optimizing instruction codes executed by a computer system capable of processing a plurality of instructions in parallel. The processing inputs a first instruction code string, in which a plurality of instruction codes to be executed in parallel are arranged, and a second instruction code string, in which a plurality of instruction codes to be executed in parallel are arranged and which is executed next after the first instruction code string; calculates the number of bit transitions between the first instruction code string and the second instruction code string; rearranges the instruction codes of the second instruction code string according to the calculated number of bit transitions; and outputs the first instruction code string and the rearranged second instruction code string. This makes it possible to suppress bit transitions of data output to the bus without making the hardware configuration redundant, and to reduce the power consumption of the computer system.

In the above program, the rearrangement may rearrange the instruction codes of the second instruction code string so that the calculated number of bit transitions is minimized. This further reduces the power consumption of the computer system.

In the above program, the rearrangement may prepare a first arrangement pattern, which is an arrangement of the instruction codes of the second instruction code string, and a second arrangement pattern different from the first arrangement pattern; compare the number of bit transitions for the first arrangement pattern with the number of bit transitions for the second arrangement pattern; and determine which arrangement pattern has the smaller number of bit transitions. The output may then output the second instruction code string in the first or second arrangement pattern, whichever was determined to have the smaller number of bit transitions. This effectively reduces the power consumption of the computer system.

In the above program, the rearrangement may swap the arrangement of the instruction codes by changing the assignment of instruction slots to the instruction codes included in the second instruction code string. This efficiently reduces the power consumption of the computer system.

The above program may swap a first instruction code included in the first instruction code string with a second instruction code included in the second instruction code string according to the thread numbers of the first instruction code and the second instruction code. This further reduces the power consumption of the computer system.

An optimization method according to the present invention optimizes instruction codes executed in a computer system capable of processing a plurality of instructions in parallel. The method inputs a first instruction code string, in which a plurality of instruction codes to be executed in parallel are arranged, and a second instruction code string, in which a plurality of instruction codes to be executed in parallel are arranged and which is executed next after the first instruction code string; calculates the number of bit transitions between the first instruction code string and the second instruction code string; rearranges the instruction codes of the second instruction code string according to the calculated number of bit transitions; and outputs the first instruction code string and the rearranged second instruction code string. This makes it possible to suppress bit transitions of data output to the bus without making the hardware configuration redundant, and to reduce the power consumption of the computer system.

An optimization device according to the present invention optimizes instruction codes executed in a computer system capable of processing a plurality of instructions in parallel. The device includes: an input unit that inputs a first instruction code string, in which a plurality of instruction codes to be executed in parallel are arranged, and a second instruction code string, in which a plurality of instruction codes to be executed in parallel are arranged and which is executed next after the first instruction code string; a calculation unit that calculates the number of bit transitions between the first instruction code string and the second instruction code string; a rearrangement unit that rearranges the instruction codes of the second instruction code string according to the calculated number of bit transitions; and an output unit that outputs the first instruction code string and the rearranged second instruction code string. This makes it possible to suppress bit transitions of data output to the bus without making the hardware configuration redundant, and to reduce the power consumption of the computer system.

According to the present invention, it is possible to provide an optimization device, an optimization method, and a program that suppress bit transitions of data output to a bus and reduce the power consumption of a computer system without making the hardware configuration redundant.

Embodiment 1 of the Invention

First, the configuration of the computer system according to Embodiment 1 of the present invention will be described with reference to FIG. 1. The computer system 100 is a system that can execute a plurality of instructions in parallel, for example a MIMD multiprocessor system. The computer system 100 may be a one-chip microcomputer or may be composed of a plurality of chips.

As shown in the figure, the computer system 100 includes an instruction memory 101, instruction decoders 102a and 102b, a data memory 103, and arithmetic units 104a and 104b. The instruction memory 101 is connected to the instruction decoders 102a and 102b via instruction buses 105a and 105b, respectively, and the data memory 103 is connected to the arithmetic units 104a and 104b via data buses 106a and 106b, respectively.

Although this example includes two instruction decoders and two arithmetic units and can execute two instructions in parallel, any other number of instruction decoders and arithmetic units may be provided. Likewise, although this example includes one instruction memory and one data memory, the system may include as many instruction memories and data memories as there are instruction decoders and arithmetic units.

The instruction memory 101 stores execution code generated by the program optimization device 110. This execution code is, for example, stored in the instruction memory 101 via the instruction buses 105a and 105b by an input/output circuit (not shown) or the like, or written directly into the instruction memory 101 by a writer, which is a memory writing device.

In the execution code, instruction code strings that can be executed in parallel in the computer system 100 are arranged in execution order. Each instruction code string has as many instruction slots as there are processors (arithmetic units), and one instruction code is assigned to each instruction slot. In this example, an instruction code string contains two instruction slots, that is, two instruction codes.

For example, when the execution code is run, an instruction code string of the execution code stored in the instruction memory 101 is first fetched according to the address indicated by a program counter (not shown). The instruction code in one instruction slot of this instruction code string is transferred from the instruction memory 101 to the instruction decoder 102a via the instruction bus 105a, and the instruction code in the other instruction slot is transferred from the instruction memory 101 to the instruction decoder 102b via the instruction bus 105b. The present embodiment is characterized in that, when instruction codes are fetched from the instruction memory in this way, bit transitions of the instruction codes are suppressed and the power consumption of the instruction buses 105a and 105b is reduced.

Next, the instruction decoder 102a decodes the transferred instruction code and outputs control signals to the data memory 103 and the arithmetic unit 104a, together with the access address for the data memory 103. Similarly, the instruction decoder 102b outputs control signals to the data memory 103 and the arithmetic unit 104b, together with the access address for the data memory 103.

Next, the arithmetic unit 104a reads data from the data memory 103 via the data bus 106a in accordance with the control signals and access address from the instruction decoder 102a, performs a predetermined operation, and writes the operation result to the data memory 103 via the data bus 106a. Similarly, the arithmetic unit 104b reads data from the data memory 103, performs a predetermined operation, and writes the operation result to the data memory 103. In this way, the instruction code strings of the execution code are executed in parallel.

Next, the configuration of the program optimization device according to the present embodiment will be described with reference to the block diagram of FIG. 2. The program optimization device 110 generates execution code by optimizing object code, obtained by compiling source code written in a high-level language or the like, so that the power consumption of the computer system that executes the code is reduced.

The program optimization device 110 is implemented by a computer system such as a personal computer or a server computer, and each block in the figure is implemented by hardware or by software executed on hardware. The program optimization device 110 need not be a single computer and may be composed of a plurality of computers. For example, the compiler 111, the optimization unit 112, and the linker 113 may each be a separate device, or the compiler 111 and the optimization unit 112 may be one device and the linker 113 another. In this example the optimization unit 112 is a block separate from the compiler 111 and the linker 113, but the optimization unit 112 may instead be built into the compiler 111 or the linker 113.

As shown in the figure, the program optimization device 110 includes a compiler 111, an optimization unit 112, a linker 113, a source code storage unit 114, an object code storage unit 115, an optimized code storage unit 116, and an execution code storage unit 117. The program optimization device 110 also includes an input unit such as a keyboard and a mouse and a display unit such as a CRT or an LCD (neither shown), so that information can be input by the user as needed and information such as the execution result of each process can be displayed to the user.

The compiler 111 and the linker 113 can be implemented, for example, by a CPU or the like executing processing according to an application program stored in a storage device, in cooperation with other hardware. The source code storage unit 114, the object code storage unit 115, the optimized code storage unit 116, and the execution code storage unit 117 can be implemented by internal storage means such as a hard disk or external storage means such as an optical disk. The configuration of the optimization unit 112 will be described later.

The source code storage unit 114 stores source code written in a high-level language or the like. The source code is written in a high-level language such as C or C++, but is not limited to this and may be written in assembly language or the like. For example, the source code is input by the user via the input unit and stored in the source code storage unit 114 in advance.

The compiler 111 inputs the source code stored in the source code storage unit 114 and compiles it to generate object code. For example, the compiler 111 generates assembly code from the input source code and converts the assembly code into machine language to generate the object code. During compilation, instruction codes that can be executed in parallel in the computer system 100 are scheduled, and instruction code strings are generated in execution order. The object code storage unit 115 stores the object code generated by the compiler 111. As described above, this object code consists of instruction code strings that can be executed in parallel by the computer system 100, written in execution order.

The optimization unit 112 inputs the object code stored in the object code storage unit 115, optimizes it so that the power consumption of the computer system 100 is reduced, and generates optimized code. As will be described later, for each instruction code string of the object code, the optimization unit 112 rearranges the instruction codes so that the number of bit transitions relative to the immediately preceding instruction code string becomes smaller. The optimized code storage unit 116 stores the optimized code generated by the optimization unit 112.
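As a rough illustration of this pass over a whole object file, the sketch below walks a list of instruction code strings in execution order and, for each one, keeps the slot arrangement with the fewest bit transitions relative to the string that was output just before it. Several points are assumptions made for illustration only: instruction codes are modeled as bit strings, the first string is left unchanged, every slot arrangement is assumed to be legal, and the function names and the third sample string are hypothetical.

```python
from itertools import permutations


def bit_transitions(prev_string, next_string):
    """Sum over instruction slots of the bits that differ between two strings."""
    return sum(
        bin(int(p, 2) ^ int(n, 2)).count("1")
        for p, n in zip(prev_string, next_string)
    )


def optimize_object_code(code_strings):
    """Greedy pass: reorder each string against the previously output one."""
    if not code_strings:
        return []
    optimized = [tuple(code_strings[0])]  # assumption: first string is kept as is
    for code_string in code_strings[1:]:
        prev = optimized[-1]
        # min() keeps the earliest candidate on ties, i.e. the input order.
        best = min(permutations(code_string),
                   key=lambda pattern: bit_transitions(prev, pattern))
        optimized.append(best)
    return optimized


object_code = [
    ("00010011", "00010001"),
    ("00110000", "00011011"),
    ("00010010", "00011111"),  # hypothetical third string
]
for code_string in optimize_object_code(object_code):
    print(",".join(code_string))
```

The pass is greedy: each choice is made against the already reordered predecessor, so it reduces transitions step by step but is not guaranteed to minimize the total over the whole program.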

The linker 113 inputs the optimized code stored in the optimized code storage unit 116 and links it to generate execution code. For example, the linker 113 generates the execution code by combining a plurality of pieces of optimized code, resolving calls to referenced libraries, and so on. The linker 113 can also link object code generated by the compiler 111, and can link such object code together with optimized code. The execution code storage unit 117 stores the execution code generated by the linker 113. The execution code is code that can be executed as is by the computer system 100 and is also called a load module.

Next, the configuration of the optimization unit of the program optimization device according to the present embodiment will be described with reference to FIG. 3. As described above, the optimization unit 112 is a block that optimizes the object code so that the power consumption of the computer system 100 is reduced.

As shown in the figure, the optimization unit 112 includes an input unit 301, an instruction slot allocation unit 302, a bit transition number calculation unit 303, a bit transition number comparison/determination unit 304, an output unit 305, an immediately preceding instruction code string storage unit 306, a minimum bit transition number storage unit 307, and a minimum bit transition number instruction code string storage unit 308.

Like the compiler 111 and other blocks in FIG. 2, the input unit 301, the instruction slot allocation unit 302, the bit transition number calculation unit 303, the bit transition number comparison/determination unit 304, and the output unit 305 are implemented by an application program executed by a CPU in cooperation with hardware. Like the source code storage unit 114 and other storage units in FIG. 2, the immediately preceding instruction code string storage unit 306, the minimum bit transition number storage unit 307, and the minimum bit transition number instruction code string storage unit 308 are implemented by internal storage means or external storage means.

The input unit 301 inputs instruction code strings in order from the object code stored in the object code storage unit 115. For example, the input unit 301 may read the instruction code strings from the object code storage unit 115 one string at a time, or may read the entire object code and then take the instruction code strings from it one string at a time. In this example, the instruction code string “00110000,00011011” is input.

The instruction slot allocation unit 302 assigns the instruction code string input by the input unit 301 to the instruction slots. That is, it prepares an arrangement pattern of the instruction codes of the instruction code string, and then prepares further arrangement patterns obtained by rearranging that pattern. For example, it first assigns the input instruction code string to the instruction slots as is, and next assigns the instruction codes to the instruction slots with their positions swapped. In this example, the pattern of the instruction code string is first “00110000,00011011” and then “00011011,00110000”.
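The preparation of arrangement patterns can be sketched as an enumeration of slot assignments. The example below is a minimal illustration, assuming that any instruction code may occupy any slot; the function name `arrangement_patterns` is hypothetical, and for more than two slots the use of all permutations is an assumption rather than something stated in the text.

```python
from itertools import permutations


def arrangement_patterns(code_string):
    """Yield each distinct slot assignment, starting with the input order."""
    seen = set()
    for pattern in permutations(code_string):
        if pattern not in seen:  # skip duplicates when two codes are identical
            seen.add(pattern)
            yield pattern


for pattern in arrangement_patterns(("00110000", "00011011")):
    print(",".join(pattern))
# 00110000,00011011
# 00011011,00110000
```

With two instruction slots this yields exactly the two patterns described in the text; the number of patterns grows factorially with the number of slots, so a real tool would likely need to bound the search.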

The bit transition number calculation unit 303 (calculation unit) calculates the number of bit transitions between the instruction code string whose instruction slots have been assigned by the instruction slot allocation unit 302 and the immediately preceding instruction code string. The number of bit transitions is the number of bits whose value is inverted between 1 and 0, that is, the number of bits that differ between one code and the other. For example, the number of bit transitions between “00010011,00010001” and “00011011,00110000” is 3.
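The count in this example can be checked with a few lines of code. The sketch below compares the two strings slot by slot and sums the differing bits; representing instruction codes as bit strings and the function name `bit_transitions` are assumptions for illustration.

```python
def bit_transitions(prev_string, next_string):
    """Sum over instruction slots of the bits that differ between two strings."""
    if len(prev_string) != len(next_string):
        raise ValueError("both strings must have the same number of slots")
    return sum(
        bin(int(p, 2) ^ int(n, 2)).count("1")  # XOR marks the differing bits
        for p, n in zip(prev_string, next_string)
    )


prev_string = ("00010011", "00010001")
print(bit_transitions(prev_string, ("00011011", "00110000")))  # 1 + 2 = 3
print(bit_transitions(prev_string, ("00110000", "00011011")))  # 3 + 2 = 5
```

The second call shows the same two instruction codes in the opposite slot order, which gives 5 transitions instead of 3.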

ビット遷移数比較・判定部３０４は、ビット遷移数演算部３０３によって計算されたビット遷移数をこれまでの最小ビット遷移数と比較し、より小さい方を最小ビット遷移数と判定する。出力部３０５は、最小ビット遷移数となる命令コード列を最適化コードとして、最適化コード格納部１１６へ出力する。例えば、出力部３０５は、最小ビット遷移数と判定された命令コード列を命令コード列毎に逐次出力してもよいし、オブジェクトコード全ての命令コード列をまとめて出力してもよい。この例では、“０００１１０１１，００１１００００”が出力されている。 The bit transition number comparison / determination unit 304 compares the number of bit transitions calculated by the bit transition number calculation unit 303 with the minimum number of bit transitions so far, and determines the smaller of the two to be the minimum number of bit transitions. The output unit 305 outputs the instruction code string having the minimum number of bit transitions to the optimized code storage unit 116 as optimized code. For example, the output unit 305 may output the instruction code string determined to have the minimum number of bit transitions one string at a time, or may output all instruction code strings of the object code together. In this example, “00011011,00110000” is output.

直前の命令コード列格納部３０６には、出力部３０５によって出力された直前の命令コード列が格納されている。この直前の命令コード列とは、最適化部１１２が処理中の命令コード列の１つ前の命令コード列である。この例では、命令スロット１に“０００１００１１”、命令スロット２に“０００１０００１”が格納されている。 The immediately preceding instruction code string storage unit 306 stores the immediately preceding instruction code string output by the output unit 305. The immediately preceding instruction code string is an instruction code string immediately before the instruction code string being processed by the optimization unit 112. In this example, “00010011” is stored in the instruction slot 1 and “00010001” is stored in the instruction slot 2.

最小ビット遷移数格納部３０７には、ビット遷移数比較・判定部３０４によって判定された最小ビット遷移数が格納されている。この例では、最小ビット遷移数として“３”が格納されている。最小ビット遷移数の命令コード列格納部３０８には、ビット遷移数比較・判定部３０４によって判定された最小ビット遷移数の命令コード列が格納されている。この例では、命令スロット１に“０００１１０１１”、命令スロット２に“００１１００００”が格納されている。 The minimum bit transition number storage unit 307 stores the minimum bit transition number determined by the bit transition number comparison / determination unit 304. In this example, “3” is stored as the minimum number of bit transitions. The instruction code string storage unit 308 with the minimum number of bit transitions stores the instruction code string with the minimum number of bit transitions determined by the bit transition number comparison / determination unit 304. In this example, “00011011” is stored in the instruction slot 1, and “00110000” is stored in the instruction slot 2.

次に、図４及び図５を用いて、本実施形態にかかるプログラム最適化処理について説明する。図４は、このプログラム最適化処理を示すフローチャートであり、図５は、プログラム最適化処理で用いられるデータの例を示している。この処理は、プログラム最適化装置１１０の最適化部１１２における処理であり、例えば、ＣＰＵ等において所定のアプリケーションプログラムにより実行される。 Next, the program optimization process according to the present embodiment will be described with reference to FIGS. 4 and 5. FIG. 4 is a flowchart showing the program optimization process, and FIG. 5 shows an example of data used in the program optimization process. This process is performed by the optimization unit 112 of the program optimization apparatus 110 and is executed, for example, by a predetermined application program on a CPU or the like.

まず、直前の命令コード列があるか判定する（Ｓ４０１）。入力部３０１において、オブジェクトコード格納部１１５に格納されたオブジェクトコードの命令コード列が先頭から順に入力され、処理が行われる。現在、入力されて最適化処理の対象となる命令コード列を「発生しようとしている命令コード列」という。 First, it is determined whether there is an immediately preceding instruction code string (S401). In the input unit 301, the instruction code string of the object code stored in the object code storage unit 115 is input in order from the top, and processing is performed. An instruction code string that is currently input and subjected to optimization processing is referred to as an “instruction code string that is about to be generated”.

例えば、入力部３０１は、命令コード列が入力されると、直前の命令コード列格納部３０６を参照し、直前に出力した命令コード列があるかどうか判定する。図５の例では、直前の命令コード列格納部３０６の命令スロット１に“０００１０００１”、命令スロット２に“０００１１１１１”が格納されており、直前の命令コード列があると判定される。 For example, when an instruction code string is input, the input unit 301 refers to the immediately preceding instruction code string storage unit 306 and determines whether there is an immediately preceding instruction code string. In the example of FIG. 5, “00010001” is stored in the instruction slot 1 and “00011111” is stored in the instruction slot 2 of the immediately preceding instruction code string storage unit 306, and it is determined that there is an immediately preceding instruction code string.

Ｓ４０１において、直前の命令コード列があると判定された場合には、Ｓ４０２以降の処理によって、発生しようとしている命令コード列の最適化が行われる。Ｓ４０１において、直前の命令コード列がないと判定された場合には、Ｓ４０８の処理によって、発生しようとしている命令コード列が最適化による並び替えをせずにそのまま出力される。このとき、Ｓ４０８でそのまま出力するために、最小ビット遷移数の命令コード列格納部３０８に、発生しようとしている命令コード列を格納する。 If it is determined in S401 that there is an immediately preceding instruction code string, the instruction code string that is about to be generated is optimized by the processes in and after S402. If it is determined in S401 that there is no immediately preceding instruction code string, the instruction code string to be generated is output as it is without being rearranged by optimization in the process of S408. At this time, the instruction code string to be generated is stored in the instruction code string storage unit 308 having the minimum number of bit transitions for output as it is in S408.

次いで、発生しようとしている命令コード列を命令スロットに割り当てる（Ｓ４０２）。命令スロット割り当て部３０２は、発生しようとしている命令コード列に含まれる命令コードを命令スロットに割り当てる。すなわち、命令コード列の命令コードの配列パターンを用意する。この割り当ては、命令コード列の最適な組み合わせを求めるために繰り返し行われ、最終的に全ての組み合わせの命令コード列の割り当てが行われる。 Next, the instruction code string to be generated is assigned to the instruction slot (S402). The instruction slot allocation unit 302 allocates an instruction code included in the instruction code string to be generated to the instruction slot. That is, an instruction code array pattern of an instruction code string is prepared. This assignment is repeated in order to obtain the optimum combination of instruction code strings, and finally the instruction code strings of all combinations are assigned.

最終的に全ての組み合わせの命令コード列となるように割り当てられればよく、まず、入力された命令コード列のまま割り当ててもよいし、任意の順序に入れ替えてもよい。図５（ａ）の例では、発生しようとしている命令コード列として、命令スロット１に“０００１１１１０”、命令スロット２に“０００１０００１”が割り当てられている。 It is only necessary that the instruction code strings are finally assigned to all combinations. First, the input instruction code strings may be assigned as they are, or may be switched in an arbitrary order. In the example of FIG. 5A, “00011110” is assigned to the instruction slot 1 and “00010001” is assigned to the instruction slot 2 as the instruction code string to be generated.
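The exhaustive slot assignment of S402 can be sketched as follows. This is a minimal illustration that assumes any instruction code may occupy any instruction slot, which the text does not state explicitly; the helper name `slot_assignments` is hypothetical.

```python
from itertools import permutations


def slot_assignments(codes):
    """Yield every distinct ordering of the instruction codes, input order first."""
    seen = set()
    for arrangement in permutations(codes):
        # Identical instruction codes would otherwise produce duplicate patterns.
        if arrangement not in seen:
            seen.add(arrangement)
            yield arrangement


# Instruction code string about to be generated, as in FIG. 5(a).
for pattern in slot_assignments(("00011110", "00010001")):
    print(pattern)
```

With two instruction slots this yields the two patterns of FIG. 5(a) and FIG. 5(b). In general there are up to n! patterns for n slots, so the exhaustive search is practical only for a small number of slots.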

次いで、発生しようとしている命令コード列と直前の命令コード列のビット遷移数を計算する（Ｓ４０３）。ビット遷移数演算部３０３は、Ｓ４０２において命令スロットに割り当てられた命令コード列と、直前の命令コード列格納部３０６に格納された直前の命令コード列と、について各命令コードを比較しビット遷移数を計算する。 Next, the number of bit transitions between the instruction code string to be generated and the immediately preceding instruction code string is calculated (S403). The bit transition number calculation unit 303 compares, instruction code by instruction code, the instruction code string assigned to the instruction slots in S402 with the immediately preceding instruction code string stored in the immediately preceding instruction code string storage unit 306, and calculates the number of bit transitions.

例えば、ビット遷移数演算部３０３は、各命令コードの排他的論理和をとり、排他的論理和が１となったビット数を合計する。ビット遷移数を計算は、各命令コードごとに計算してもよいし、複数の命令コードをまとめたり、命令コード列全体で計算してもよい。図５（ａ）の例では、命令スロット１の命令コードのビット遷移数が“４”、命令スロット２の命令コードのビット遷移数が“３”となり、ビット遷移数の合計は“７”となる。 For example, the bit transition number calculation unit 303 takes the exclusive OR of each pair of instruction codes and sums the number of bits for which the exclusive OR is 1. The number of bit transitions may be calculated for each instruction code, for several instruction codes grouped together, or for the entire instruction code string at once. In the example of FIG. 5A, the number of bit transitions is “4” for the instruction code in instruction slot 1 and “3” for the instruction code in instruction slot 2, so the total number of bit transitions is “7”.
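The equivalence between the per-slot calculation and the whole-string calculation can be checked with the FIG. 5(a) values. The sketch below assumes 8-bit instruction codes held as integers and a hypothetical helper `pack` that concatenates the slots into one word; neither is prescribed by the specification.

```python
WIDTH = 8  # assumed bit width of one instruction code

# Immediately preceding string and FIG. 5(a) candidate (slot 1, slot 2).
PREVIOUS = (0b00010001, 0b00011111)
CANDIDATE = (0b00011110, 0b00010001)

# Per-slot calculation: XOR each pair of codes and count the 1 bits.
per_slot = [bin(p ^ c).count("1") for p, c in zip(PREVIOUS, CANDIDATE)]


def pack(codes, width=WIDTH):
    """Concatenate the instruction codes of one string into a single integer."""
    word = 0
    for code in codes:
        word = (word << width) | code
    return word


# Whole-string calculation: one XOR over the concatenated word.
whole = bin(pack(PREVIOUS) ^ pack(CANDIDATE)).count("1")

print(per_slot, sum(per_slot), whole)  # [4, 3] 7 7
```

Both routes give 7 because XOR acts on each bit independently, so grouping the instruction codes differently does not change the count.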

次いで、これまでの最小ビット遷移数と比較し（Ｓ４０４）、ビット遷移数がこれまでの最小ビット遷移数より小さいか判定する（Ｓ４０５）。ビット遷移数比較・判定部３０４は、Ｓ４０３において計算したビット遷移数と最小ビット遷移数格納部３０７に格納された最小ビット遷移数とを比較し、その大小関係を判定する。 Next, the calculated number of bit transitions is compared with the minimum number of bit transitions so far (S404), and it is determined whether it is smaller than that minimum (S405). The bit transition number comparison / determination unit 304 compares the number of bit transitions calculated in S403 with the minimum number of bit transitions stored in the minimum bit transition number storage unit 307, and determines which is larger.

計算したビット遷移数がこれまでの最小ビット遷移数よりも小さいと判定された場合には、Ｓ４０６の処理によって、最小ビット遷移数格納部３０７が更新され、計算したビット遷移数がこれまでの最小ビット遷移数よりも小さくはない（最小ビット遷移数以上である）と判定された場合には、最小ビット遷移数格納部３０７を更新せずに、Ｓ４０７の処理によって、次の命令コード列の組み合わせが計算される。 If the calculated number of bit transitions is determined to be smaller than the minimum number of bit transitions so far, the minimum bit transition number storage unit 307 is updated by the process of S406. If it is determined not to be smaller (that is, greater than or equal to the minimum number of bit transitions), the minimum bit transition number storage unit 307 is not updated, and the process of S407 moves on to the next combination of the instruction code string.

次いで、最小ビット遷移数を更新し、ビット遷移数が最小となる命令コード列を記憶する（Ｓ４０６）。比較・判定部３０４は、Ｓ４０３において計算され、Ｓ４０５において最小ビット遷移数と判定されたビット遷移数を最小ビット遷移数格納部３０７に格納する。さらに、比較・判定部３０４は、このビット遷移数の命令コード列、すなわち、Ｓ４０２において割り当てられ、Ｓ４０３においてビット遷移数が計算された命令コード列を最小ビット遷移数の命令コード列格納部３０８に格納する。 Next, the minimum number of bit transitions is updated, and the instruction code string that gives the minimum number of bit transitions is stored (S406). The comparison / determination unit 304 stores, in the minimum bit transition number storage unit 307, the number of bit transitions calculated in S403 and determined in S405 to be the minimum. The comparison / determination unit 304 also stores the instruction code string having this number of bit transitions, that is, the instruction code string assigned in S402 and evaluated in S403, in the instruction code string storage unit 308 for the minimum number of bit transitions.

図５（ａ）の例では、最小ビット遷移数格納部３０７に最小ビット遷移数がまだ格納されていない場合、“７”が最小ビット遷移数であると判定され、最小ビット遷移数格納部３０７に格納される。最小ビット遷移数の命令コード列格納部３０８の命令スロット１に“０００１１１１０”、命令スロット２に“０００１０００１”が格納される。 In the example of FIG. 5A, if no minimum number of bit transitions has yet been stored in the minimum bit transition number storage unit 307, “7” is determined to be the minimum number of bit transitions and is stored in the minimum bit transition number storage unit 307. In the instruction code string storage unit 308 for the minimum number of bit transitions, “00011110” is stored in instruction slot 1 and “00010001” is stored in instruction slot 2.

次いで、全ての命令コード列の組み合わせでビット遷移数を検査したか判定する（Ｓ４０７）。比較・判定部３０４は、Ｓ４０２〜Ｓ４０６において、発生しようとしている命令スロット列を、全ての組み合わせで命令スロットに割り当て、ビット遷移数の計算や比較等が行われたか判定する。 Next, it is determined whether the number of bit transitions has been checked for all combinations of instruction code strings (S407). In S402 to S406, the comparison / determination unit 304 assigns the instruction slot sequence to be generated to the instruction slots in all combinations, and determines whether the bit transition number is calculated or compared.

全ての命令コード列の組み合わせでビット遷移数を検査したと判定された場合には、Ｓ４０８の処理によって、命令コード列が出力される。全ての命令コード列の組み合わせでビット遷移数を検査していない判定された場合には、Ｓ４０２以降の処理が再度行われ、残りの組み合わせに並び替えて、ビット遷移数が計算される。 If it is determined that the number of bit transitions has been checked for all combinations of instruction code strings, an instruction code string is output by the processing of S408. If it is determined that the number of bit transitions is not inspected for all combinations of instruction code strings, the processing after S402 is performed again, rearranged into the remaining combinations, and the number of bit transitions is calculated.

図５の例では、図５（ａ）の処理の次に、残りの組み合わせがあるため、図５（ｂ）の処理が行われる。図５（ｂ）では、発生しようとしている命令コード列として、命令スロット１に“０００１０００１”、命令スロット２に“０００１１１１０”が割り当て、すなわち、命令スロット１と命令スロット２の命令コードを並び替えている。この命令コード列のビット遷移数は“１”となる。そして、最小ビット遷移数格納部３０７に“７”が格納されていた場合、“１”の方が小さいので、最小ビット遷移数格納部３０７に“１”が格納され、最小ビット遷移数の命令コード列格納部３０８には、命令スロット１“０００１０００１”、命令スロット２“０００１１１１０”が格納される。 In the example of FIG. 5, a combination remains after the process of FIG. 5A, so the process of FIG. 5B is performed. In FIG. 5B, “00010001” is assigned to instruction slot 1 and “00011110” to instruction slot 2 as the instruction code string to be generated; that is, the instruction codes of instruction slot 1 and instruction slot 2 have been swapped. The number of bit transitions for this instruction code string is “1”. If “7” was stored in the minimum bit transition number storage unit 307, “1” is smaller, so “1” is stored in the minimum bit transition number storage unit 307, and instruction slot 1 “00010001” and instruction slot 2 “00011110” are stored in the instruction code string storage unit 308 for the minimum number of bit transitions.

次いで、ビット遷移数最小となる命令コード列を出力し、直前の命令コード列を更新する（Ｓ４０８）。出力部３０５は、Ｓ４０１〜Ｓ４０７において、最小ビット遷移数の命令コード列格納部３０８に格納された命令コード列を、最適化コードとして最適化コード格納部１１６へ出力する。そして、出力部３０５は、出力した命令コード列を直前の命令コード列格納部３０６に格納する。 Next, an instruction code string that minimizes the number of bit transitions is output, and the immediately preceding instruction code string is updated (S408). In S401 to S407, the output unit 305 outputs the instruction code string stored in the instruction code string storage unit 308 having the minimum number of bit transitions to the optimization code storage unit 116 as an optimization code. Then, the output unit 305 stores the output instruction code string in the immediately preceding instruction code string storage unit 306.

図５の例では、図５（ｂ）の命令コード列である命令スロット１“０００１０００１”、命令スロット２“０００１１１１０”が出力され、この命令コード列が直前の命令コード列格納部３０６に格納される。そして、オブジェクトコードの全ての命令コードについて最適化処理が行われて、この処理が終了する。 In the example of FIG. 5, the instruction code string of FIG. 5B, namely instruction slot 1 “00010001” and instruction slot 2 “00011110”, is output, and this instruction code string is stored in the immediately preceding instruction code string storage unit 306. The process ends once the optimization has been performed for all instruction codes of the object code.
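The flow of S401 to S408 can be put together in one short sketch. It assumes instruction codes given as equal-width binary strings and free reordering within a string; the function names `transitions` and `optimize` are hypothetical, and the local variables merely stand in for storage units 306 to 308.

```python
from itertools import permutations


def transitions(previous, candidate):
    """Total number of differing bits between two instruction code strings."""
    return sum(bin(int(p, 2) ^ int(c, 2)).count("1")
               for p, c in zip(previous, candidate))


def optimize(code_strings):
    """Reorder each instruction code string to minimize transitions from the one before."""
    optimized = []
    previous = None                                  # stands in for storage unit 306
    for current in code_strings:
        if previous is None:                         # S401: no preceding string
            best = tuple(current)                    # output without reordering
        else:
            min_count, best = None, None             # storage units 307 and 308
            for arrangement in permutations(current):            # S402, repeated via S407
                count = transitions(previous, arrangement)       # S403
                if min_count is None or count < min_count:       # S404, S405
                    min_count, best = count, arrangement         # S406
        optimized.append(best)                       # S408: output the best string
        previous = best                              # S408: update the preceding string
    return optimized


result = optimize([["00010001", "00011111"], ["00011110", "00010001"]])
print(result)
# [('00010001', '00011111'), ('00010001', '00011110')]
```

The first string is passed through unchanged, and the second is swapped into the FIG. 5(b) order because that order costs 1 transition instead of 7. Because the comparison is strict, a tie keeps the arrangement found first.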

このように、前のサイクルで実行する命令コード列と、次のサイクルで実行する命令コード列のビット遷移が少なくなるように、命令コードの配列を並び替えることにより、この命令コードを実行する計算機システムで命令コードを読み出す際に、バスに流れる命令コードのビット遷移を抑止することができる。よって、命令コードを転送する命令バスの消費電力を低減し、計算機システムの低消費電力化を図ることができる。また、命令コードの配列を並び替えているため、命令コードを実行する計算機システムにおいて、冗長なハードウェア構成を設ける必要がない。 In this way, the arrangement of the instruction codes is reordered so that fewer bits change between the instruction code string executed in the previous cycle and the instruction code string executed in the next cycle. As a result, when the computer system that executes these instruction codes reads them out, bit transitions of the instruction codes flowing on the bus can be suppressed. This reduces the power consumption of the instruction bus that transfers the instruction codes and thus lowers the power consumption of the computer system. Furthermore, because only the arrangement of the instruction codes is changed, the computer system that executes the instruction codes does not need any redundant hardware.

発明の実施の形態２． Embodiment 2 of the Invention
次に、図６のフローチャートを用いて、本発明の実施の形態２にかかるプログラム最適化処理について説明する。本実施形態では、図４の処理と異なり、命令コード列の全ての組み合わせについて、一度にビット遷移数の計算や比較を行うことを特徴とする。 Next, the program optimization process according to the second embodiment of the present invention will be described using the flowchart of FIG. 6. Unlike the process of FIG. 4, the present embodiment is characterized in that the number of bit transitions is calculated and compared at once for all combinations of the instruction code string.

この処理によって得られる命令コードを実行する計算機システムや、この処理を実行するプログラム最適化装置の構成等は、図１〜図３と同様である。この処理は、図４の処理と同様に、プログラム最適化装置１１０の最適化部１１２における処理である。また、Ｓ６０１〜Ｓ６０４と図４のＳ４０１〜Ｓ４０４、Ｓ６０５と図４のＳ４０８は、その一部又は全部が同じか類似の処理であり、適宜説明を省略する。 The computer system that executes the instruction codes obtained by this process and the configuration of the program optimization apparatus that executes this process are the same as in FIGS. 1 to 3. Like the process of FIG. 4, this process is performed by the optimization unit 112 of the program optimization apparatus 110. S601 to S604 correspond to S401 to S404 in FIG. 4, and S605 corresponds to S408 in FIG. 4; these steps are partly or wholly the same or similar, so their description is omitted where appropriate.

命令コード列が入力されると、まず、直前の命令コード列があるか判定し（Ｓ６０１）、直前の命令コード列があると判定された場合にはＳ６０２が行われ、直前の命令コード列がないと判定された場合にはＳ６０５が行われる。 When an instruction code string is input, it is first determined whether there is an immediately preceding instruction code string (S601). If it is determined that there is an immediately preceding instruction code string, S602 is performed, and the immediately preceding instruction code string is determined. If it is determined that there is no, S605 is performed.

次いで、発生しようとしている命令コード列の全ての組み合わせを命令スロットに割り当てる（Ｓ６０２）。命令スロット割り当て部３０２は、命令コード列の全ての組み合わせを求めて、命令スロットの割り当てを行う。すなわち、命令コード列の命令コードの全ての配列パターンを用意する。例えば、図５（ａ）と（ｂ）の２つの命令コード列を割り当てる。 Next, all combinations of instruction code strings to be generated are assigned to instruction slots (S602). The instruction slot assignment unit 302 obtains all combinations of instruction code strings and assigns instruction slots. That is, all array patterns of instruction codes in the instruction code string are prepared. For example, two instruction code strings shown in FIGS. 5A and 5B are assigned.

次いで、発生しようとしている命令コード列の全ての組み合わせにおいて、命令コード列と直前の命令コード列のビット遷移数を計算する（Ｓ６０３）。ビット遷移数演算部３０３は、Ｓ６０２において割り当てられた全ての組み合わせの命令コード列について、直前の命令コード列格納部３０６に格納された直前の命令コード列とのビット遷移数を計算する。例えば、図５（ａ）と（ｂ）のビット遷移数である“７”と“１”を求める。 Next, in all combinations of instruction code strings to be generated, the number of bit transitions between the instruction code string and the immediately preceding instruction code string is calculated (S603). The bit transition number calculation unit 303 calculates the number of bit transitions with the immediately preceding instruction code string stored in the immediately preceding instruction code string storage unit 306 for all combinations of instruction code strings assigned in S602. For example, “7” and “1” which are the number of bit transitions in FIGS. 5A and 5B are obtained.

次いで、命令コード列の全ての組み合わせのビット遷移数を比較する（Ｓ６０４）。ビット遷移数比較・判定部３０４は、Ｓ６０３において計算した全ての組み合わせの命令コード列のビット遷移数を比較し、その大小関係を判定する。例えば、図５（ａ）と（ｂ）のビット遷移数である“７”と“１”を比較し、“１”の方が小さいと判定し、図５（ｂ）の命令コード列が最小ビット遷移数の命令コード列となる。 Next, the number of bit transitions of all combinations of instruction code strings is compared (S604). The bit transition number comparison / determination unit 304 compares the number of bit transitions of the instruction code strings of all combinations calculated in S603, and determines the magnitude relationship. For example, the number of bit transitions “7” and “1” in FIGS. 5A and 5B are compared, and it is determined that “1” is smaller, and the instruction code string in FIG. The instruction code string is the number of bit transitions.

次いで、ビット遷移数最小となる命令コード列を出力し、直前の命令コード列を更新する（Ｓ６０５）。Ｓ６０１〜Ｓ６０４において最小ビット遷移数と判定した命令コード列を最適化コードとして出力する。 Next, an instruction code string that minimizes the number of bit transitions is output, and the immediately preceding instruction code string is updated (S605). The instruction code string determined as the minimum number of bit transitions in S601 to S604 is output as an optimization code.

このように、全ての組み合わせの命令コード列について、一度にビット遷移数の計算、比較を行うことも可能であり、図４と同じ最適化コードを効率よく得ることができる。 In this way, it is possible to calculate and compare the number of bit transitions at once for all combinations of instruction code strings, and the same optimized code as in FIG. 4 can be obtained efficiently.
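The batch variant of S602 to S604 can be sketched as below, again assuming binary-string instruction codes and free reordering within a string. The function names `transitions` and `best_arrangement` are hypothetical.

```python
from itertools import permutations


def transitions(previous, candidate):
    """Total number of differing bits between two instruction code strings."""
    return sum(bin(int(p, 2) ^ int(c, 2)).count("1")
               for p, c in zip(previous, candidate))


def best_arrangement(previous, current):
    """Evaluate every arrangement in one pass and return the cheapest one."""
    candidates = list(permutations(current))                  # S602: all patterns
    counts = [transitions(previous, c) for c in candidates]   # S603: all counts
    return candidates[counts.index(min(counts))]              # S604: compare and select


previous = ("00010001", "00011111")
current = ("00011110", "00010001")
print(best_arrangement(previous, current))  # ('00010001', '00011110')
```

Selecting the first index of the minimum gives the same tie-breaking as an incremental search that updates only on a strictly smaller count, so both approaches return the same arrangement under these assumptions.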

その他の発明の実施の形態． Other Embodiments of the Invention
尚、上述の例では、前の命令コード列と次の命令コード列のビット遷移数を順次求めて、命令コードを並び替えたが、さらに、その他の情報を考慮して、命令コードを並び替えてもよい。例えば、命令コードの属するスレッド番号を参照し、スレッド番号によって並び替えるかどうか制御してもよい。同じスレッドの場合には、命令コードの実行順序は変えられないが、違うスレッドの場合には、命令コードの実行順序を入れ替えることも可能である。 In the above example, the number of bit transitions between the previous instruction code string and the next instruction code string is obtained in sequence and the instruction codes are reordered accordingly; the instruction codes may also be reordered with other information taken into account. For example, the thread number to which each instruction code belongs may be consulted, and whether to reorder may be controlled according to the thread number. For instruction codes of the same thread, the execution order cannot be changed, but for instruction codes of different threads, the execution order can be exchanged.

上述したプログラム最適化装置１１０を実現するためのハードウェア構成の例を図７に示す。プログラム最適化装置１１０は、典型的なコンピュータ・システムが利用可能であり、中央処理装置（ＣＰＵ）２０１とメモリ２０４とを含んでいる。 An example of a hardware configuration for realizing the program optimization apparatus 110 described above is shown in FIG. The program optimizing device 110 can use a typical computer system and includes a central processing unit (CPU) 201 and a memory 204.

ＣＰＵ２０１とメモリ２０４とは、バスを介して補助記憶装置としてのハードディスク装置２１３に接続される。フレキシブルディスク装置２２０、ハードディスク装置２１３・２３０、ＣＤ−ＲＯＭドライブ２２６・２２９、ＭＯドライブ２２８等の記憶媒体駆動装置は、フレキシブルディスク・コントローラ２１９、ＩＤＥコントローラ２２５、ＳＣＳＩコントローラ２２７等の各種コントローラを介してバスに接続される。フレキシブルディスク装置２２０等の記憶媒体駆動装置には、フレキシブルディスク等の可搬型記憶媒体が挿入される。 The CPU 201 and the memory 204 are connected to a hard disk device 213 as an auxiliary storage device via a bus. Storage medium drive devices such as the flexible disk device 220, hard disk devices 213 and 230, CD-ROM drives 226 and 229, and MO drive 228 are connected via various controllers such as the flexible disk controller 219, IDE controller 225, and SCSI controller 227. Connected to the bus. A portable storage medium such as a flexible disk is inserted into a storage medium driving device such as the flexible disk device 220.

記憶媒体にはオペレーティングシステムと共同してＣＰＵ２０１等に命令を与え、プログラム最適化装置１１０の機能を実施するためのコンピュータ・プログラムを記憶することができる。コンピュータ・プログラムは、メモリ２０４にロードされることによって実行される。コンピュータ・プログラムは圧縮し、又、複数に分割して記憶媒体に記憶することができる。ハードウェア構成は、典型的には、ユーザ・インターフェース・ハードウェアを備える。 The storage medium can store a computer program for giving instructions to the CPU 201 and the like in cooperation with the operating system and for executing the functions of the program optimization device 110. The computer program is executed by being loaded into the memory 204. The computer program can be compressed or divided into a plurality of pieces and stored in a storage medium. The hardware configuration typically comprises user interface hardware.

ユーザ・インターフェース・ハードウェアとしては、例えば、入力をするためのポインティングデバイス（マウス２０７、ジョイスティック等）やキーボード２０６、あるいは、視覚データをユーザに提示するための液晶ディスプレイなどの表示装置２１１やＣＲＴディスプレイ２１２がある。 Examples of user interface hardware include a pointing device (mouse 207, joystick, etc.) and a keyboard 206 for input, and a display device 211 such as a liquid crystal display or a CRT display 212 for presenting visual data to the user.

また、パラレルポート２１６を介してプリンタを接続するも可能である。このコンピュータ・システムは、シリアルポート２１５を介してモデムを接続することが可能であり、シリアルポート２１５およびモデムまたはトークンリングや通信アダプタ２１８等を介してネットワークに接続し、他のコンピュータ・システムと通信を行っている。尚、上記構成は必要に応じて省略することができる。 A printer can also be connected via the parallel port 216. A modem can be connected to this computer system via the serial port 215, and the system connects to a network via the serial port 215 and the modem, or via token ring, the communication adapter 218, or the like, to communicate with other computer systems. Note that the above components can be omitted as needed.

本発明にかかる計算機システムのハードウェア構成図である。 A hardware configuration diagram of the computer system according to the present invention.
本発明にかかるプログラム最適化装置の構成を示すブロック図である。 A block diagram showing the configuration of the program optimization apparatus according to the present invention.
本発明にかかるプログラム最適化装置の最適化部の構成を示すブロック図である。 A block diagram showing the configuration of the optimization unit of the program optimization apparatus according to the present invention.
本発明にかかるプログラム最適化処理を示すフローチャートである。 A flowchart showing the program optimization process according to the present invention.
本発明にかかるプログラム最適化処理で用いられるデータ例を示す図である。 A diagram showing an example of data used in the program optimization process according to the present invention.
本発明にかかるプログラム最適化処理を示すフローチャートである。 A flowchart showing the program optimization process according to the present invention.
本発明にかかるプログラム最適化装置のハードウェア構成図である。 A hardware configuration diagram of the program optimization apparatus according to the present invention.
一般的な計算機システムのハードウェア構成図である。 A hardware configuration diagram of a general computer system.

Explanation of symbols

１００ 計算機システム 100 Computer system
１０１ 命令メモリ 101 Instruction memory
１０２ａ，ｂ 命令デコーダ 102a, b Instruction decoder
１０３ データメモリ 103 Data memory
１０４ａ，ｂ 演算器 104a, b Arithmetic unit
１０５ａ，ｂ 命令バス 105a, b Instruction bus
１０６ａ，ｂ データバス 106a, b Data bus
１１０ プログラム最適化装置 110 Program optimization apparatus
１１１ コンパイラ 111 Compiler
１１２ 最適化部 112 Optimization unit
１１３ リンカ 113 Linker
１１４ ソースコード格納部 114 Source code storage unit
１１５ オブジェクトコード格納部 115 Object code storage unit
１１６ 最適化コード格納部 116 Optimized code storage unit
１１７ 実行コード格納部 117 Execution code storage unit
３０１ 入力部 301 Input unit
３０２ 命令スロット割り当て部 302 Instruction slot assignment unit
３０３ ビット遷移数演算部 303 Bit transition number calculation unit
３０４ ビット遷移数比較・判定部 304 Bit transition number comparison / determination unit
３０５ 出力部 305 Output unit
３０６ 直前の命令コード列格納部 306 Immediately preceding instruction code string storage unit
３０７ 最小ビット遷移数格納部 307 Minimum bit transition number storage unit
３０８ 最小ビット遷移数の命令コード列格納部 308 Instruction code string storage unit for the minimum number of bit transitions

Claims

A program for causing a computer to execute a process for optimizing an instruction code executed in a computer system capable of processing a plurality of instructions in parallel,
Inputting a first instruction code string in which a plurality of instruction codes to be executed in parallel are arranged, and a second instruction code string, to be executed next after the first instruction code string, in which a plurality of instruction codes to be executed in parallel are arranged;
Calculating the number of bit transitions between the first instruction code string and the second instruction code string;
According to the calculated bit transition number, the instruction code array of the second instruction code sequence is rearranged,
Outputting the first instruction code string and the rearranged second instruction code string;
program.

The rearrangement rearranges the instruction code array of the second instruction code string so that the calculated number of bit transitions is minimized.
The program according to claim 1.

The rearrangement comprises:
Preparing a first array pattern, which is an instruction code array of a second instruction code string, and a second array pattern different from the first array pattern;
Comparing the number of bit transitions according to the first array pattern with the number of bit transitions according to the second array pattern to determine an array pattern with a small number of bit transitions;
The output outputs the second instruction code string composed of the first or second arrangement pattern determined that the number of bit transitions is small.
The program according to claim 1 or 2.

The rearrangement replaces an instruction code array by changing an assignment of an instruction slot to an instruction code included in the second instruction code string.
The program according to any one of claims 1 to 3.

According to the thread numbers of a first instruction code included in the first instruction code string and a second instruction code included in the second instruction code string, the first instruction code and the second instruction code are exchanged,
The program according to any one of claims 1 to 4.

An optimization method for optimizing an instruction code executed in a computer system capable of processing a plurality of instructions in parallel,
Inputting a first instruction code string in which a plurality of instruction codes to be executed in parallel are arranged, and a second instruction code string, to be executed next after the first instruction code string, in which a plurality of instruction codes to be executed in parallel are arranged;
Calculating the number of bit transitions between the first instruction code string and the second instruction code string;
According to the calculated number of bit transitions, the instruction code array of the second instruction code sequence is rearranged,
Outputting the first instruction code string and the rearranged second instruction code string;
Optimization method.

An optimization device that optimizes an instruction code executed in a computer system capable of processing a plurality of instructions in parallel,
An input unit for inputting a first instruction code string in which a plurality of instruction codes to be executed in parallel are arranged, and a second instruction code string, to be executed next after the first instruction code string, in which a plurality of instruction codes to be executed in parallel are arranged;
A calculation unit for calculating the number of bit transitions between the first instruction code string and the second instruction code string;
A rearrangement unit for rearranging the instruction code array of the second instruction code string according to the calculated number of bit transitions;
An output unit for outputting the first instruction code string and the rearranged second instruction code string;
An optimization device comprising: