JP3692884B2

JP3692884B2 - Program processing method and recording medium

Info

Publication number: JP3692884B2
Application number: JP2000014517A
Authority: JP
Inventors: Tetsuya Tanaka (哲也田中); Taketo Heishi (岳人瓶子)
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2000-01-24
Filing date: 2000-01-24
Publication date: 2005-09-07
Anticipated expiration: 2020-01-24
Also published as: JP2001202252A

Description

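[0001]
[Technical field of the invention]
The present invention relates to a program processing method, a processing apparatus, and a recording medium involving a compiler that generates object code from a high-level language and a linker that links and edits multiple object code files to generate executable code, and in particular to optimization techniques for parallel processors.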
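[0002]
[Prior art]
In conventional program processing methods, a compiler analyzes source code written in a high-level language, optimizes it for the target machine, and emits object code; a linker then links and edits multiple object code files to produce executable code.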
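[0003]
When writing source code, the user does not assume the specific configuration of the target machine but rather the computer model shown in FIG. 12. In this model, one memory bus is connected to one processor, and a storage device serving as main memory is attached to that bus. The main memory and memory bus in this model are conceptual; the actual hardware may, for example, take the configuration shown in FIG. 13, in which three memory buses are connected to the processor and a storage device is attached to each bus.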
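[0004]
The operation of the conventional program processing method is briefly described using the C source code shown in FIG. 10. For this source code, the conventional method generates the assembly code shown in FIG. 7. That is, even when the target machine, like the one in FIG. 13, could execute memory access instructions ("ld" and "st") in parallel, the conventional method assumes the target machine model of FIG. 12 and therefore cannot determine whether memory access instructions can be executed in parallel.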
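[0005]
[Problems to be solved by the invention]
The conventional program processing method described above has no step that takes a specific target machine into account; instead, differences among target machines of various configurations are absorbed into the target machine model. As a result, it cannot perform optimizations that take those differences into account.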
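[0006]
Furthermore, the conventional program processing method cannot optimize by fully exploiting the characteristics of applications, which differ from one application to another. A conventional program processing apparatus optimizes using only those application characteristics that can be expressed within the configuration of the target machine model, so it cannot perform optimizations that use application information to exploit the actual configuration of the target machine.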
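[0007]
The present invention was made in view of these problems. Its object is to provide a program processing method with a step through which the user writing the source code can make the application's characteristics and the target machine's configuration available to the optimization performed during program processing.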
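[0008]
[Means for solving the problems]
To solve this problem, the program processing method according to claim 1 is
a method in which a CPU converts a program, consisting of a plurality of source code files written in a high-level language, into executable code. The method comprises a first conversion step of converting the source code into first intermediate code, an optimization step of optimizing the first intermediate code into second intermediate code, and a second conversion step of converting the second intermediate code into the executable code. The optimization step comprises: a hardware information extraction step of extracting configuration information about the parallel-accessible memories of the target machine that executes the executable code; an application information extraction step of extracting, from the source code, characteristic information on the memory usage of the application implemented by the program; and a memory resource conflict determination step which, when memory resource conflicts are judged as part of deciding whether instructions can be executed in parallel, determines that there is no memory resource conflict if reference to the memory configuration information and the application's memory usage characteristics shows that the instructions access memories that can be accessed in parallel with each other, and otherwise determines that there is a memory resource conflict.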
[0013]
[Embodiments of the invention]
Embodiments of the present invention are described below with reference to FIGS. 1 to 13.
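[0014]
FIG. 1 is a flowchart showing the processing flow of the program processing method according to the embodiment of the present invention and the input/output relationships among files.
[0015]
The compiler upstream process 100 reads the high-level language source code 200 stored as a file, performs syntax analysis, semantic analysis, and so on, and generates internal-format code. Where necessary, it also optimizes the internal-format code so as to reduce the execution time and code size of the executable code that is finally generated. Any application information specific to the present invention (described later) that is contained in the source code 200 is skipped.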
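[0016]
The assembler code generation process 101 generates assembler code from the internal-format code generated and optimized by the compiler upstream process 100.
[0017]
The compiler upstream process 100 and the assembler code generation process 101 are not the focus of the present invention and, apart from skipping the application information mentioned above, are identical to the conventional program processing method, so their details are omitted.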
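[0018]
The instruction scheduling process 102 parallelizes the assembler code generated by the assembler code generation process 101 for the target machine by performing instruction scheduling (reordering of instructions) based on an analysis of inter-instruction dependencies and resource conflicts.
[0019]
The instruction scheduling process 102 also reads application information from the application information file 201. The application information file 201 may be either the source code itself or a file separate from the source code. In this embodiment, the application information is assumed to be written in the source code. FIG. 4 shows an example of source code containing application information.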
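[0020]
In FIG. 4, lines 1 and 2 contain application information. "m0:: int G0[2];" on line 1 indicates that the two-element integer array G0 is allocated to logical memory m0. Likewise, line 2 indicates that the array G1 is allocated to logical memory m1. Logical memory is described later.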
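[0021]
In addition, the instruction scheduling process 102 reads hardware information from the hardware information file 202. The hardware information file 202 may be either the source code or a file separate from it. In this embodiment, the hardware information is assumed to be written in a file separate from the source code. FIG. 8 shows the contents of the hardware information file 202.
[0022]
The hardware information shown in FIG. 8 is prepared for each target machine in the format shown in FIG. 11. The hardware information of FIG. 8 in this embodiment corresponds to the target machine shown in FIG. 13. Each element of the hardware information is described below.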
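[0023]
A logical memory is a name denoting the attributes given after "::"; as shown in FIG. 4, it is attached to static variables (variables whose memory allocation is fixed) in the source code. The logical memory stack is attached automatically for stack operations such as automatic-variable accesses, and the logical memory main is attached automatically to all other memory accesses.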
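[0024]
A physical memory is a memory that the target machine can access; physical memories with different names can be accessed in parallel with one another. In this embodiment, as shown in FIG. 13, three memories are connected to the processor: X memory (main memory), Y memory, and Z memory. These memories can be accessed in parallel, and each is accessed by a different instruction. In this embodiment the memory to be accessed is selected by the instruction itself, but it may instead be selected by the address or by the position of the instruction within a group executed in parallel.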
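[0025]
X memory is used as main memory and has a cache. Peripheral device registers are mapped into X memory. Y memory is implemented as ROM and cannot be written. Z memory is implemented as RAM but has a small capacity. Each memory is assumed to have its own independent address space; alternatively, parts of a single address space may be assigned to the respective memories.
[0026]
The address range attribute indicates the range of addresses available to variables assigned to the logical memory; it never exceeds the address range available in the physical memory. The address range attribute is used to check sizes when variables are allocated, so that errors in the source code are detected at compile time.
[0027]
The access unit attribute indicates the unit in which the logical memory can be accessed; it is never smaller than the access unit supported by the physical memory. In general, a memory restricted to a single access unit can be accessed faster, or implemented more cheaply, than a memory that supports several access units. This attribute can be specified when optimizing for a target machine with such access-unit-restricted memory. Concretely, it is used for alignment when variables are allocated and for specifying the size of memory access instructions.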
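[0028]
The access method attribute indicates how variables assigned to the logical memory access the physical memory; the options are cache, uncache, and stream. Only access methods supported by the physical memory can be specified. cache specifies in the instruction that data are to be cached when a cache is available; uncache specifies in the instruction that data are not to be cached even when a cache is available; stream specifies in the instruction that prefetching or similar measures are to be used to make sequential access efficient.
[0029]
The read/write attribute specifies whether variables assigned to the logical memory are readable and writable (rw), read-only (ro), or write-only (wo). When the physical memory is ROM, specifying ro allows source code that writes to the ROM area to be detected as an error at compile time.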
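[0030]
The instruction scheduling process 102 is described in detail with reference to FIGS. 2 and 3. For simplicity, instruction scheduling operates on one basic block at a time, so the instruction sequence has only one path. FIG. 2 is a detailed flowchart of the instruction scheduling process 102; each of its steps is described below.
[0031]
Step 110 takes the first of the remaining unprocessed instructions and adds it to the empty instruction group A as its first element. The subsequent steps search the remaining unprocessed instructions for instructions that can be executed in parallel with the instructions in group A and add them to group A.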
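[0032]
The target machine of this embodiment is assumed to be a superscalar processor system that can execute up to three instructions in parallel, of which up to two may be memory access instructions. Memory access instructions executed together must each access a different physical memory.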
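[0033]
Step 111 selects, from the unprocessed instructions, instructions whose execution order can be changed so that they can execute in parallel with the instructions of group A, and registers them as candidates. Step 112 selects the first of the candidates as instruction B.
[0034]
Step 113 determines whether instruction B can be executed in parallel with each instruction of group A. FIG. 3 is a detailed flowchart of this parallel-execution check, which is described below with reference to FIG. 3.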
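[0035]
Step 130 determines whether there is a data dependency between instruction B and any instruction of group A. A data dependency is the relationship between an instruction that defines a result and an instruction that uses that result; instructions in this relationship cannot be executed in parallel. If a data dependency is found, the check ends with the result that parallel execution is not possible.
[0036]
Step 131 determines whether instruction B and any instruction of group A compete for the target machine's functional-unit resources. If such a conflict is found, the check ends with the result that parallel execution is not possible.
[0037]
Step 132 determines whether instruction B and any instruction of group A conflict over a physical memory of the target machine. When an instruction of group A or instruction B is a memory access instruction, the processing depends on what caused that instruction to be generated.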
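[0038]
If the memory access instruction was generated by an access to a static variable, the logical memory attached to that variable (m0, m1, io, etc.) is read from the application information contained in the source code shown in FIG. 4, and the physical memory corresponding to that logical memory is read from the hardware information shown in FIG. 8.
[0039]
If the memory access instruction was generated by an automatic-variable access or by a stack save or restore, the logical memory is taken to be stack, and the corresponding physical memory is read from the hardware information shown in FIG. 8. If it was generated in any other way, the logical memory is taken to be main, and the corresponding physical memory is likewise read from the hardware information shown in FIG. 8.
[0040]
If instruction B uses a physical memory already used by an instruction of group A, the check ends with the result that parallel execution is not possible.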
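[0041]
If step 130 finds no data dependency, step 131 finds no functional-unit conflict, and step 132 finds no physical memory conflict, the check ends with the result that parallel execution is possible.
[0042]
Returning to FIG. 2, if step 113 determines that parallel execution is not possible, processing proceeds to step 114; if it is possible, processing proceeds to step 115. Because instruction B cannot be executed in parallel with the instructions of group A, step 114 removes instruction B from the candidates, and processing proceeds to step 117.
[0043]
Because instruction B can be executed in parallel with the instructions of group A, step 115 adds instruction B to group A and removes it from the candidates.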
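[0044]
Step 116 determines whether the number of instructions in group A has reached the maximum number of instructions the target machine can execute in parallel. If it has, the search for instructions that can execute in parallel with group A ends and processing proceeds to step 118; otherwise, processing proceeds to step 117.
[0045]
Step 117 determines whether any candidates remain. If so, processing returns to step 112 to check whether the next candidate can be executed in parallel; if not, processing proceeds to step 118.
[0046]
Step 118 marks the instructions in group A as processed and generates the scheduled assembler code. At this point, each memory access instruction is replaced with an instruction that accesses the physical memory it uses, and the access method attribute described above (cache, uncache, or stream) is specified in the instruction.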
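[0047]
In addition, when the address of a memory access instruction can be determined, it is checked against the address range attribute described above, and error processing is performed if it lies outside the range. Error processing is also performed if the access size of a memory access instruction is smaller than the access unit attribute described above or is not a power-of-two multiple of it, because the physical memory does not support that access size. The read/write attribute is likewise used to check the direction of each access: error processing is performed if an instruction using a physical memory whose read/write attribute is "ro" is a store, or if an instruction using a physical memory whose attribute is "wo" is a load.
[0048]
In step 119, if unprocessed instructions remain, processing returns to step 110 to search for instructions that can execute in parallel with a new group A. If none remain, the instruction scheduling process ends.
[0049]
Returning to FIG. 1, the object code generation process 103 converts the assembler code generated by the instruction scheduling process 102 into object code and outputs it as the object code file 203. The linking process 104 reads multiple object code files 203, links and edits them, and generates the executable code file 204. The object code generation process 103 and the linking process 104 are not the focus of the present invention and are identical to the conventional program processing method, so their details are omitted.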
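[0050]
(Description of specific operation)
Next, the operation of the characteristic components of this program processing method is described using a concrete program. FIG. 4 shows source code written for the present invention; it takes the form of conventional C with application information added.
[0051]
In FIG. 4, "m0::" and "m1::" on lines 1 and 2 are application information indicating the assignment of the integer arrays G0 and G1 to logical memories. The programmer writes application information with the characteristics of the application in mind. Here G0 and G1 are assigned to different logical memories: the programmer knows that the application allows G0 and G1 to be accessed in parallel and expects the code to behave accordingly.
[0052]
Although FIG. 4 shows integer arrays assigned to logical memories, the same naturally applies to types other than integers, and also to ordinary (non-array) variables and pointers. In the case of pointers, however, assigning the address of a variable allocated to one logical memory to a pointer allocated to another logical memory would make the pointer refer to the wrong logical memory, so this should be treated as an error.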
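[0053]
The function func in FIG. 4 takes integer arguments a and b. It multiplies element 0 of the global integer array G0 by a and stores the product in element 0 of the global integer array G1, then multiplies element 1 of G0 by b and stores the product in element 1 of G1. The source code of FIG. 4 is stored as a file in the source code 200.
[0054]
FIG. 5 shows the assembler code obtained after the source code of FIG. 4, stored in the source code 200, has been processed by the compiler upstream process 100 and the assembler code generation process 101. It is briefly explained below.
[0055]
Line 1 is the label marking the start of the function func.
[0056]
Line 2 stores the address of array G0 in register r0.
[0057]
Line 3 stores the address of array G1 in register r1.
[0058]
Line 4 reads the data in memory at the address held in register r0 and stores it in register r2 (the value of G0[0]).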
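[0059]
Line 5 reads from memory at the address given by the stack pointer sp plus offset 8 and stores the data in register r3. FIG. 9 shows the layout of the stack frame. While func is executing, sp points to the position shown in FIG. 9, and arguments a and b are stored at offsets 8 and 4, respectively. Line 5 therefore stores the value of argument a in register r3.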
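[0060]
Line 6 multiplies the values in registers r2 and r3 and stores the product in register r4.
[0061]
Line 7 stores the data in register r4 to memory at the address held in register r1 (store to G1[0]).
[0062]
Line 8 reads the data in memory at the address given by the contents of register r0 plus offset 4 and stores it in register r2 (the value of G0[1]).
[0063]
Line 9 reads from memory at the address given by sp plus offset 4 and stores the data in register r3 (the value of argument b).
[0064]
Line 10 multiplies the values in registers r2 and r3 and stores the product in register r4.
[0065]
Line 11 stores the data in register r4 to memory at the address given by the contents of register r1 plus offset 4 (store to G1[1]).
[0066]
Line 12 returns from func to the caller.
[0067]
The instruction scheduling process 102 optimizes (parallelizes) the assembler code of FIG. 5 into the assembler code of FIG. 6. For this optimization it reads the application information contained in the source code of FIG. 4 and the hardware information of FIG. 8.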
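[0068]
The process is now explained with reference to FIG. 2. In step 110, "2: mov G0, r0" is first added to the empty instruction group A. Here "2:" denotes the line number in FIG. 5; the same notation is used below.
[0069]
In step 111, "3: mov G1, r1" and "5: ld (8,sp), r3" are registered as candidates. In step 112, "3: mov G1, r1" is taken as instruction B, and the parallel-execution check is performed in step 113. Since "2: mov G0, r0" and "3: mov G1, r1" have no data dependency, functional-unit conflict, or physical memory conflict, they are judged able to execute in parallel. In step 115, "3: mov G1, r1" is added to group A and removed from the candidates. In step 116 the target machine's maximum parallelism of three has not been reached, and in step 117 candidates remain, so processing returns to step 112. "5: ld (8,sp), r3" is likewise judged able to execute in parallel and is added to group A.
[0070]
Since no candidates remain in step 117, step 118 marks all instructions in group A as processed and generates line 2 of the assembler code in FIG. 6. Because "5: ld (8,sp), r3" is a memory access instruction, it is replaced at this point. It was generated by an automatic-variable access, so the logical memory stack has been assigned to it. According to the hardware information of FIG. 8, logical memory stack uses X memory as its physical memory, so an instruction accessing X memory is generated. In addition, because the access method attribute cache is specified for logical memory stack, an attribute for accessing through the cache is added to the instruction. The result is "ldc X(8,sp), r3".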
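[0071]
The following error checks are also performed.
[0072]
Address range attribute
Because this is a stack access, the address cannot be determined, so no address range check is performed.
[0073]
Access size attribute
The instruction is "ld", which loads 4 bytes. The hardware information of FIG. 8 shows that logical memory stack can be accessed in 1-byte units; 4 bytes is a power-of-two multiple of 1 byte, so no error occurs.
[0074]
Read/write attribute
The instruction is a load and logical memory stack is "rw" (readable and writable), so again no error occurs.
[0075]
In step 119, unprocessed instructions remain, so processing returns to step 110.
[0076]
In step 110, "4: ld (r0), r2" is added to the empty group A. Step 111 looks for candidates, but there are none, so processing proceeds to step 118, which marks "4: ld (r0), r2" as processed and generates line 3 of the assembler code in FIG. 6; in this case no instruction can be executed in parallel with it. Because "4: ld (r0), r2" is a memory access instruction, it is replaced at this point. It was generated by an access to the static variable G0, and the application information of FIG. 4 shows that G0 is assigned to logical memory m0. According to the hardware information of FIG. 8, logical memory m0 uses Y memory as its physical memory, so an instruction accessing Y memory is generated. In addition, because the access method attribute stream is specified for logical memory m0, an attribute for stream access is added to the instruction. The result is "lds Y(r0), r2".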
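[0077]
The following error checks are also performed.
[0078]
Address range attribute
This is an access to the static variable G0, and the program flow shows that register r0, which holds the address, contains the address G0. In FIG. 8, the address range attribute of logical memory m0 (mapped to physical memory Y) is 0x000 to 0xFFF (0x being the prefix for hexadecimal numbers), and the address G0 lies within this range, so no error occurs.
[0079]
Access size attribute
The instruction is "ld", which loads 4 bytes. The hardware information of FIG. 8 shows that logical memory m0 is accessed in 4-byte units; the sizes match, so no error occurs.
[0080]
Read/write attribute
The instruction is a load and logical memory m0 is "ro" (read-only), so again no error occurs.
[0081]
In step 119, unprocessed instructions remain, so processing returns to step 110.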
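[0082]
Next, in step 110, "6: mul r2, r3, r4" is added to the empty group A. In step 111, "8: ld (4,r0), r2" and "9: ld (4,sp), r3" become candidates. In step 112, "8: ld (4,r0), r2" is taken as instruction B, and the parallel-execution check is performed in step 113. "6: mul r2, r3, r4" and "8: ld (4,r0), r2" have no data dependency, functional-unit conflict, or physical memory conflict, so they are judged able to execute in parallel. In step 115, "8: ld (4,r0), r2" is added to group A and removed from the candidates. In step 116 the maximum parallelism of three has not been reached, and in step 117 candidates remain, so processing returns to step 112.
[0083]
Next, in step 112, "9: ld (4,sp), r3" is taken as instruction B, and the parallel-execution check is performed in step 113. "6: mul r2, r3, r4" and "9: ld (4,sp), r3" have no data dependency, functional-unit conflict, or physical memory conflict. "8: ld (4,r0), r2" and "9: ld (4,sp), r3" have no data dependency or functional-unit conflict, but they may conflict over physical memory.
[0084]
"8: ld (4,r0), r2" is the instruction that reads element 1 of the integer array G0 in the source code of FIG. 4. The application information of FIG. 4 shows that G0 is assigned to logical memory m0, and the hardware information of FIG. 8 associates logical memory m0 with physical memory Y. On the other hand, "9: ld (4,sp), r3" accesses the stack frame, so it is automatically assigned to logical memory stack, which is associated with physical memory X. Since the two instructions use different physical memories, they do not conflict over physical memory and are judged able to execute in parallel.
[0085]
The conventional program processing method has no means of determining that these two instructions use memories that can be accessed simultaneously, so it judges that they conflict over a memory resource and does not allow them to execute in parallel.
[0086]
In step 115, "9: ld (4,sp), r3" is added to group A and removed from the candidates. In step 116 the target machine's maximum parallelism of three has been reached, so processing proceeds to step 118, which marks the instructions in group A as processed and generates line 4 of the code in FIG. 6. Because "8: ld (4,r0), r2" and "9: ld (4,sp), r3" are memory access instructions, they are replaced at this point.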
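[0087]
First, "8: ld (4,r0), r2" was generated by an access to the static variable G0, and the application information of FIG. 4 shows that G0 is assigned to logical memory m0. According to the hardware information of FIG. 8, logical memory m0 uses Y memory as its physical memory, so an instruction accessing Y memory is generated. In addition, because the access method attribute stream is specified for logical memory m0, an attribute for stream access is added to the instruction. The result is "lds Y(4,r0), r2".
[0088]
The following error checks are also performed.
[0089]
Address range attribute
This is an access to the static variable G0, and the program flow shows that the accessed address, the contents of register r0 plus 4, is G0+4. In FIG. 8, the address range attribute of logical memory m0 is 0x000 to 0xFFF (0x being the prefix for hexadecimal numbers), and the address G0+4 lies within this range, so no error occurs.
[0090]
Access size attribute
The instruction is "ld", which loads 4 bytes. The hardware information of FIG. 8 shows that logical memory m0 is accessed in 4-byte units; the sizes match, so no error occurs.
[0091]
Read/write attribute
The instruction is a load and logical memory m0 is "ro" (read-only), so again no error occurs.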
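[0092]
Next, "9: ld (4,sp), r3" was generated by an automatic-variable access, so the logical memory stack has been assigned to it. According to the hardware information of FIG. 8, logical memory stack uses X memory as its physical memory, so an instruction accessing X memory is generated. In addition, because the access method attribute cache is specified for logical memory stack, an attribute for accessing through the cache is added to the instruction. The result is "ldc X(4,sp), r3".
[0093]
The following error checks are also performed.
[0094]
Address range attribute
Because this is a stack access, the address cannot be determined, so no address range check is performed.
[0095]
Access size attribute
The instruction is "ld", which loads 4 bytes. The hardware information of FIG. 8 shows that logical memory stack can be accessed in 1-byte units; 4 bytes is a power-of-two multiple of 1 byte, so no error occurs.
[0096]
Read/write attribute
The instruction is a load and logical memory stack is "rw" (readable and writable), so again no error occurs.
[0097]
Lines 5 and 6 of FIG. 6 are then generated in the same way.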
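[0098]
FIG. 7 shows the assembler code produced when instruction scheduling is performed by the conventional program processing method. The code in FIG. 6 is three lines shorter than the code in FIG. 7, which means that it executes in less time.
[0099]
In other words, when the target machine has physical memories that can be accessed in parallel and these are provided as hardware information together with logical memory definitions, and the programmer supplies appropriate application information based on the application's characteristics and the logical memories, the instruction scheduling process of the program processing method can use this information to obtain assembler code that makes full use of the target machine's hardware resources and executes in less time.
[0100]
In addition, giving physical memories attributes such as address range, access size, and read/write makes error checking of the source code easier and improves software productivity. Furthermore, giving physical memories attributes such as the access method lets software specify memory access methods in finer detail, enabling programming that makes full use of the target machine's memory features.
[0101]
Although this embodiment assumes a target machine equipped with X, Y, and Z memories as physical memories, the same benefits can be obtained on a target machine with more memories.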
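[0102]
Conversely, the executable code in this embodiment uses the three memories X, Y, and Z and is known to use no others. Therefore, on a target machine with four or more memories, the memories that are not used can be put into a low-power state by stopping their power supply or clock supply, which also reduces the power consumption of the target machine system.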
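The observation in [0102] can be sketched as a post-pass over the final code: collect the physical memories named in memory operands and report the rest as candidates for power or clock gating. This is an illustration only; the operand syntax `NAME(...)`, the fourth memory `W`, the `st` line, and both function names are hypothetical, and the patent does not say how unused memories are detected.

```python
import re

# Hypothetical operand syntax of the rewritten code, e.g. "ldc X(8,sp), r3":
# a physical memory name starting with a capital letter, directly before "(".
MEM_OPERAND = re.compile(r"\b([A-Z]\w*)\(")


def used_physical_memories(asm_lines: list) -> set:
    """Collect the physical memory names referenced by the final code."""
    used = set()
    for line in asm_lines:
        used.update(MEM_OPERAND.findall(line))
    return used


def power_down_candidates(machine_memories: set, asm_lines: list) -> list:
    """Memories the program never touches; their power or clock may be stopped."""
    return sorted(machine_memories - used_physical_memories(asm_lines))


# Illustrative code touching X, Y and Z on a hypothetical four-memory machine.
asm = [
    "mov G0, r0 ; mov G1, r1 ; ldc X(8,sp), r3",
    "lds Y(r0), r2",
    "st r4, Z(r1)",
]
print(power_down_candidates({"X", "Y", "Z", "W"}, asm))  # ['W']
```

Because the memory is chosen by the instruction in this embodiment, a static scan of the code is enough here; a machine that selected memories by address would need the address range attributes instead.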
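[0103]
[Effects of the invention]
As described above, according to the present invention, when the target machine has physical memories that can be accessed in parallel and these are provided as hardware information together with logical memory definitions, and the programmer supplies appropriate application information based on the application's characteristics and the logical memories, the instruction scheduling process of the program processing method can use this information to obtain assembler code that makes full use of the target machine's hardware resources and executes in less time.
[0104]
In addition, giving physical memories attributes such as address range, access size, and read/write makes error checking of source code easier and improves software productivity; and giving physical memories attributes such as the access method lets software specify memory access methods in finer detail, enabling programming that makes full use of the target machine's memory features.
[0105]
Further, when executable code generated according to the present invention runs on the target machine, any memories that are not used can have their power or clock supply stopped, reducing the power consumption of the target machine system.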
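[Brief description of the drawings]
[FIG. 1] Flowchart showing the processing flow of the program processing method according to the embodiment of the present invention and the input/output relationships among files
[FIG. 2] Flowchart showing the processing flow of the instruction scheduling process 102 of the embodiment shown in FIG. 1
[FIG. 3] Flowchart showing the processing flow of the parallel-execution check 113 of the embodiment shown in FIG. 2
[FIG. 4] C source code containing application information, used to explain the embodiment
[FIG. 5] Assembler code before instruction scheduling in the embodiment
[FIG. 6] Assembler code after instruction scheduling in the embodiment
[FIG. 7] Assembler code after instruction scheduling by the conventional program processing method
[FIG. 8] Hardware information corresponding to the target machine of FIG. 13, used to explain the embodiment
[FIG. 9] Stack frame layout used to explain the embodiment
[FIG. 10] C source code without application information, used to explain the embodiment and the conventional program processing method
[FIG. 11] Format of the hardware information in the embodiment
[FIG. 12] Target machine model of the conventional program processing method
[FIG. 13] Target machine used to explain the embodiment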
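[Explanation of reference numerals]
100 Compiler upstream process
101 Assembler code generation process
102 Instruction scheduling process
103 Object code generation process
104 Linking process
200 Source code file
201 Application information file
202 Hardware information file
203 Object code file
204 Executable code file
[0001]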
[Technical field of the invention]
The present invention relates to a program processing method, a processing apparatus, and a recording medium involving a compiler that generates object code from a high-level language and a linker that links and edits multiple object code files to generate executable code, and in particular to optimization techniques for parallel processors.
[0002]
[Prior art]
In the conventional program processing method, the compiler analyzes source code written in a high-level language, optimizes it for the target machine, and generates object code, and the linker then links and edits multiple object code files to generate executable code.
[0003]
When writing source code, the user assumes the computer model shown in FIG. 12 rather than the specific configuration of the target machine. In this model, one memory bus is connected to one processor, and a storage device serving as main memory is connected to that bus. The main memory and memory bus in this model are conceptual; the actual hardware may take, for example, the configuration shown in FIG. 13, in which three memory buses are connected to the processor and a storage device is connected to each bus.
[0004]
The operation of the conventional program processing method is briefly described using the C source code shown in FIG. 10. For this source code, the conventional method generates the assembly code shown in FIG. 7. That is, even if the target machine, like the one in FIG. 13, can execute memory access instructions ("ld" and "st") in parallel, the conventional method assumes the target machine model of FIG. 12 and therefore cannot determine whether memory access instructions can be executed in parallel.
[0005]
[Problems to be solved by the invention]
The conventional program processing method described above has no step that takes a specific target machine into account, and differences among target machines of various configurations are absorbed by the target machine model. The problem is therefore that optimizations that take those differences into account cannot be performed.
[0006]
In addition, the conventional program processing method cannot perform optimizations that fully exploit the characteristics of applications, which differ from one application to another. A conventional program processing apparatus optimizes using only those application characteristics that can be expressed within the configuration of the target machine model, so it cannot perform optimizations that use application information to exploit the actual configuration of the target machine.
[0007]
The present invention was made in view of these problems, and its object is to provide a program processing method having a step through which the user writing the source code can make the application's characteristics and the target machine's configuration available to the optimization performed in program processing.
[0008]
[Means for Solving the Problems]
To solve this problem, the program processing method according to claim 1 is
a method in which a CPU converts a program consisting of a plurality of source code files written in a high-level language into executable code, comprising a first conversion step of converting the source code into first intermediate code, an optimization step of optimizing the first intermediate code into second intermediate code, and a second conversion step of converting the second intermediate code into the executable code. The optimization step includes: a hardware information extraction step of extracting configuration information about the parallel-accessible memories of the target machine that executes the executable code; an application information extraction step of extracting, from the source code, characteristic information on the memory usage of the application implemented by the program; and a memory resource conflict determination step that, when memory resource conflicts are judged as part of deciding whether instructions can be executed in parallel, determines that there is no memory resource conflict if the memory configuration information and the application's memory usage characteristic information show that the instructions access memories that can be accessed in parallel with each other, and otherwise determines that there is a memory resource conflict.
[0013]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention are described below with reference to FIGS. 1 to 13.
[0014]
FIG. 1 is a flowchart showing the processing flow of the program processing method according to an embodiment of the present invention and the input/output relationships among files.
[0015]
The compiler upstream process 100 reads the high-level language source code 200 stored as a file and performs syntax analysis, semantic analysis, and so on to generate internal-format code. Where necessary, it also optimizes the internal-format code so as to reduce the execution time and code size of the executable code that is finally generated. Any application information specific to the present invention (described later) contained in the source code 200 is skipped.
[0016]
The assembler code generation process 101 generates assembler code from the internal format code generated and optimized by the compiler upstream process 100.
[0017]
The compiler upstream process 100 and the assembler code generation process 101 are not the focus of the present invention and, apart from skipping the application information, are identical to the conventional program processing method, so their details are omitted.
[0018]
The instruction scheduling process 102 parallelizes the assembler code generated by the assembler code generation process 101 for the target machine by performing instruction scheduling (reordering of instructions) based on an analysis of inter-instruction dependencies and resource conflicts.
[0019]
The instruction scheduling process 102 also reads application information from the application information file 201. The application information file 201 may be either the source code itself or a file separate from the source code. In this embodiment, the application information is assumed to be written in the source code. FIG. 4 shows an example of source code containing application information.
[0020]
In FIG. 4, lines 1 and 2 contain application information. "m0:: int G0[2];" on line 1 indicates that the two-element integer array G0 is allocated to logical memory m0. Similarly, line 2 indicates that the array G1 is allocated to logical memory m1. Logical memory is described later.
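A minimal sketch of how such annotations might be extracted, assuming the `logical_memory:: declaration;` form of the FIG. 4 example and handling only a one-word type followed by a scalar or array name. The names `extract_application_info` and `strip_application_info` are hypothetical, and the `func` body in `FIG4_HEAD` is reconstructed from the description in [0053] rather than copied from FIG. 4. The second function mirrors the upstream compiler pass, which skips the annotations.

```python
import re

# Hypothetical grammar inferred from the FIG. 4 example "m0:: int G0[2];":
#   <logical memory>:: <type> <name>[<count>];   (the [<count>] part is optional)
DECL = re.compile(r"^\s*(\w+)\s*::\s*(\w+)\s+(\w+)\s*(?:\[\s*(\d+)\s*\])?\s*;")


def extract_application_info(source: str) -> dict:
    """Return a map from annotated static variable name to logical memory name."""
    mapping = {}
    for line in source.splitlines():
        m = DECL.match(line)
        if m:
            logical, _ctype, name, _count = m.groups()
            mapping[name] = logical
    return mapping


def strip_application_info(source: str) -> str:
    """Drop the 'logical::' prefix so a conventional C front end sees plain C."""
    out = []
    for line in source.splitlines():
        if DECL.match(line):
            line = line.split("::", 1)[1].lstrip()
        out.append(line)
    return "\n".join(out)


FIG4_HEAD = """m0:: int G0[2];
m1:: int G1[2];
void func(int a, int b)
{
    G1[0] = G0[0] * a;
    G1[1] = G0[1] * b;
}"""

print(extract_application_info(FIG4_HEAD))                # {'G0': 'm0', 'G1': 'm1'}
print(strip_application_info(FIG4_HEAD).splitlines()[0])  # int G0[2];
```

Keeping extraction and stripping separate matches the division of labor described in the text: the scheduler consumes the mapping, while the front end only needs to ignore the prefix.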
[0021]
In addition, the instruction scheduling process 102 reads hardware information from the hardware information file 202. The hardware information file 202 may be either the source code or a file separate from it. In this embodiment, the hardware information is assumed to be written in a file separate from the source code. FIG. 8 shows the contents of the hardware information file 202.
[0022]
The hardware information shown in FIG. 8 is prepared for each target machine in the format shown in FIG. 11. The hardware information of FIG. 8 in this embodiment corresponds to the target machine shown in FIG. 13. Each element of the hardware information is described below.
[0023]
A logical memory is a name denoting the attributes given after "::"; as shown in FIG. 4, it is attached to static variables (variables whose memory allocation is fixed) in the source code. The logical memory stack is attached automatically for stack operations such as automatic-variable accesses, and the logical memory main is attached automatically to all other memory accesses.
[0024]
A physical memory is a memory that the target machine can access; physical memories with different names can be accessed in parallel with one another. In this embodiment, as shown in FIG. 13, three memories are connected to the processor: X memory (main memory), Y memory, and Z memory. These memories can be accessed in parallel, and each is accessed by a different instruction. In this embodiment the memory to be accessed is selected by the instruction itself, but it may instead be selected by the address or by the position of the instruction within a group executed in parallel.
[0025]
X memory is used as main memory and has a cache. Peripheral device registers are mapped into X memory. Y memory is implemented as ROM and cannot be written. Z memory is implemented as RAM but has a small capacity. Each memory is assumed to have its own independent address space; alternatively, parts of a single address space may be assigned to the respective memories.
[0026]
The address range attribute indicates the range of addresses available to variables assigned to the logical memory, and never exceeds the address range available in the physical memory. It is used to check sizes when variables are allocated, so that errors in the source code are detected at compile time.
[0027]
The access unit attribute indicates a unit that can be accessed by the logical memory, and is never smaller than the accessible unit of the physical memory. In general, a memory limited to a certain unit of access can be accessed at a higher speed or can be realized at a lower cost than a memory that supports a plurality of access units. This can be specified when optimizing for a target machine that supports a memory having a limited access unit. Specifically, it is used for alignment at the time of variable assignment and for specifying the size of a memory access instruction.
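The address range and access unit attributes could drive variable placement roughly as follows. This is a sketch under stated assumptions: each logical memory gets a simple bump allocator (`LogicalRegion.allocate`, a hypothetical name) that aligns every variable to the access unit, pads its size to whole units, and rejects any allocation that would leave the address range. The 0x000 to 0xFFF range and 4-byte unit for m0 follow the worked example in [0078] and [0079]; a 4-byte `int` is also assumed.

```python
from dataclasses import dataclass, field
from typing import Dict


class AllocationError(Exception):
    """A variable does not fit in its logical memory's address range."""


@dataclass
class LogicalRegion:
    name: str
    lo: int      # first address allowed by the address range attribute
    hi: int      # last address allowed, inclusive
    unit: int    # access unit attribute, in bytes
    next_free: int = field(init=False)
    symbols: Dict[str, int] = field(default_factory=dict, init=False)

    def __post_init__(self) -> None:
        self.next_free = self.lo

    def allocate(self, name: str, size: int) -> int:
        """Place a variable at the next unit-aligned address and return it."""
        addr = -(-self.next_free // self.unit) * self.unit  # round up to a unit boundary
        padded = -(-size // self.unit) * self.unit          # whole access units
        if addr + padded - 1 > self.hi:
            raise AllocationError(
                f"{name}: {size} bytes do not fit in {self.name} "
                f"[{self.lo:#05x}, {self.hi:#05x}]"
            )
        self.symbols[name] = addr
        self.next_free = addr + padded
        return addr


m0 = LogicalRegion("m0", 0x000, 0xFFF, unit=4)
print(m0.allocate("G0", 2 * 4))  # 0: int G0[2], assuming 4-byte int
print(m0.allocate("flag", 1))    # 8: padded to one 4-byte unit
print(m0.allocate("tail", 3))    # 12
```

Padding every variable to whole access units is only one possible policy; the text states that the access unit is used for alignment and for sizing memory access instructions, not how padding is done.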
[0028]
The access method attribute indicates how variables assigned to the logical memory access the physical memory; the options are cache, uncache, and stream. Only access methods supported by the physical memory can be specified. cache specifies in the instruction that data are to be cached when a cache is available; uncache specifies in the instruction that data are not to be cached even when a cache is available; stream specifies in the instruction that prefetching or similar measures are to be used to make sequential access efficient.
[0029]
The read/write attribute specifies whether variables assigned to the logical memory are readable and writable (rw), read-only (ro), or write-only (wo). When the physical memory is ROM, specifying ro allows source code that writes to the ROM area to be detected as an error at compile time.
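The constraints stated for these attributes (a logical memory may not exceed its physical memory's address range, may not use a smaller access unit, and may only use supported access methods) could be checked when the hardware information is loaded. The sketch below uses hypothetical `PhysicalMemory` and `LogicalMemory` records; the m0 entry follows the worked example, while every physical-memory limit is invented for illustration. Rejecting a writable logical memory on a read-only physical memory is an extra check assumed here, in the spirit of [0029].

```python
from dataclasses import dataclass

PERMS = {"rw": {"r", "w"}, "ro": {"r"}, "wo": {"w"}}


@dataclass(frozen=True)
class PhysicalMemory:
    name: str
    lo: int
    hi: int
    min_unit: int        # smallest access unit supported, in bytes
    methods: frozenset   # supported access methods
    rw: str


@dataclass(frozen=True)
class LogicalMemory:
    name: str
    physical: str
    lo: int
    hi: int
    unit: int
    method: str
    rw: str


def validate(lm: LogicalMemory, phys: dict) -> list:
    """Return the list of attribute violations for one logical memory."""
    pm = phys[lm.physical]
    errors = []
    if lm.lo < pm.lo or lm.hi > pm.hi:
        errors.append(f"{lm.name}: address range exceeds {pm.name}")
    if lm.unit < pm.min_unit:
        errors.append(f"{lm.name}: access unit smaller than {pm.name} supports")
    if lm.method not in pm.methods:
        errors.append(f"{lm.name}: {pm.name} does not support '{lm.method}'")
    if not PERMS[lm.rw] <= PERMS[pm.rw]:
        errors.append(f"{lm.name}: '{lm.rw}' not allowed on {pm.name} ({pm.rw})")
    return errors


# Hypothetical physical memories for the FIG. 13 machine (all limits invented).
PHYS = {
    "X": PhysicalMemory("X", 0x0000, 0xFFFF, 1, frozenset({"cache", "uncache", "stream"}), "rw"),
    "Y": PhysicalMemory("Y", 0x000, 0xFFF, 4, frozenset({"uncache", "stream"}), "ro"),
    "Z": PhysicalMemory("Z", 0x000, 0x3FF, 1, frozenset({"uncache"}), "rw"),
}

m0 = LogicalMemory("m0", "Y", 0x000, 0xFFF, 4, "stream", "ro")
print(validate(m0, PHYS))        # []
bad = LogicalMemory("buf", "Y", 0x000, 0x1FFF, 2, "cache", "rw")
print(len(validate(bad, PHYS)))  # 4
```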
[0030]
The instruction scheduling process 102 is described in detail with reference to FIGS. 2 and 3. For simplicity, instruction scheduling operates on one basic block at a time, so the instruction sequence has only one path. FIG. 2 is a detailed flowchart of the instruction scheduling process 102; each step of FIG. 2 is described below.
[0031]
Step 110 takes the first of the remaining unprocessed instructions and adds it to the empty instruction group A as its first element. The subsequent steps search the remaining unprocessed instructions for instructions that can be executed in parallel with the instructions in group A and add them to group A.
[0032]
The target machine of this embodiment is assumed to be a superscalar processor system that can execute up to three instructions in parallel, of which up to two may be memory access instructions. Memory access instructions executed together must each access a different physical memory.
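These issue constraints can be written as a small predicate over one issue group. In this sketch (the name `bundle_is_legal` is hypothetical) each instruction is represented only by the physical memory it accesses, or `None` if it does not access memory; functional-unit limits other than the memory rule are ignored.

```python
from typing import Optional, Sequence

MAX_ISSUE = 3        # instructions issued per cycle
MAX_MEMORY_OPS = 2   # memory access instructions per cycle


def bundle_is_legal(physical_memories: Sequence[Optional[str]]) -> bool:
    """Check one issue group against the embodiment's constraints.

    Each entry is the physical memory an instruction accesses, or None for
    an instruction that does not access memory.
    """
    if len(physical_memories) > MAX_ISSUE:
        return False
    mem = [p for p in physical_memories if p is not None]
    if len(mem) > MAX_MEMORY_OPS:
        return False
    # Memory accesses issued together must target different physical memories.
    return len(set(mem)) == len(mem)


print(bundle_is_legal([None, "Y", "X"]))  # True: mul + G0 load + stack load
print(bundle_is_legal([None, "X", "X"]))  # False: both loads use X
```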
[0033]
In step 111, several instructions whose execution order can be changed so that they execute in parallel with the instructions of group A are selected from the unprocessed instructions as candidates. In step 112, the first candidate is selected as instruction B.
[0034]
Step 113 determines whether instruction B can be executed in parallel with each instruction of group A. FIG. 3 is a detailed flowchart of this parallel-execution check, which is described below with reference to FIG. 3.
[0035]
Step 130 determines whether there is a data dependency between instruction B and any instruction in group A. A data dependency is the relationship between an instruction that defines a result and an instruction that uses that result; instructions in this relationship cannot be executed in parallel. If a data dependency is found, the check ends with the result that parallel execution is not possible.
[0036]
Step 131 determines whether instruction B and any instruction in group A compete for the target machine's functional-unit resources. If such a conflict is found, the check ends with the result that parallel execution is not possible.
[0037]
Step 132 determines whether instruction B and any instruction in group A conflict over a physical memory of the target machine. When an instruction in group A or instruction B is a memory access instruction, the processing depends on what caused the instruction to be generated.
[0038]
If the memory access instruction was generated by an access to a static variable, the logical memory attached to that variable (m0, m1, io, etc.) is read from the application information contained in the source code shown in FIG. 4, and the physical memory corresponding to that logical memory is read from the hardware information shown in FIG. 8.
[0039]
If the memory access instruction was generated by an automatic-variable access or by a stack save or restore, the logical memory is taken to be stack, and the corresponding physical memory is read from the hardware information shown in FIG. 8. If it was generated in any other way, the logical memory is taken to be main, and the corresponding physical memory is likewise read from the hardware information shown in FIG. 8.
[0040]
If instruction B uses a physical memory already used by an instruction in group A, the check ends with the result that parallel execution is not possible.
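The lookup chain just described (cause of the access, then logical memory, then physical memory) can be sketched as follows. `MemAccess`, the cause labels, and the function names are hypothetical. The mappings stack to X and m0 to Y follow the worked example in [0070] and [0076]; main to X and m1 to Z are assumptions for illustration only, as is treating an unannotated static variable as main.

```python
from dataclasses import dataclass
from typing import Optional

# Application information (FIG. 4): static variable -> logical memory.
APP_INFO = {"G0": "m0", "G1": "m1"}

# Hardware information: logical memory -> physical memory.
# stack -> X and m0 -> Y follow the worked example; main -> X and m1 -> Z
# are assumptions made only for this illustration.
LOGICAL_TO_PHYSICAL = {"stack": "X", "main": "X", "m0": "Y", "m1": "Z"}


@dataclass(frozen=True)
class MemAccess:
    cause: str                      # "static", "auto", "stack_save" or "other"
    variable: Optional[str] = None  # the static variable, when cause == "static"


def logical_memory(acc: MemAccess) -> str:
    """Pick the logical memory from what generated the access."""
    if acc.cause == "static":
        # Assumption: an unannotated static variable falls back to main.
        return APP_INFO.get(acc.variable, "main")
    if acc.cause in ("auto", "stack_save"):
        return "stack"
    return "main"


def physical_memory(acc: MemAccess) -> str:
    return LOGICAL_TO_PHYSICAL[logical_memory(acc)]


def memory_conflict(group: list, b: Optional[MemAccess]) -> bool:
    """Step 132: B conflicts if its physical memory is already used in group A.

    Instructions that do not access memory are represented by None.
    """
    if b is None:
        return False
    used = {physical_memory(a) for a in group if a is not None}
    return physical_memory(b) in used


ld8 = MemAccess("static", "G0")  # 8: ld (4,r0), r2  -> m0 -> Y
ld9 = MemAccess("auto")          # 9: ld (4,sp), r3  -> stack -> X
print(memory_conflict([None, ld8], ld9))  # False: Y and X differ
```

Applied to instructions 8 and 9 of FIG. 5, the check reports no conflict because they resolve to Y and X, which is the reasoning given in [0084].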
[0041]
If step 130 finds no data dependency, step 131 finds no functional-unit conflict, and step 132 finds no physical memory conflict, the check ends with the result that parallel execution is possible.
[0042]
Returning to FIG. 2, if step 113 determines that parallel execution is not possible, processing proceeds to step 114; if it is possible, processing proceeds to step 115. In step 114, because instruction B cannot be executed in parallel with the instructions of group A, instruction B is removed from the candidates, and processing proceeds to step 117.
[0043]
In step 115, since it is determined that each instruction of the instruction group A and the instruction B can be executed in parallel, the instruction B is added to the instruction group A and removed from the candidates.
[0044]
Step 116 determines whether the number of instructions in group A has reached the maximum number of instructions the target machine can execute in parallel. If it has, the search for instructions that can execute in parallel with group A ends and processing proceeds to step 118; otherwise, processing proceeds to step 117.
[0045]
Step 117 determines whether there are more candidates. If it exists, the process returns to step 112 to determine whether the next candidate can be executed in parallel. If no candidate exists, the process proceeds to step 118.
[0046]
In step 118, the instructions in group A are marked as processed, and the scheduled assembler code is generated. At this point, each memory access instruction is replaced with an instruction that accesses the physical memory it uses, and the access method attribute (cache, uncache, or stream) is specified in the instruction.
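The replacement performed in step 118 can be pictured as a textual rewrite once the physical memory and access method are known. This sketch assumes the operand syntax of the worked example in [0070] and [0076], where `ld (8,sp), r3` becomes `ldc X(8,sp), r3`; the function name and the `u` suffix for uncache are hypothetical.

```python
import re

# Access-method suffixes. "c" (cache) and "s" (stream) match the worked
# example ("ldc", "lds"); "u" for uncache is a hypothetical placeholder.
METHOD_SUFFIX = {"cache": "c", "stream": "s", "uncache": "u"}

MEM_INSN = re.compile(r"^(ld|st)\s+(.*)$")


def rewrite_memory_insn(insn: str, physical: str, method: str) -> str:
    """Rewrite a generic memory access into a physical-memory-specific one.

    Non-memory instructions are returned unchanged.
    """
    m = MEM_INSN.match(insn.strip())
    if m is None:
        return insn
    opcode, operands = m.groups()
    # Prefix the parenthesised address operand with the physical memory name.
    operands = operands.replace("(", f"{physical}(", 1)
    return f"{opcode}{METHOD_SUFFIX[method]} {operands}"


print(rewrite_memory_insn("ld (8,sp), r3", "X", "cache"))  # ldc X(8,sp), r3
print(rewrite_memory_insn("ld (r0), r2", "Y", "stream"))   # lds Y(r0), r2
```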
[0047]
If the address accessed by a memory access instruction can be determined, the process confirms that it lies within the address range attribute, and performs error processing if it does not. Error processing is also performed when the access unit of the memory access instruction is smaller than the access unit attribute of the physical memory described above, or is not a power-of-two multiple of it, because the physical memory does not support that access size. The read/write attribute is checked against the direction of the access: error processing is performed when a store instruction uses a physical memory whose read/write attribute is “ro”, or when a load instruction uses a physical memory whose attribute is “wo”.
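The three checks can be sketched as one validation function. `MemAttrs` and `MemoryAccessError` are hypothetical names. The access size rule reads "not a power of 2" as "not a power-of-two multiple of the access unit", which is how the walkthrough applies it (4 bytes against a 1-byte unit); testing only the start address against the range is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MemAttrs:
    """Hypothetical holder for the attributes of one memory."""
    addr_range: Optional[Tuple[int, int]]  # inclusive (low, high), or None
    access_unit: int                       # access unit in bytes
    rw: str                                # "rw", "ro" or "wo"


class MemoryAccessError(Exception):
    pass


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def check_access(attrs: MemAttrs, size: int, is_store: bool,
                 address: Optional[int] = None) -> None:
    """Raise MemoryAccessError if a memory access instruction violates attrs."""
    # Address range: only when the address is known (not, e.g., a stack access).
    if address is not None and attrs.addr_range is not None:
        low, high = attrs.addr_range
        if not low <= address <= high:
            raise MemoryAccessError(f"address {address:#x} outside {low:#x}-{high:#x}")
    # Access unit: at least the memory's unit and a power-of-two multiple of it.
    unit = attrs.access_unit
    if size < unit or size % unit != 0 or not is_power_of_two(size // unit):
        raise MemoryAccessError(f"access size {size} unsupported (unit {unit})")
    # Read/write direction.
    if is_store and attrs.rw == "ro":
        raise MemoryAccessError("store to a read-only memory")
    if not is_store and attrs.rw == "wo":
        raise MemoryAccessError("load from a write-only memory")
```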
[0048]
In step 119, if there is an unprocessed instruction, the process returns to step 110 to search for a new instruction group A that can be executed in parallel. If there is no unprocessed instruction, the instruction scheduling process is terminated.
[0049]
Returning to FIG. 1, the object code generation process 103 converts the assembler code generated by the instruction scheduling process 102 into object code and outputs it as an object code file 203. The concatenation editing process 104 reads a plurality of object code files 203 and links them to generate an executable code file 204. The object code generation process 103 and the concatenation editing process 104 are not central to the present invention and are the same as in the conventional program processing method, so their details are omitted.
[0050]
(Description of specific operation)
Next, the operation of the characteristic components of this program processing method is described using a specific program. FIG. 4 shows source code written for the present invention, in which application information has been added to the conventional C language specification.
[0051]
In FIG. 4, “m0 ::” and “m1 ::” on the first and second lines are application information specifying that the integer arrays G0 and G1 are allocated to the logical memories m0 and m1, respectively. The programmer writes application information based on the characteristics of the application. Here, G0 and G1 are assigned to different logical memories because the programmer knows that the application can access G0 and G1 in parallel and intends the program to be processed accordingly.
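As a sketch of how the application information might be extracted, the function below maps each qualified variable to its logical memory. It assumes a declaration form such as `m0 :: int G0[2];`; the exact syntax of FIG. 4, including the array sizes, is not reproduced in this text, so both that form and the function name are hypothetical.

```python
import re
from typing import Dict


def parse_application_info(source: str) -> Dict[str, str]:
    """Map each variable declared as '<logical memory> :: <declaration>' to its memory."""
    placement: Dict[str, str] = {}
    for line in source.splitlines():
        if "::" not in line:
            continue
        logical, decl = (part.strip() for part in line.split("::", 1))
        # The variable name is the last identifier before any '[', '=' or ';'.
        head = re.split(r"[\[=;]", decl, maxsplit=1)[0]
        identifiers = re.findall(r"[A-Za-z_]\w*", head)
        if identifiers:
            placement[identifiers[-1]] = logical
    return placement
```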
[0052]
Although FIG. 4 allocates integer arrays to logical memories, the same applies to types other than integers, and to ordinary (non-array) variables and pointers. For a pointer, however, assigning the address of a variable allocated to one logical memory to a pointer allocated to a different logical memory would make the pointer refer to the wrong logical memory, so such an assignment should be treated as an error.
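The pointer rule above can be checked against a placement map of the kind derived from the application information. This is a minimal sketch: `check_pointer_assignment` and `LogicalMemoryMismatch` are hypothetical names, and leaving unqualified variables unchecked is an assumption the text does not settle.

```python
from typing import Dict


class LogicalMemoryMismatch(Exception):
    """Raised for 'pointer = &variable' across different logical memories."""


def check_pointer_assignment(placement: Dict[str, str], pointer: str, target: str) -> None:
    pointer_mem = placement.get(pointer)
    target_mem = placement.get(target)
    # Only the case described in the text is flagged: both names are placed,
    # but in different logical memories. Unplaced names are not checked here.
    if pointer_mem is not None and target_mem is not None and pointer_mem != target_mem:
        raise LogicalMemoryMismatch(
            f"&{target} is in {target_mem}, but {pointer} refers to {pointer_mem}")
```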
[0053]
The function func in FIG. 4 takes integer arguments a and b. It multiplies G0[0] (element 0 of the global integer array G0) by a and stores the result in G1[0], then multiplies G0[1] by b and stores the result in G1[1]. The source code of FIG. 4 is stored in the source code file 200.
[0054]
FIG. 5 shows the assembler code obtained after the compiler upstream processing 100 and the assembler code generation processing 101 have been applied to the source code of FIG. 4 stored in the source code file 200. It is briefly described below.
[0055]
The first line is a label indicating the beginning of the function func.
[0056]
The second line stores the address of array G0 in register r0.
[0057]
The third line stores the address of the array G1 in the register r1.
[0058]
The fourth line loads the data at the address held in register r0 into register r2 (the value of G0[0]).
[0059]
The fifth line loads the data at address sp + 8 (the stack pointer plus offset 8) into register r3. FIG. 9 shows the structure of the stack frame. During the processing of func, sp points to the position shown in FIG. 9, and the arguments a and b are stored at offsets 8 and 4, respectively. The fifth line therefore loads the value of argument a into register r3.
[0060]
The sixth line multiplies the values of registers r2 and r3 and stores the product in register r4.
[0061]
The seventh line stores the data in register r4 to memory at the address held in register r1 (G1[0]).
[0062]
The eighth line loads the data at address r0 + 4 into register r2 (the value of G0[1]).
[0063]
The ninth line loads the data at address sp + 4 into register r3 (the value of argument b).
[0064]
The tenth line multiplies the values of registers r2 and r3 and stores the product in register r4.
[0065]
The eleventh line stores the data in register r4 to memory at address r1 + 4 (G1[1]).
[0066]
The 12th line returns from the function func to the calling program.
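The data dependences that constrain the scheduling of this code can be computed from the line-by-line description above. The sketch records, for lines 2 to 11 of FIG. 5, only the registers each line reads and writes (the return on line 12 is left out), and finds each line's read-after-write predecessors. It is an illustration, not the patent's own dependence analysis, and the names `FIG5` and `raw_predecessors` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (line, registers read, registers written) for lines 2-11 of FIG. 5,
# transcribed from the description above; the return on line 12 is omitted.
FIG5: List[Tuple[int, Set[str], Set[str]]] = [
    (2, set(), {"r0"}),            # address of G0 -> r0
    (3, set(), {"r1"}),            # address of G1 -> r1
    (4, {"r0"}, {"r2"}),           # G0[0] -> r2
    (5, {"sp"}, {"r3"}),           # argument a -> r3
    (6, {"r2", "r3"}, {"r4"}),     # r2 * r3 -> r4
    (7, {"r4", "r1"}, set()),      # r4 -> G1[0]
    (8, {"r0"}, {"r2"}),           # G0[1] -> r2
    (9, {"sp"}, {"r3"}),           # argument b -> r3
    (10, {"r2", "r3"}, {"r4"}),    # r2 * r3 -> r4
    (11, {"r4", "r1"}, set()),     # r4 -> G1[1]
]


def raw_predecessors(code) -> Dict[int, Set[int]]:
    """For each line, the earlier lines whose results it reads (read-after-write)."""
    last_writer: Dict[str, int] = {}
    preds: Dict[int, Set[int]] = {}
    for line, reads, writes in code:
        preds[line] = {last_writer[reg] for reg in reads if reg in last_writer}
        for reg in writes:
            last_writer[reg] = line
    return preds
```

Lines 3 and 5 have no read-after-write predecessor, while line 4 needs the result of line 2; this matches the walkthrough below, in which only lines 3 and 5 are candidates to join line 2.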
[0067]
The instruction scheduling process 102 optimizes (parallelizes) the assembler code of FIG. 5 into the assembler code shown in FIG. 6. For this optimization, it reads the application information contained in the source code of FIG. 4 and the hardware information of FIG. 8.
[0068]
Next, the process is described with reference to FIG. 2. In step 110, “2: mov G0, r0” is first added to the empty instruction group A. Here, “2:” denotes the line number in FIG. 5; the same notation is used below.
[0069]
In step 111, “3: mov G1, r1” and “5: ld (8, sp), r3” are registered as candidates. In step 112, “3: mov G1, r1” is first selected as instruction B, and in step 113 the parallel execution availability determination is performed. Since “2: mov G0, r0” and “3: mov G1, r1” have no data dependency, no arithmetic unit resource contention, and no physical memory resource contention, parallel execution is determined to be possible. In step 115, “3: mov G1, r1” is added to instruction group A and removed from the candidates. In step 116, the target machine's maximum parallel execution number of 3 has not yet been reached. In the same way, “5: ld (8, sp), r3” is determined to be executable in parallel and is added to instruction group A.
[0070]
Since no candidates remain in step 117, all instructions in instruction group A are processed in step 118, generating the assembler code on the second line of FIG. 6. Because “5: ld (8, sp), r3” is a memory access instruction, it is replaced at this point. The instruction originates from an automatic variable access, so its logical memory is stack. According to the hardware information of FIG. 8, the logical memory stack uses the X memory as its physical memory, so an instruction that accesses the X memory is generated. In addition, because cache is specified as the access method attribute of the logical memory stack, a cache access attribute is given to the instruction. The result is “ld c, X (8, sp), r3”.
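The chain of lookups behind this replacement can be sketched with two tables: the placement from the application information and the FIG. 8 entries quoted in this walkthrough. Only the stack and m0 entries described in the text are included (physical memory, access method, access unit, read/write attribute); the m1 entry and the rest of FIG. 8 are not given here and are omitted. `HARDWARE_INFO`, `resolve`, and the origin labels are hypothetical names.

```python
from typing import Dict, Tuple

# FIG. 8 entries as described in the text; other entries are not reproduced.
HARDWARE_INFO: Dict[str, Dict[str, object]] = {
    "stack": {"phys": "X", "method": "cache", "unit": 1, "rw": "rw"},
    "m0": {"phys": "Y", "method": "stream", "unit": 4, "rw": "ro"},
}


def logical_memory_for(origin: str, variable: str, application_info: Dict[str, str]) -> str:
    """Automatic variables live in 'stack'; static ones follow the application info."""
    if origin == "automatic":
        return "stack"
    return application_info[variable]


def resolve(origin: str, variable: str, application_info: Dict[str, str]) -> Tuple[str, object, object]:
    """Return (logical memory, physical memory, access method) for one access."""
    logical = logical_memory_for(origin, variable, application_info)
    entry = HARDWARE_INFO[logical]  # KeyError if FIG. 8 has no entry here
    return logical, entry["phys"], entry["method"]
```

For the instruction above, `resolve("automatic", "a", {"G0": "m0"})` gives `("stack", "X", "cache")`, which is the information the replacement needs.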
[0071]
In addition, the following error checks are also performed.
[0072]
Address range attribute
Because this is a stack access, the address cannot be determined, so no address range error is detected.
[0073]
Access size attribute
Since the instruction is “ld”, it loads 4 bytes. The hardware information in FIG. 8 shows that the logical memory stack can be accessed in units of 1 byte; since 4 bytes is a power-of-two multiple (2^2) of the 1-byte unit, no error occurs.
[0074]
Read / write attribute
Since the instruction is a load and the logical memory stack is “rw” (readable and writable), no error occurs here either.
[0075]
In step 119, since there are still unprocessed instructions, the process returns to step 110.
[0076]
In step 110, “4: ld (r0), r2” is added to the empty instruction group A, and candidates are sought in step 111. Since there are none, the process proceeds to step 118, where “4: ld (r0), r2” is processed alone and the assembler code on the third line of FIG. 6 is generated. Because “4: ld (r0), r2” is a memory access instruction, it is replaced. The instruction originates from an access to the static variable G0, and the application information in FIG. 4 shows that G0 is assigned to the logical memory m0. According to the hardware information of FIG. 8, the logical memory m0 uses the Y memory as its physical memory, so an instruction that accesses the Y memory is generated. In addition, because stream is specified as the access method attribute of m0, a stream access attribute is given to the instruction. The result is “ld s, Y (r0), r2”.
[0077]
In addition, the following error checks are also performed.
[0078]
Address range attribute
This is an access to the static variable G0, and the program flow shows that register r0, which holds the address, contains the address of G0. In FIG. 8, the address range attribute of the Y memory is 0x000 to 0xFFF (the prefix 0x denotes a hexadecimal number), and since G0 is placed within this range, no error occurs.
[0079]
Access size attribute
Since the instruction is “ld”, it loads 4 bytes. The hardware information in FIG. 8 shows that the logical memory m0 is accessed in units of 4 bytes; since the sizes match, no error occurs.
[0080]
Read / write attribute
Since the instruction is a load and the logical memory m0 is “ro” (read only), no error occurs here either.
[0081]
In step 119, since there are still unprocessed instructions, the process returns to step 110.
[0082]
Next, in step 110, “6: mul r2, r3, r4” is added to the empty instruction group A. In step 111, “8: ld (4, r0), r2” and “9: ld (4, sp), r3” become candidates. In step 112, “8: ld (4, r0), r2” is selected as instruction B, and in step 113 the parallel execution availability determination is performed. Since “6: mul r2, r3, r4” and “8: ld (4, r0), r2” have no data dependency, no arithmetic unit resource contention, and no physical memory resource contention, they are determined to be executable in parallel. In step 115, “8: ld (4, r0), r2” is added to instruction group A and removed from the candidates. In step 116, the maximum parallel execution number of the target machine has not yet been reached, and in step 117 candidates remain, so the process returns to step 112.
[0083]
Next, in step 112, “9: ld (4, sp), r3” is selected as instruction B, and in step 113 the parallel execution availability determination is performed. Between “6: mul r2, r3, r4” and “9: ld (4, sp), r3” there is no data dependency, no arithmetic unit resource contention, and no physical memory resource contention. Between “8: ld (4, r0), r2” and “9: ld (4, sp), r3” there is no data dependency and no arithmetic unit resource contention, but physical memory contention is possible because both instructions access memory.
[0084]
“8: ld (4, r0), r2” reads element 1 of the integer array G0 in the source code of FIG. 4. The application information in FIG. 4 shows that G0 is assigned to the logical memory m0, and the hardware information in FIG. 8 associates m0 with the physical memory Y. On the other hand, “9: ld (4, sp), r3” accesses the stack frame, so it is automatically assigned to the logical memory stack, which is associated with the physical memory X. Because the two instructions use different physical memories, no physical memory contention occurs, and parallel execution is determined to be possible.
[0085]
The conventional program processing method has no means of determining that these two instructions use memories that can be accessed simultaneously, so it judges that memory resource contention occurs and that parallel execution is impossible.
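The contention decision of step 132 and the conventional rule it replaces can be contrasted in a short sketch. Following claim 2, it compares physical memory names looked up from logical memory names. The table holds only the two associations stated above (stack to X, m0 to Y); treating any logical memory missing from it as contending is an assumption of this sketch, and `memory_contention` is a hypothetical name.

```python
from typing import Optional

# Logical -> physical associations stated in the text for FIG. 8.
LOGICAL_TO_PHYSICAL = {"stack": "X", "m0": "Y"}


def memory_contention(logical_a: str, logical_b: str, use_memory_info: bool = True) -> bool:
    """True if two memory access instructions must not share a bundle."""
    if not use_memory_info:
        return True  # conventional method: every pair of memory accesses contends
    phys_a: Optional[str] = LOGICAL_TO_PHYSICAL.get(logical_a)
    phys_b: Optional[str] = LOGICAL_TO_PHYSICAL.get(logical_b)
    if phys_a is None or phys_b is None:
        return True  # unknown physical memory: assume contention
    return phys_a == phys_b  # same physical memory name means contention
```

For lines 8 and 9, `memory_contention("m0", "stack")` is `False`, whereas the conventional setting `use_memory_info=False` reports contention.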
[0086]
In step 115, “9: ld (4, sp), r3” is added to instruction group A and removed from the candidates. In step 116, since the target machine's maximum parallel execution number of 3 has been reached, the process proceeds to step 118, where the instructions in instruction group A are processed and the code on the fourth line of FIG. 6 is generated. Because “8: ld (4, r0), r2” and “9: ld (4, sp), r3” are memory access instructions, they are replaced at this point.
[0087]
First, “8: ld (4, r0), r2” originates from an access to the static variable G0, and the application information in FIG. 4 shows that G0 is allocated to the logical memory m0. According to the hardware information of FIG. 8, the logical memory m0 uses the Y memory as its physical memory, so an instruction that accesses the Y memory is generated. In addition, because stream is specified as the access method attribute of m0, a stream access attribute is given to the instruction. The result is “ld s, Y (4, r0), r2”.
[0088]
In addition, the following error checks are also performed.
[0089]
Address range attribute
This is an access to the static variable G0; the program flow shows that register r0 holds the address of G0, so the accessed address is G0 + 4. In FIG. 8, the address range attribute of the Y memory is 0x000 to 0xFFF (the prefix 0x denotes a hexadecimal number), and since G0 + 4 lies within this range, no error occurs.
[0090]
Access size attribute
Since the instruction is “ld”, it loads 4 bytes. The hardware information in FIG. 8 shows that the logical memory m0 is accessed in units of 4 bytes; since the sizes match, no error occurs.
[0091]
Read / write attribute
Since the instruction is a load and the logical memory m0 is “ro” (read only), no error occurs here either.
[0092]
Next, “9: ld (4, sp), r3” originates from an automatic variable access, so its logical memory is stack. According to the hardware information of FIG. 8, the logical memory stack uses the X memory as its physical memory, so an instruction that accesses the X memory is generated. In addition, because cache is specified as the access method attribute of the logical memory stack, a cache access attribute is given to the instruction. The result is “ld c, X (4, sp), r3”.
[0093]
In addition, the following error checks are also performed.
[0094]
Address range attribute
Because this is a stack access, the address cannot be determined, so no address range error is detected.
[0095]
Access size attribute
Since the instruction is “ld”, it loads 4 bytes. The hardware information in FIG. 8 shows that the logical memory stack can be accessed in units of 1 byte; since 4 bytes is a power-of-two multiple (2^2) of the 1-byte unit, no error occurs.
[0096]
Read / write attribute
Since the instruction is a load and the logical memory stack is “rw” (readable and writable), no error occurs here either.
[0097]
Thereafter, the fifth and sixth lines in FIG. 6 are similarly generated.
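The whole loop of FIG. 2 can be sketched end to end on the FIG. 5 code. This Python model is an illustration under assumptions, not the patent's implementation: `Op`, `schedule`, and the rule used for step 111 (a later instruction qualifies if it has no ordering constraint on any pending instruction outside group A and no read-after-write or write-after-write overlap with group A) are hypothetical. Step 131 is omitted by assuming enough functional units, memory ordering is tracked per array or stack object, and the physical memory of G1 is left unknown because the text does not state it. With memory information, the first three bundles reproduce the second to fourth lines of FIG. 6 as described above; the later bundles, and the `memory_info=False` mode that approximates the conventional rule, come from this sketch alone and are not a reconstruction of FIG. 6 or FIG. 7.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

MAX_WIDTH = 3  # maximum parallel execution number of the target machine


@dataclass(frozen=True)
class Op:
    line: int                              # line number in FIG. 5
    reads: FrozenSet[str] = frozenset()
    writes: FrozenSet[str] = frozenset()
    obj: Optional[str] = None              # G0, G1 or stack; None if no memory access
    store: bool = False
    phys: Optional[str] = None             # physical memory; None if unknown
    barrier: bool = False                  # the return must stay last


def depends(early: Op, late: Op) -> bool:
    """Any ordering constraint between two ops placed in different bundles."""
    if early.barrier or late.barrier:
        return True
    regs = (early.writes & late.reads) | (early.reads & late.writes) | (early.writes & late.writes)
    mem = early.obj is not None and early.obj == late.obj and (early.store or late.store)
    return bool(regs) or mem


def same_bundle_ok(a: Op, b: Op, memory_info: bool) -> bool:
    """Steps 130 and 132 for a group member a and a later candidate b."""
    if a.writes & b.reads or a.writes & b.writes:   # step 130 (WAR is allowed)
        return False
    if a.obj is not None and b.obj is not None:     # step 132
        if not memory_info or a.phys is None or b.phys is None or a.phys == b.phys:
            return False
    return True


def schedule(ops: List[Op], memory_info: bool = True) -> List[List[int]]:
    pending = list(ops)                       # unprocessed ops in program order
    bundles: List[List[int]] = []
    while pending:                            # step 119
        group = [pending.pop(0)]              # step 110
        candidates = [                        # step 111
            op for i, op in enumerate(pending)
            if not any(depends(e, op) for e in pending[:i])
            and not any(g.barrier or op.barrier or g.writes & (op.reads | op.writes)
                        for g in group)
        ]
        for op in candidates:                 # steps 112 to 117
            if len(group) == MAX_WIDTH:       # step 116
                break
            if all(same_bundle_ok(g, op, memory_info) for g in group):  # step 113
                group.append(op)              # step 115
                pending.remove(op)
            # step 114: otherwise op is simply dropped from the candidates
        bundles.append(sorted(g.line for g in group))  # step 118
    return bundles


def r(*names: str) -> FrozenSet[str]:
    return frozenset(names)


FIG5 = [
    Op(2, writes=r("r0")),                           # mov G0, r0
    Op(3, writes=r("r1")),                           # mov G1, r1
    Op(4, r("r0"), r("r2"), obj="G0", phys="Y"),     # ld (r0), r2
    Op(5, r("sp"), r("r3"), obj="stack", phys="X"),  # ld (8, sp), r3
    Op(6, r("r2", "r3"), r("r4")),                   # mul r2, r3, r4
    Op(7, r("r4", "r1"), obj="G1", store=True),      # store r4 to G1[0]
    Op(8, r("r0"), r("r2"), obj="G0", phys="Y"),     # ld (4, r0), r2
    Op(9, r("sp"), r("r3"), obj="stack", phys="X"),  # ld (4, sp), r3
    Op(10, r("r2", "r3"), r("r4")),                  # mul r2, r3, r4
    Op(11, r("r4", "r1"), obj="G1", store=True),     # store r4 to G1[1]
    Op(12, barrier=True),                            # return
]
```

In this sketch, `schedule(FIG5)` begins with the bundles `[2, 3, 5]`, `[4]`, and `[6, 8, 9]`, and the conservative mode splits lines 8 and 9 apart, so it needs more bundles overall.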
[0098]
FIG. 7 shows the assembler code obtained when instruction scheduling is performed by the conventional program processing method. The code of FIG. 6 is three lines shorter than the code of FIG. 7, which means that its execution time is shorter.
[0099]
In other words, when the target machine has physical memories that can be accessed in parallel and this is provided as hardware information together with the logical memory definitions, and the programmer supplies appropriate application information based on the characteristics of the application and the logical memories, using both in the instruction scheduling process of the program processing method yields assembler code that makes maximum use of the target machine's hardware resources and therefore has a shorter execution time.
[0100]
In addition, giving the physical memories attributes such as address range, access size, and read/write makes it easy to detect errors in the source code and improves software productivity. Furthermore, giving the physical memories an access method attribute allows software to specify memory access methods in finer detail, so that the target machine can be programmed to make full use of its memory functions.
[0101]
In the present embodiment, a target machine equipped with X, Y, and Z memories is assumed as the physical memory configuration; however, the same effect can be obtained on a target machine equipped with more memories.
[0102]
Conversely, the executable code in the present embodiment uses only three memories, the X, Y, and Z memories, and it can be determined that no other memory is used. Therefore, on a target machine equipped with four or more memories, any memory that the code does not use can be put into a low power consumption state by stopping its power supply or clock supply, reducing the power consumption of the target machine system.
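The unused-memory detection can be sketched by collecting the physical memory names that appear in replaced memory access instructions and comparing them with the memories the target machine has. The instruction format assumed here is the one quoted in the walkthrough (`ld c, X (8, sp), r3`), one instruction per string; the `u` prefix and the function names are hypothetical, and actually stopping the power or clock supply is outside this sketch.

```python
from typing import Iterable, List, Set

ACCESS_PREFIXES = {"c", "s", "u"}  # "u" (uncache) is an assumed placeholder


def memories_used(code_lines: Iterable[str]) -> Set[str]:
    """Collect physical memory names from lines like 'ld c, X (8, sp), r3'."""
    used: Set[str] = set()
    for line in code_lines:
        parts = line.split()
        # The memory name follows an access method prefix such as "c,".
        if len(parts) >= 3 and parts[1].endswith(",") and parts[1].rstrip(",") in ACCESS_PREFIXES:
            used.add(parts[2])
    return used


def unused_memories(installed: Iterable[str], used: Set[str]) -> List[str]:
    """Memories that could be put into a low power consumption state."""
    return sorted(set(installed) - used)
```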
[0103]
【Effects of the Invention】
As described above, according to the present invention, when the target machine has physical memories that can be accessed in parallel and this is provided as hardware information together with the logical memory definitions, and the programmer supplies application information, using these in the instruction scheduling process of the program processing method yields assembler code that makes maximum use of the target machine's hardware resources and therefore has a shorter execution time.
[0104]
In addition, giving the physical memories attributes such as address range, access size, and read/write facilitates error checking of the source code and improves software productivity, and giving them an access method attribute allows software to specify memory access methods in finer detail, so that the target machine can be programmed to make full use of its memory functions.
[0105]
In addition, when the executable code generated by the present invention runs on the target machine, the power supply and clock supply to any unused memory can be stopped, reducing power consumption in the target machine system.
[Brief description of the drawings]
FIG. 1 is a flowchart showing the processing flow and file input/output relationships of a program processing method according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the processing flow of the instruction scheduling process 102 in the embodiment shown in FIG. 1.
FIG. 3 is a flowchart showing the processing flow of the parallel execution availability determination process 113 in the embodiment shown in FIG. 2.
FIG. 4 shows C language source code including application information, used to explain the embodiment of the present invention.
FIG. 5 shows the assembler code before instruction scheduling in the embodiment of the present invention.
FIG. 6 shows the assembler code after instruction scheduling in the embodiment of the present invention.
FIG. 7 shows the assembler code after instruction scheduling by the conventional program processing method.
FIG. 8 shows hardware information, used for explanation, corresponding to the target machine of FIG. 13 in the embodiment of the present invention.
FIG. 9 is a configuration diagram of a stack frame used to explain the embodiment of the present invention.
FIG. 10 shows C language source code without application information, used to explain both the embodiment of the present invention and the conventional program processing method.
FIG. 11 shows the format of the hardware information according to the embodiment of the present invention.
FIG. 12 shows the target machine model assumed by the conventional program processing method.
FIG. 13 shows a target machine used to explain the embodiment of the present invention.
[Explanation of symbols]
100 Compiler upstream processing
101 Assembler code generation processing
102 Instruction scheduling processing
103 Object code generation processing
104 Concatenated editing processing
200 Source code file
201 Application information file
202 Hardware information file
203 Object code file
204 Executable code file

Claims

A program processing method in which a CPU converts a program composed of a plurality of source codes written in a high-level language into executable code, comprising:
a first conversion step of converting the source code into a first intermediate code;
an optimization step of optimizing the first intermediate code into a second intermediate code; and a second conversion step of converting the second intermediate code into the executable code;
wherein the optimization step comprises a hardware information extraction step of extracting configuration information of the parallel-accessible memories of a target machine that executes the executable code;
an application information extraction step of extracting, from the source code, characteristic information on the memory usage of the application realized by the program; and
a memory resource contention determination step of, when memory resource contention is determined as part of determining whether instructions can be executed in parallel, determining that there is no memory resource contention if reference to the memory configuration information and to the application's memory usage characteristic information shows that the instructions access memories that can be accessed in parallel with each other, and otherwise determining that there is memory resource contention.

The program processing method according to claim 1, wherein the memory configuration information includes a memory name identifying each parallel-accessible memory, and the memory resource contention determination step determines whether the memory names match.

The program processing method according to claim 1, wherein the memory configuration information includes an address range attribute indicating the addresses that can be specified for each memory of the target machine, and the optimization step comprises: an address range error detection step of detecting an error in which the address of a memory access instruction lies outside the address range indicated by the address range attribute of the memory that the instruction accesses; and an address range error processing step of performing error processing when the address range error detection step finds the address of the memory access instruction outside the address range.

The program processing method according to claim 1, wherein the memory configuration information includes an access unit attribute indicating the unit in which each memory of the target machine can be accessed, and the optimization step comprises: an access unit error detection step of detecting an error in which the access unit of a memory access instruction is smaller than the access unit indicated by the access unit attribute of the memory that the instruction accesses, or is not a power-of-two multiple of it; and an access unit error processing step of performing error processing when an error is detected in the access unit error detection step.

The program processing method according to claim 1, wherein the memory configuration information includes a read/write attribute of each memory of the target machine, and the optimization step comprises: a read/write attribute error detection step of detecting an error in which a memory access instruction for a memory indicated as unreadable by the read/write attribute is a read instruction, or a memory access instruction for a memory indicated as unwritable by the read/write attribute is a write instruction; and a read/write attribute error processing step of performing error processing when an error is detected in the read/write attribute error detection step.

The program processing method according to claim 1, wherein the memory configuration information includes an access method attribute of each memory of the target machine, and the optimization step comprises an access method attribute instruction replacement step of replacing a memory access instruction that accesses a memory having the access method attribute with a memory access instruction that specifies the access method indicated by that attribute.

The program processing method according to any one of claims 1 to 6, wherein the memory configuration information includes a logical memory name identifying a set of information comprising the memory name and any of the address range attribute, the access unit attribute, the read/write attribute, and the access method attribute, and the method comprises: in the memory resource contention determination step, a step of looking up the memory name from the logical memory name and using it to determine whether memory names match; and, in any of the address range error detection step, the access unit error detection step, the read/write attribute error detection step, and the access method attribute instruction replacement step, a step of looking up the address range attribute, the access unit attribute, the read/write attribute, or the access method attribute from the logical memory name.

The program processing method according to claim 1, further comprising: a memory non-use detection step of detecting that the executable code obtained in the second conversion step does not use at least one of a plurality of memories of the target machine; and a step of placing the memory detected as unused into a low power consumption state on the target machine when the executable code is executed.

A recording medium on which is recorded a program for causing a computer to execute steps by which a CPU converts a program of source code written in a high-level language into executable code, the steps comprising:
a first conversion step of converting the source code into a first intermediate code;
an optimization step of optimizing the first intermediate code into a second intermediate code; and
a second conversion step of converting the second intermediate code into the executable code;
wherein the optimization step comprises a hardware information extraction step of extracting configuration information of the parallel-accessible memories of a target machine that executes the executable code;
an application information extraction step of extracting, from the source code, characteristic information on the memory usage of the application realized by the program; and
a memory resource contention determination step of, when memory resource contention is determined as part of determining whether instructions can be executed in parallel, determining that there is no memory resource contention if reference to the memory configuration information and to the application's memory usage characteristic information shows that the instructions access memories that can be accessed in parallel with each other, and otherwise determining that there is memory resource contention.

The recording medium according to claim 9, wherein the memory configuration information includes a logical memory name identifying a set of information comprising the memory name and any of the address range attribute, the access unit attribute, the read/write attribute, and the access method attribute, and the steps further comprise: in the memory resource contention determination step, a step of looking up the memory name from the logical memory name and using it to determine whether memory names match; and, in any of the address range error detection step, the access unit error detection step, the read/write attribute error detection step, and the access method attribute instruction replacement step, a step of looking up the address range attribute, the access unit attribute, the read/write attribute, or the access method attribute from the logical memory name.

The recording medium according to claim 9, wherein the steps further comprise: a memory non-use detection step of detecting that the executable code obtained in the second conversion step does not use at least one of a plurality of memories of the target machine; and a step of placing the memory detected as unused into a low power consumption state on the target machine when the executable code is executed.