JPH0877068A

JPH0877068A - Multiprocessor system and memory allocation optimizing method

Info

Publication number: JPH0877068A
Application number: JP6212665A
Authority: JP
Inventors: Hideaki Hirayama; 秀昭平山
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1994-09-06
Filing date: 1994-09-06
Publication date: 1996-03-22

Abstract

PURPOSE: To enable memory allocation that makes full use of the cache. CONSTITUTION: A multiprocessor system in which data in memory is handled in units of a predetermined cache line comprises: data recognition means 12, which recognizes read-only data when a program analysis unit 11 analyzes a source program in order to generate an executable object program; cache size recognition means 14, which recognizes the cache line size of the system; and data arranging means 15, which, when a code generation unit 13 generates code according to the result of the analysis, places read-only data on the same cache line as the read/write data accessed together with it, so that a plurality of data items that the program accesses together are loaded from memory into the cache together.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a multiprocessor system in which a plurality of processors each having a cache are connected, and to a method for optimizing memory allocation.

[0002]

[Prior Art] In recent years, memory access speed has improved far more slowly than processor computation speed. Because memory access does not become faster even as processor computation does, overall system performance fails to improve.

[0003] To solve this problem, a technique called caching exists. A cache is a small, fast memory placed inside or near the processor; it serves as a high-speed buffer for the data in memory that the processor accesses.

[0004] Meanwhile, multiprocessor technology has advanced, and systems now interconnect many processors, each with its own cache. In a multiprocessor system, many processors each hold a copy of memory contents in their caches, so the data in those caches must be kept consistent. The function that maintains this consistency is called snooping.

[0005] There are various snooping schemes. In any of them, when, for example, several processors with caches update the same data in memory in turn, consistency is maintained by passing the data in question from one processor's cache to the next in turn.

[0006] However, when data is shared among many processors, the operations needed to keep the data consistent by snooping between caches (data handovers) increase. These snoop operations can saturate the bus connecting the processors, so the performance of the multiprocessor system does not improve.
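The handover behavior described above can be illustrated with a toy model. The following is a minimal sketch, assuming a simplified write-invalidate scheme in which only one cache owns a line at a time; the function name `count_line_transfers` and the access patterns are hypothetical and are not taken from the patent.

```python
def count_line_transfers(writers):
    """Count cache-to-cache handovers of one line in a toy model:
    only one cache may own the line, so a write by a different
    processor moves the line to that processor's cache."""
    owner = None
    transfers = 0
    for cpu in writers:
        if owner is not None and owner != cpu:
            transfers += 1  # line handed over from the previous owner
        owner = cpu
    return transfers

# Four processors take turns updating the same shared data.
alternating = [0, 1, 2, 3] * 4
# The same 16 updates, but each processor does its work in one batch.
batched = [0] * 4 + [1] * 4 + [2] * 4 + [3] * 4

print(count_line_transfers(alternating))  # 15
print(count_line_transfers(batched))      # 3
```

In this model every change of writer costs one bus transfer, which is why data that many processors update in turn generates far more traffic than the same number of updates made by one processor at a time.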

[0007]

[Problem to Be Solved by the Invention] Thus, in a conventional computer system in which a plurality of processors each having a cache are connected, the bus traffic needed to keep the data in each processor's cache consistent increases, and this can degrade the performance of the multiprocessor system.

[0008] The present invention has been made in view of the above circumstances, and its object is to provide a multiprocessor system and a memory allocation optimization method that make full use of the cache.

[0009]

[Means for Solving the Problem] The present invention is a multiprocessor system in which a plurality of processors each having a cache are connected and in which data in memory is handled in units of a predetermined cache line, the system comprising: data recognition means for recognizing read-only data when analyzing a source program in order to create an executable object program; cache size recognition means for recognizing the cache line size of the system; and data arranging means for, when code is generated according to the result of the analysis, placing the read-only data on the same cache line as the read/write data that is accessed together with that read-only data, based on the recognition results of the data recognition means and the cache size recognition means, so that a plurality of data items that the program accesses together are loaded from memory into the cache together.

[0010] The invention further comprises copy data arranging means for, when the data arranging means places data and there are a plurality of cache lines on which the read-only data should be placed, copying the read-only data and placing it in each of those cache lines.

[0011] The present invention is also a multiprocessor system that performs exclusive control of a data structure shared among processors by attaching a lock variable to the data structure, setting the lock variable before accessing the data structure, and resetting the lock variable after the access is finished, the system comprising: lock variable search means for searching, when analyzing a source program in order to create an executable object program, for lock variables that exclusively control data structures used in the program; data structure search means for searching for the data structures protected by the lock variables found by the lock variable search means; and data structure allocation means for, when code is generated according to the analysis result, allocating at most one lock-protected data structure to any one cache line, based on the search results of the lock variable search means and the data structure search means.

[0012] The invention further comprises copy data arranging means for allocating, in the part of the cache line not occupied by the allocated data structure, a copy of the data that is read between setting and resetting the lock that protects the data structure.

[0013] The invention also comprises: buffer recognition means for recognizing input/output buffers when analyzing a source program in order to create an executable object program; and allocation means for, when code is generated according to the analysis result, allocating at most one input/output buffer to any one cache line, based on the recognition result of the buffer recognition means.

[0014]

[Operation] With this configuration, a multiprocessor system in which a plurality of processors each having a cache are connected can raise its performance by making full use of those caches.

[0015] That is, read-only data is placed on the same cache line as the read/write data that is accessed together with it. When processing is performed on one of these data items, the other is then already in the cache, so fewer transfers from memory are needed.

[0016] Furthermore, when there are a plurality of cache lines on which the read-only data should be placed, a copy of the read-only data is placed on each cache line that holds the corresponding read/write data, which reduces processing such as data handovers between caches.

[0017] In addition, when a data structure shared among processors is under exclusive control, allocating at most one lock-protected data structure to any one cache line avoids the situation in which lock control blocks access to a data structure that does not actually need to be locked.

[0018]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of a computer system according to an embodiment of the present invention. As shown in FIG. 1, the computer system of the first embodiment is a multiprocessor system in which a plurality of processors 1a, 1b, ... and a shared memory 2 shared by the processors 1a, 1b, ... are connected by a bus 3.

[0019] In the multiprocessor system to which the present invention applies, data transfer between a cache memory and a processor or the shared memory 2 is performed in predetermined units, that is, one cache line at a time.

[0020] The processor 1a is provided with a cache 4a and optimizing means 5a. The function of the optimizing means 5a is realized by executing, on the processor 1a, an optimizing program (compiler) stored in the shared memory 2. The other processors 1b, ... can likewise be provided with optimizing means 5b, ... and execute them.

[0021] The optimizing means 5a is made up of the functions shown in FIG. 2. As shown in FIG. 2, the optimizing means 5a comprises a program analysis unit 11, data recognition means 12, a code generation unit 13, cache line size recognition means 14, and data arranging means 15.

[0022] The program analysis unit 11 analyzes a source program written in a given programming language in order to create an object program executable by the computer. It performs lexical analysis, syntax analysis, semantic analysis, and so on for the source program.

[0023] Based on the results of the analysis by the program analysis unit 11, the data recognition means 12 recognizes which of the data used in the program is read-only.

[0024] The code generation unit 13 generates the code that forms the object program, according to the analysis results from the program analysis unit 11 and the recognition results from the data recognition means 12.

[0025] The cache line size recognition means 14 recognizes the cache line size of the system on which the program runs. The data arranging means 15 determines where in the shared memory 2 the data portion of the code generated by the code generation unit 13 is placed. It places the read-only data recognized by the data recognition means 12 on the same cache line as the read/write data that is accessed together with that read-only data. As a result, data that is accessed together is brought from the shared memory 2 into the cache together.

[0026] In the configuration shown in FIG. 2, the program analysis unit 11 and the code generation unit 13 make up a compiler, to which the functions of the data recognition means 12 and the data arranging means 15 are added. The cache line size recognition means 14 is assumed to be realized by, for example, a function of the operating system (OS) of the computer system.
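As an illustration of obtaining the cache line size through an OS facility, the following is a minimal sketch, not the mechanism of the patent. It assumes a Linux/glibc system that exposes the `SC_LEVEL1_DCACHE_LINESIZE` name through `os.sysconf`; the helper name `cache_line_size` and the 64-byte fallback are assumptions made for the example.

```python
import os

def cache_line_size(default=64):
    """Return the L1 data cache line size in bytes, or `default` when
    the OS does not report it. The sysconf name is a glibc/Linux
    extension; the fallback value is an assumption of this sketch."""
    name = "SC_LEVEL1_DCACHE_LINESIZE"
    if hasattr(os, "sysconf") and name in getattr(os, "sysconf_names", {}):
        try:
            size = os.sysconf(name)
        except (OSError, ValueError):
            return default
        if size > 0:
            return size
    return default

print(cache_line_size())
```

A compiler built along the lines described here would feed a value like this to its data placement step, so that the same source program is laid out differently on systems with different line sizes.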

[0027] Next, the operation of the first embodiment is described. When compiling a source program, the optimizing means 5a optimizes it so that the cache installed in the system is used effectively. First, the program analysis unit 11 analyzes the source program to be compiled, which is stored in a storage device (not shown). During this analysis, the data recognition means 12 identifies the data that is used only for reading. The data so recognized is taken into account when the data arranging means 15 arranges the data.

[0028] The code generation unit 13 generates code based on the analysis results from the program analysis unit 11. In doing so, it generates the data-related code taking into account the cache line size of the system as identified by the cache line size recognition means 14.

[0029] FIG. 3 is a flowchart showing the processing flow of the data arranging means 15 in the optimizing means 5a of the first embodiment of the present invention. The data arranging means 15 arranges the data-related code generated by the code generation unit 13 with the cache line in mind, so that the cache is used effectively.

[0030] First, the data arranging means 15 places the read/write data in cache lines (step A1). It then checks whether there is read-only data (data recognized by the data recognition means 12) that is accessed in the vicinity of the read/write data placed in step A1, that is, shortly before or after it in the course of program execution (step A2).

[0031] If such data exists, the data arranging means 15 checks whether that read-only data has already been placed (step A3). If it has not yet been placed, the data arranging means 15 places it in the same cache line as the read/write data placed in step A1 (the read/write data in whose vicinity the read-only data under consideration is accessed) (step A4).

[0032] If it is determined in step A3 that the read-only data has already been placed in some cache line, the data arranging means 15 does not place that read-only data again.

[0033] In this way, for every read/write data item, any read-only data accessed in its vicinity is placed in the same cache line as that read/write data.
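Steps A1 to A4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `place_data`, the input format, the choice of one read/write item per cache line, and the check that the read-only data fits in the remaining space are all assumptions made for the example.

```python
def place_data(rw_items, ro_sizes, near, line_size):
    """Sketch of steps A1-A4.

    rw_items: list of (name, size) read/write data
    ro_sizes: dict mapping read-only data name -> size
    near:     dict mapping read/write name -> read-only names
              accessed close to it
    Returns a list of cache lines, each a list of data names.
    """
    lines, free, placed = [], [], set()
    # Step A1: place each read/write item in a cache line
    # (one item per line is an assumption of this sketch).
    for name, size in rw_items:
        lines.append([name])
        free.append(line_size - size)
    for idx, (name, _) in enumerate(rw_items):
        # Step A2: is read-only data accessed near this item?
        for ro in near.get(name, []):
            # Step A3: skip read-only data that is already placed.
            if ro in placed:
                continue
            # Step A4: put it in the rest of the same line, if it fits
            # (the capacity check is an assumption of this sketch).
            if ro_sizes[ro] <= free[idx]:
                lines[idx].append(ro)
                free[idx] -= ro_sizes[ro]
                placed.add(ro)
    return lines

layout = place_data(
    rw_items=[("B", 16), ("C", 16)],
    ro_sizes={"A": 8},
    near={"B": ["A"], "C": ["A"]},
    line_size=32,
)
print(layout)  # [['B', 'A'], ['C']]
```

In the usage example, read-only data A is accessed near both B and C, but it is placed only once, next to B, because step A3 finds it already placed when C is processed.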

[0034] FIG. 4 is a diagram illustrating concretely how data is arranged in the cache lines of the shared memory 2. For convenience, the program in FIG. 4(a) is shown in source program form.

[0035] Function 1 in the program shown in FIG. 4(a) first "refers to data A" and then "updates data B". Here, data A is assumed to be used only for reading.

[0036] In this case, data B, which is read and written, is placed in a given cache line in step A1. In step A2, data A is identified as being accessed in the vicinity of the "update data B" operation, and in step A4 it is placed in the remaining part of the cache line holding data B, as shown in FIG. 4(b).

[0037] Because data A and data B are placed in the same cache line, when the program is executed (the processing shown in FIG. 4(a)) and data A is referenced, data A is transferred from the shared memory 2 to the cache, and data B, which lies in the same cache line, is transferred to the cache and stored along with it. Consequently, when the "update data B" operation is executed, data B is already in the cache and no further transfer from the shared memory 2 is needed. In other words, the cache is used effectively.
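The saving for function 1 can be checked with a small count of line fills. The following is a rough sketch assuming a single processor that starts with an empty cache and never evicts; the function name `count_fetches` and the line numbering are hypothetical.

```python
def count_fetches(accesses, line_of):
    """Count line fills from shared memory for one processor that
    starts with an empty cache (no evictions in this toy model)."""
    cached, fetches = set(), 0
    for datum in accesses:
        line = line_of[datum]
        if line not in cached:
            cached.add(line)  # the whole line is transferred at once
            fetches += 1
    return fetches

function1 = ["A", "B"]        # refer to data A, then update data B
separate = {"A": 0, "B": 1}   # A and B on different cache lines
together = {"A": 0, "B": 0}   # A placed in the rest of B's line

print(count_fetches(function1, separate))  # 2
print(count_fetches(function1, together))  # 1
```

With A and B on the same line, the reference to A already brings B into the cache, so the update of B needs no second transfer.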

[0038] Next, a second embodiment of the present invention is described. In the second embodiment, optimizing means 40a, 40b, ... according to the second embodiment are provided on the computer system shown in FIG. 1, as in the first embodiment.

[0039] The optimizing means 40a is made up of the functions shown in FIG. 5. As shown in FIG. 5, it comprises a program analysis unit 41, data recognition means 42, a code generation unit 43, cache line size recognition means 44, data arranging means 45, and copy data arranging means 46.

[0040] The program analysis unit 41 analyzes a source program written in a given programming language in order to create an object program executable by the computer. It performs lexical analysis, syntax analysis, semantic analysis, and so on for the source program.

[0041] Based on the results of the analysis by the program analysis unit 41, the data recognition means 42 recognizes which of the data used in the program is read-only.

[0042] The code generation unit 43 generates the code that forms the object program, according to the analysis results from the program analysis unit 41 and the recognition results from the data recognition means 42.

[0043] The cache line size recognition means 44 recognizes the cache line size of the system on which the program runs. The data arranging means 45 determines where in the shared memory 2 the data portion of the code generated by the code generation unit 43 is placed. It places the read-only data recognized by the data recognition means 42 on the same cache line as the read/write data that is accessed together with that read-only data. As a result, data that is accessed together is brought from the shared memory 2 into the cache together.

[0044] When the data arranging means 45 has already placed a given read-only data item in another cache line, the copy data arranging means 46 creates a copy of that read-only data and places it in the same cache line as the read/write data that is accessed together with it.

[0045] Next, the operation of the second embodiment is described. When compiling a source program, the optimizing means 40a optimizes it so that the cache installed in the system is used effectively. First, the program analysis unit 41 analyzes the source program to be compiled, which is stored in a storage device (not shown). During this analysis, the data recognition means 42 identifies the data that is used only for reading. The data so recognized is taken into account when the data arranging means 45 and the copy data arranging means 46 arrange the data.

[0046] The code generation unit 43 generates code based on the analysis results from the program analysis unit 41. In doing so, it generates the data-related code taking into account the cache line size of the system as identified by the cache line size recognition means 44.

[0047] FIG. 6 is a flowchart showing the processing flow of the data arranging means 45 and the copy data arranging means 46 in the optimizing means 40a of the second embodiment of the present invention. The data arranging means 45 arranges the data-related code generated by the code generation unit 43 with the cache line in mind, so that the cache is used effectively.

[0048] First, the data arranging means 45 places the read/write data in cache lines (step B1). It then checks whether there is read-only data (data recognized by the data recognition means 42) that is accessed in the vicinity of the read/write data placed in step B1, that is, shortly before or after it in the course of program execution (step B2).

[0049] If such data exists, the data arranging means 45 checks whether that read-only data has already been placed in another cache line (step B3).

[0050] If it has not yet been placed, the data arranging means 45 places the read-only data in the same cache line as the read/write data placed in step B1 (the read/write data in whose vicinity the read-only data under consideration is accessed) (step B4).

[0051] If it is determined in step B3 that the read-only data has already been placed in some cache line, the copy data arranging means 46 creates a copy of that read-only data (step B5).

[0052] The copy data arranging means 46 then places the copied data in the same cache line as the read/write data that is accessed together with it (step B6).
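Steps B1 to B6 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the patent's implementation: the function name `place_with_copies`, the `_copy` naming, the one-item-per-line choice in step B1, and the capacity check are all assumptions made for the example.

```python
def place_with_copies(rw_items, ro_sizes, near, line_size):
    """Sketch of steps B1-B6: like the first embodiment, but read-only
    data that is already placed elsewhere is copied instead of skipped.

    rw_items: list of (name, size) read/write data
    ro_sizes: dict mapping read-only data name -> size
    near:     dict mapping read/write name -> read-only names
              accessed close to it
    """
    lines, free, placed = [], [], set()
    # Step B1: place each read/write item in a cache line
    # (one item per line is an assumption of this sketch).
    for name, size in rw_items:
        lines.append([name])
        free.append(line_size - size)
    for idx, (name, _) in enumerate(rw_items):
        # Step B2: is read-only data accessed near this item?
        for ro in near.get(name, []):
            if ro_sizes[ro] > free[idx]:
                continue  # assumption: skip data that does not fit
            if ro not in placed:
                # Steps B3-B4: not yet placed, so place the original.
                lines[idx].append(ro)
                placed.add(ro)
            else:
                # Steps B5-B6: already placed, so place a copy here.
                lines[idx].append(ro + "_copy")
            free[idx] -= ro_sizes[ro]
    return lines

layout = place_with_copies(
    rw_items=[("B", 16), ("C", 16)],
    ro_sizes={"A": 8},
    near={"B": ["A"], "C": ["A"]},
    line_size=32,
)
print(layout)  # [['B', 'A'], ['C', 'A_copy']]
```

The trade-off in this design is space for traffic: each copy consumes part of a cache line, which is acceptable only because the copied data is never written and so the copies cannot diverge.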

[0053] FIG. 7 is a diagram illustrating concretely how data is arranged in the cache lines of the shared memory 2. For convenience, the program in FIG. 7(a) is shown in source program form.

[0054] In the program shown in FIG. 7(a), function 1 first "refers to data A" and then "updates data B", and function 2 first "refers to data A" and then "updates data C". Here, data A is assumed to be used only for reading.

[0055] In this case, data B and data C, which are read and written, are each placed in a given cache line in step B1. In step B2, data A of function 1 is identified as being accessed in the vicinity of the "update data B" operation, and in step B4 it is placed in the remaining part of the cache line holding data B, as shown in FIG. 7(b).

[0056] Further, for data A of function 2, it is determined in step B3 that data A has already been placed in another cache line. A copy of data A is therefore created and placed in the remaining part of the cache line holding the read/write data accessed together with data A, namely data C.

[0057] Because data A and data B are placed in the same cache line, when the program is executed (the processing of function 1 shown in FIG. 7(a)) and data A is referenced, data A is transferred from the shared memory 2 to the cache, and data B, which lies in the same cache line, is transferred to the cache and stored along with it. Likewise, when the copy of data A is referenced in the processing of function 2, data C, which lies in the same cache line as the copy, is also transferred to the cache and stored. Consequently, when the "update data C" operation is executed, data C is already in the cache and no further transfer from the shared memory 2 is needed. Moreover, because the read-only data is copied and placed in each of the different cache lines, even if the cache line holding data B is present in one processor's cache, processing such as transferring that data to another processor's cache becomes unnecessary. In other words, the cache is used effectively.
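The effect of the copy on inter-cache traffic can be illustrated with a toy model. The following is a rough sketch, assuming a simplified write-invalidate protocol in which a write leaves the line valid only in the writer's cache, with function 1 running on one processor and function 2 on another; the names `count_misses` and `make_trace` and the line numbering are hypothetical.

```python
def count_misses(trace, line_of):
    """trace: list of (cpu, op, datum) with op 'r' or 'w'.
    Returns the number of cache misses under a toy write-invalidate
    protocol: a write leaves the line valid only in the writer's cache."""
    holders = {}  # line -> set of cpus holding a valid copy
    misses = 0
    for cpu, op, datum in trace:
        line = line_of[datum]
        valid = holders.setdefault(line, set())
        if cpu not in valid:
            misses += 1       # line must be fetched over the bus
            valid.add(cpu)
        if op == "w":
            holders[line] = {cpu}  # invalidate all other copies
    return misses

def make_trace(a_for_f2, rounds=4):
    trace = []
    for _ in range(rounds):
        trace += [(0, "r", "A"), (0, "w", "B")]       # function 1 on CPU 0
        trace += [(1, "r", a_for_f2), (1, "w", "C")]  # function 2 on CPU 1
    return trace

# Without a copy: A shares a line with B only.
no_copy = {"A": 0, "B": 0, "C": 1}
# With a copy: A sits next to B, and a copy of A sits next to C.
with_copy = {"A": 0, "B": 0, "C": 1, "A_copy": 1}

print(count_misses(make_trace("A"), no_copy))         # 6
print(count_misses(make_trace("A_copy"), with_copy))  # 2
```

Without the copy, every update of B invalidates the line that CPU 1 needs in order to read A, so CPU 1 misses in every round; with the copy, each processor touches only its own line after the first fill.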

[0058] Next, a third embodiment of the present invention is described. In the third embodiment, optimizing means 70a, 70b, ... according to the third embodiment are provided on the computer system shown in FIG. 1, as in the first embodiment.

[0059] In general, a multiprocessor system uses lock operations as a method of exclusive control over shared data in the shared memory 2. In a lock operation, a lock variable is provided in the shared memory 2 for a given piece of shared data; the lock variable is acquired before the shared data is processed and cleared after the shared data has been accessed.

[0060] When a process running on a processor tries to acquire a lock variable that another process has already acquired, it waits until that lock variable is cleared and acquires it as soon as it is cleared. This prevents a plurality of processors from accessing the same data at the same time and leaving the data inconsistent. The computer system of the third embodiment is assumed to perform this kind of exclusive control.

[0061] The optimizing means 70a is made up of the functions shown in FIG. 8. As shown in FIG. 8, it comprises a program analysis unit 71, lock variable search means 72, data structure search means 73, a code generation unit 74, and data structure allocation means 75.

【００６２】プログラム解析部７１は、所定のプログラ
ム言語を用いて記述された原始プログラムに対して、コ
ンピュータで実行可能な目的プログラムを作成するため
の解析を行なうものである。プログラム解析部７１は、
原始プログラムに対する字句解析、構文解析、意味解析
等を実行する。The program analysis unit 71 analyzes a source program written in a predetermined programming language in order to create a target program that a computer can execute. The program analysis unit 71 performs lexical analysis, syntactic analysis, semantic analysis, and so on, on the source program.

【００６３】ロック変数検索手段７２は、プログラム解
析部７１によって解析された結果に基づいて、プログラ
ム中で使用されているロック変数を検索する。データ構
造検索手段７３は、プログラム解析部７１によって解析
されるプログラム中で、ロック変数検索手段７２によっ
て検索されたロック変数によって保護されるデータ構造
を検索するものである。The lock variable search means 72 searches for the lock variables used in the program, based on the result of the analysis by the program analysis unit 71. The data structure search means 73 searches the program analyzed by the program analysis unit 71 for the data structures protected by the lock variables found by the lock variable search means 72.

【００６４】コード生成部７４は、プログラム解析部７
１における解析結果に応じて、目的プログラムとなるコ
ードを生成するものである。データ構造アロケーション
手段７５は、プログラム解析部７１（ロック変数検索手
段７２、データ構造検索手段７３）で解析されたプログ
ラムをもとに、１つのキャッシュライン中に、ロック変
数で保護されたデータ構造が１個以下しか配置されない
ようにアロケーションを行なうものである。The code generation unit 74 generates the code that becomes the target program according to the analysis result from the program analysis unit 71. Based on the program analyzed by the program analysis unit 71 (lock variable search means 72 and data structure search means 73), the data structure allocation means 75 performs allocation so that at most one data structure protected by a lock variable is placed in any one cache line.

【００６５】図８に示す構成において、プログラム解析
部７１及びコード生成部７４は、コンパイラを構成す
る。プログラム解析部７１には、さらにロック変数検索
手段７２及びデータ構造検索手段７３による機能が付加
されている。In the configuration shown in FIG. 8, the program analysis unit 71 and the code generation unit 74 form a compiler. The program analysis unit 71 is further provided with the functions of the lock variable search means 72 and the data structure search means 73.

【００６６】次に、第３実施例の動作について、図９に
示すフローチャートを参照しながら説明する。まず、プ
ログラム解析部７１は、コンパイルされる原始プログラ
ムについて解析を行なう。この際、ロック変数検索手段
７２は、プログラム中で使用されているロック変数を検
索する（ステップＣ１）。また、データ構造検索手段７
３は、ロック変数検索手段７２によって検索されたロッ
ク変数によって保護されるデータ構造を検索する（ステ
ップＣ２）。Next, the operation of the third embodiment is described with reference to the flowchart shown in FIG. 9. First, the program analysis unit 71 analyzes the source program to be compiled. During this analysis, the lock variable search means 72 searches for the lock variables used in the program (step C1), and the data structure search means 73 searches for the data structures protected by the lock variables found by the lock variable search means 72 (step C2).

【００６７】コード生成部７４は、プログラム解析部７
１による解析結果に基づいてコードを生成する。データ
構造アロケーション手段７５は、コード生成部７４によ
って生成されたデータに関するコードに対して、ステッ
プＣ２において検索されたデータ構造を、１つのキャッ
シュライン中にロック変数で保護されたデータ構造が１
個以下しか配置されないようにアロケーションを行なう
（ステップＣ３）。The code generation unit 74 generates code based on the analysis result from the program analysis unit 71. For the data-related code generated by the code generation unit 74, the data structure allocation means 75 allocates the data structures found in step C2 so that at most one data structure protected by a lock variable is placed in any one cache line (step C3).
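Steps C1 to C3 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patent's implementation: the program is modeled as a hypothetical list of `(operation, name)` pairs, the structure sizes and the 64-byte cache line are invented, and all function names are hypothetical.

```python
def find_lock_variables(program):
    """Step C1: collect the lock variables used in the program, in order of first use."""
    locks = []
    for op, name in program:
        if op == "lock" and name not in locks:
            locks.append(name)
    return locks


def find_protected_structures(program):
    """Step C2: map each lock variable to the structures updated while it is held."""
    protected = {}
    held = []
    for op, name in program:
        if op == "lock":
            held.append(name)
            protected.setdefault(name, [])
        elif op == "unlock":
            held.remove(name)
        elif op == "update":
            for lock in held:
                if name not in protected[lock]:
                    protected[lock].append(name)
    return protected


def allocate(structures, sizes, line_size=64):
    """Step C3: start every structure on a fresh cache line, so no line holds two."""
    offsets = {}
    next_free = 0
    for name in structures:
        offsets[name] = next_free
        lines_needed = -(-sizes[name] // line_size)   # ceiling division
        next_free += lines_needed * line_size
    return offsets


# Hypothetical model of a function that updates B under lock A, then D under lock C.
FUNCTION_1 = [
    ("lock", "A"), ("update", "B"), ("unlock", "A"),
    ("lock", "C"), ("update", "D"), ("unlock", "C"),
]
```

With assumed sizes of 24 and 40 bytes, `allocate(["B", "D"], {"B": 24, "D": 40})` places B at offset 0 and D at offset 64, so the two structures never share a cache line even though both would fit in one.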

【００６８】図１０は共有メモリ２のキャッシュライン
にデータが配置される様子を具体的に説明するための図
である。なお、図１０（ａ）に示すプログラムは、便宜
上、原始プログラムの形式によって示している。FIG. 10 is a diagram for specifically explaining how data is arranged in the cache line of the shared memory 2. Note that the program shown in FIG. 10A is shown in the format of a source program for convenience.

【００６９】図１０（ａ）に示すプログラム中の関数１
は、まず「ロック変数Ａをロック」して、「データ構造
Ｂを更新」し、「ロック変数Ａをアンロック」する。続
いて「ロック変数Ｃをロック」して、「データ構造Ｄを
更新」し、「ロック変数Ｃをアンロック」するものであ
る。Function 1 in the program shown in FIG. 10(a) first locks lock variable A, updates data structure B, and unlocks lock variable A. It then locks lock variable C, updates data structure D, and unlocks lock variable C.

【００７０】この場合、ステップＣ１においてロック変
数Ａ及びロック変数Ｃが検索され、ステップＣ２におい
て、異なるロック変数Ａ，Ｃでそれぞれ保護されたデー
タ構造Ｂ及びＤが検索される。データ構造アロケーショ
ン手段７５は、ステップＣ３において、図９（ｂ）に示
すように、データ構造Ｂ及びＤを各々異なるキャッシュ
ライン中に配置される。In this case, lock variables A and C are found in step C1, and data structures B and D, protected by the different lock variables A and C respectively, are found in step C2. In step C3, the data structure allocation means 75 places data structures B and D in different cache lines, as shown in FIG. 9(b).

【００７１】データ構造Ｂとデータ構造Ｄを異なるキャ
ッシュラインに強制的に配置することにより、一方のデ
ータ構造がロックされたために、他方のデータ構造も使
用できなくことがなくなり、各キャッシュラインに配置
されたデータ構造を有効に利用することができる。By forcibly placing data structure B and data structure D in different cache lines, locking one data structure no longer makes the other unusable, and the data structure placed in each cache line can be used effectively.

【００７２】次に、本発明の第４実施例について説明す
る。第４実施例においては、第１実施例と同様に図１に
示すコンピュータシステム上で、第４実施例による最適
化手段１００ａ，１００ｂ，…が設けられている。第４
実施例におけるコンピュータシステムも、第３実施例と
同様な排他制御を行なうものとする。Next, a fourth embodiment of the present invention is described. In the fourth embodiment, as in the first embodiment, optimizing means 100a, 100b, ... according to the fourth embodiment are provided on the computer system shown in FIG. 1. The computer system of the fourth embodiment is also assumed to perform the same exclusive control as in the third embodiment.

【００７３】最適化手段１００ａは、図１１に示すよう
な機能によって構成されている。図１１に示すように、
最適化手段１００ａは、プログラム解析部１０１、ロッ
ク変数検索手段１０２、データ構造検索手段１０３、コ
ード生成部１０４、データ構造アロケーション手段１０
５、及びコピーデータ配置手段１０６によって構成され
ている。The optimizing means 100a is made up of the functions shown in FIG. 11. As shown in FIG. 11, the optimizing means 100a includes a program analysis unit 101, a lock variable search means 102, a data structure search means 103, a code generation unit 104, a data structure allocation means 105, and a copy data arranging means 106.

【００７４】プログラム解析部１０１は、所定のプログ
ラム言語を用いて記述された原始プログラムに対して、
コンピュータで実行可能な目的プログラムを作成するた
めの解析を行なうものである。プログラム解析部１０１
は、原始プログラムに対する字句解析、構文解析、意味
解析等を実行する。The program analysis unit 101 analyzes a source program written in a predetermined programming language in order to create a target program that a computer can execute. The program analysis unit 101 performs lexical analysis, syntactic analysis, semantic analysis, and so on, on the source program.

【００７５】ロック変数検索手段１０２は、プログラム
解析部１０１によって解析された結果に基づいて、プロ
グラム中で使用されているロック変数を検索する。デー
タ構造検索手段１０３は、プログラム解析部１０１によ
って解析されるプログラム中で、ロック変数検索手段１
０２によって検索されたロック変数によって保護される
データ構造を検索するものである。The lock variable search means 102 searches for the lock variables used in the program, based on the result of the analysis by the program analysis unit 101. The data structure search means 103 searches the program analyzed by the program analysis unit 101 for the data structures protected by the lock variables found by the lock variable search means 102.

【００７６】コード生成部１０４は、プログラム解析部
１０１における解析結果に応じて、目的プログラムとな
るコードを生成するものである。データ構造アロケーシ
ョン手段１０５は、プログラム解析部１０１（ロック変
数検索手段１０２、データ構造検索手段１０３）で解析
されたプログラムをもとに、１つのキャッシュライン中
に、ロック変数で保護されたデータ構造が１個以下しか
配置されないようにアロケーションを行なうものであ
る。また、データ構造アロケーション手段１０５は、キ
ャッシュライン中のデータ構造が配置された部分以外の
領域に、そのデータ構造を保護するロック変数を獲得し
てから解放するまでの間に参照されるデータを配置す
る。The code generation unit 104 generates the code that becomes the target program according to the analysis result from the program analysis unit 101. Based on the program analyzed by the program analysis unit 101 (lock variable search means 102 and data structure search means 103), the data structure allocation means 105 performs allocation so that at most one data structure protected by a lock variable is placed in any one cache line. In addition, in the area of the cache line other than the portion occupied by the data structure, the data structure allocation means 105 places the data that is referenced between acquiring and releasing the lock variable that protects that data structure.

【００７７】コピーデータ配置手段１０６は、データ構
造アロケーション手段１０５によって配置された、デー
タ構造を保護するロック変数を獲得してから解放するま
での間に参照されるデータが既に他のキャッシュライン
に配置されている場合に、そのデータが参照のみの対象
となる場合に限って、そのデータのコピーを配置するも
のである。When data that is referenced between acquiring and releasing the lock variable protecting a data structure placed by the data structure allocation means 105 has already been placed in another cache line, the copy data arranging means 106 places a copy of that data, but only if the data is subject to reference only.

【００７８】図１１に示す構成において、プログラム解
析部１０１及びコード生成部１０４は、コンパイラを構
成する。プログラム解析部１０１には、さらにロック変
数検索手段１０２及びデータ構造検索手段１０３による
機能が付加され、コード生成部１０４には、さらにコピ
ーデータ配置手段１０６及びデータ構造アロケーション
手段１０５による機能が付加されている。１０２は、ロ
ック変数検索手段であり、プログラム中で使用されてい
るロック変数を検索する。１０４はロック変数によって
保護されたデータ構造を検索する手段である。本発明の
最適化コンパイラではプログラム解析部で解析されたプ
ログラムを基に、１つのキャッシュライン中に、ロック
で保護されたデータ構造が１個以下しか配置されないよ
うにする。そしてキャッシュライン中の、データ構造が
配置された部分以外の場所に、そのデータ構造を保護す
るロックを獲得してから解放するまでの間に参照するデ
ータを配置する。またもしそのデータが既に他のキャッ
シュライン中に配置されている場合には、そのデータが
参照のみの場合に限って、そのデータのコピーを配置す
る。In the configuration shown in FIG. 11, the program analysis unit 101 and the code generation unit 104 constitute a compiler. The functions of the lock variable search means 102 and the data structure search means 103 are added to the program analysis unit 101, and the functions of the copy data arranging means 106 and the data structure allocation means 105 are added to the code generation unit 104. Reference numeral 102 denotes the lock variable search means, which searches for the lock variables used in the program. Reference numeral 104 denotes means for searching for a data structure protected by a lock variable. In the optimizing compiler of the present invention, based on the program analyzed by the program analysis unit, at most one lock-protected data structure is placed in any one cache line. The data referenced between acquiring and releasing the lock that protects the data structure is placed in the part of the cache line other than where the data structure is placed. If that data has already been placed in another cache line, a copy of it is placed, but only if the data is reference-only.

【００７９】次に、第４実施例の動作について、図１２
に示すフローチャートを参照しながら説明する。まず、
プログラム解析部１０１は、コンパイルされる原始プロ
グラムについて解析を行なう。この際、ロック変数検索
手段１０２は、プログラム中で使用されているロック変
数を検索する（ステップＤ１）。また、データ構造検索
手段１０３は、ロック変数検索手段１０２によって検索
されたロック変数によって保護されるデータ構造を検索
する（ステップＤ２）。Next, the operation of the fourth embodiment is described with reference to the flowchart shown in FIG. 12. First, the program analysis unit 101 analyzes the source program to be compiled. During this analysis, the lock variable search means 102 searches for the lock variables used in the program (step D1), and the data structure search means 103 searches for the data structures protected by the lock variables found by the lock variable search means 102 (step D2).

【００８０】コード生成部１０４は、プログラム解析部
１０１による解析結果に基づいてコードを生成する。デ
ータ構造アロケーション手段１０５は、コード生成部１
０４によって生成されたデータに関するコードに対し
て、ステップＤ２において検索されたデータ構造を、１
つのキャッシュライン中にロック変数で保護されたデー
タ構造が１個以下しか配置されないようにアロケーショ
ンを行なう（ステップＤ３）。The code generation unit 104 generates code based on the analysis result from the program analysis unit 101. For the data-related code generated by the code generation unit 104, the data structure allocation means 105 allocates the data structures found in step D2 so that at most one data structure protected by a lock variable is placed in any one cache line (step D3).

【００８１】その後、データ構造アロケーション手段１
０５は、ステップＤ３において配置が完了したデータ構
造について、そのデータ構造を保護するロック変数を獲
得してから解放するまでの間に参照するデータが存在す
るか否かを調べる（ステップＤ４）。After that, for each data structure whose placement was completed in step D3, the data structure allocation means 105 checks whether there is data that is referenced between acquiring and releasing the lock variable that protects that data structure (step D4).

【００８２】ここで、対象とするデータが存在する場合
には、データ構造アロケーション手段１０５は、そのデ
ータがまだ他のキャッシュラインに配置されていないか
否かを調べる（ステップＤ５）。If such data exists, the data structure allocation means 105 checks whether the data has not yet been placed in another cache line (step D5).

【００８３】まだ配置されていない場合には、データ構
造アロケーション手段１０５は、そのデータを、ステッ
プＤ３で配置されたデータ構造が配置された部分以外の
領域に配置する（ステップＤ６）。If the data has not yet been placed, the data structure allocation means 105 places it in the area other than the portion occupied by the data structure placed in step D3 (step D6).

【００８４】なお、ステップＤ５において、既に対象と
するデータ（ロック中に参照されるデータ）が何れかの
キャッシュラインに配置されていると判別された場合に
は、コピーデータ配置手段１０６は、そのデータのコピ
ーを作成する（ステップＢ５）。If it is determined in step D5 that the data in question (data referenced while the lock is held) has already been placed in some cache line, the copy data arranging means 106 creates a copy of that data (step B5).

【００８５】そして、コピーデータ配置手段１０６は、
コピーによって作成したデータを、そのデータが参照の
みの場合に限って、データ構造が配置された同じキャッ
シュライン中に配置する（ステップＤ８）。The copy data arranging means 106 then places the data created by copying in the same cache line as the data structure, but only if the data is reference-only (step D8).
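The flow of the fourth embodiment (steps D1 to D6, the copy creation step, and step D8) can be sketched in Python as follows. This is a simplified illustration, not the patent's implementation. The assumptions are: the program is a hypothetical list of `(operation, name)` pairs, locks are not nested, every structure fits in one 64-byte line, the sizes are invented, data read under a lock is not itself a lock-protected structure, and data that does not fit in the remaining space is simply skipped.

```python
def plan_layout(program, sizes, line_size=64):
    """Return one list per cache line; each entry is (label, offset within the line)."""
    updated = {name for op, name in program if op == "update"}

    # Steps D1/D2: for each lock variable, record what is updated and read while it is held.
    sections = {}
    held = None
    for op, name in program:
        if op == "lock":
            held = name
            sections.setdefault(name, {"updates": [], "reads": []})
        elif op == "unlock":
            held = None
        elif held is not None:
            key = "updates" if op == "update" else "reads"
            if name not in sections[held][key]:
                sections[held][key].append(name)

    lines = []
    placed = set()
    for section in sections.values():
        for struct in section["updates"]:
            line = [(struct, 0)]              # step D3: one protected structure per line
            used = sizes[struct]
            for data in section["reads"]:     # step D4: data referenced under the lock
                if used + sizes[data] > line_size:
                    continue                  # assumption: skip data that does not fit
                if data not in placed:        # step D5: not placed anywhere yet
                    line.append((data, used))             # step D6
                    placed.add(data)
                elif data not in updated:     # already placed and reference-only
                    line.append((data + " (copy)", used))  # copy placed (step D8)
                else:
                    continue                  # writable data is never duplicated
                used += sizes[data]
            lines.append(line)
    return lines


# Hypothetical model: B is read under lock A (with C updated) and again under lock D (with E updated).
FUNCTION_1 = [
    ("lock", "A"), ("read", "B"), ("update", "C"), ("unlock", "A"),
    ("lock", "D"), ("read", "B"), ("update", "E"), ("unlock", "D"),
]
```

With assumed sizes `{"B": 16, "C": 32, "E": 32}`, the plan puts C and B in the first line and E with a copy of B in the second. Restricting copies to reference-only data keeps the duplicates from ever diverging.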

【００８６】図１３は共有メモリ２のキャッシュライン
にデータが配置される様子を具体的に説明するための図
である。なお、図１３（ａ）に示すプログラムは、便宜
上、原始プログラムの形式によって示している。FIG. 13 is a diagram for specifically explaining how data is arranged in the cache line of shared memory 2. Note that the program shown in FIG. 13A is shown in the format of a source program for convenience.

【００８７】図１３（ａ）に示すプログラム中の関数１
は、まず「ロック変数Ａ」をロックし、「データＢを参
照」し、「データ構造Ｃを更新」し、「ロック変数Ａを
アンロック」する。続いて「ロック変数Ｄをロック」し
て、「データＢを参照」し、「データ構造Ｅを更新」
し、「ロック変数Ｄをアンロック」するものである。Function 1 in the program shown in FIG. 13(a) first locks lock variable A, references data B, updates data structure C, and unlocks lock variable A. It then locks lock variable D, references data B, updates data structure E, and unlocks lock variable D.

【００８８】この場合、ステップＤ３において異なるロ
ック変数Ａ及びＣでそれぞれ保護されたデータ構造Ｃ及
びＥは、各々異なるキャッシュライン中に配置される。
また、図１３（ｂ）に示すように、データ構造Ｃ及びＥ
を保護するロック変数Ａ及びＤを獲得してから解放する
までの間に参照される読み込みのみの対象となるデータ
Ｂを、データ構造Ｃが配置されているキャッシュライン
の残りの部分に配置し、またデータＢのコピーをデータ
構造Ｅが配置されているキャッシュラインの残りの部分
にそれぞれ配置する。In this case, in step D3, data structures C and E, protected by the different lock variables A and C respectively, are placed in different cache lines. Further, as shown in FIG. 13(b), the read-only data B, which is referenced between acquiring and releasing the lock variables A and D that protect data structures C and E, is placed in the remaining part of the cache line holding data structure C, and a copy of data B is placed in the remaining part of the cache line holding data structure E.

【００８９】データ構造Ｃとデータ構造Ｅを異なるキャ
ッシュラインに強制的に配置することにより、一方のデ
ータ構造がロックされたために、他方のデータ構造も使
用できなくことがなくなり、さらにデータ構造を保護す
るロック変数を獲得してから解放するまでの間に参照さ
れるデータも同じキャッシュラインに配置されることか
ら、データ構造をキャッシュに格納した際に、既に参照
されるデータがキャッシュ中に存在するので共有メモリ
２から改めてデータを転送する必要がない。さらに、デ
ータ構造がロックされたとしても、参照されるデータ自
身はデータ構造毎にキャッシュに存在しているので、参
照することができる。By forcibly placing data structure C and data structure E in different cache lines, locking one data structure no longer makes the other unusable. Moreover, because the data referenced between acquiring and releasing the lock variable that protects a data structure is placed in the same cache line, the referenced data is already in the cache once the data structure has been loaded into the cache, so it does not need to be transferred again from the shared memory 2. Furthermore, even while a data structure is locked, the referenced data can still be referenced, because a copy of it exists in the cache for each data structure.

【００９０】なお、前述した第１〜第４実施例において
は、データあるいはデータ構造をキャッシュラインに配
置する場合について説明したが、入出力処理用のバッフ
ァとして用いる領域についても同様にして配置すること
ができる。In the first to fourth embodiments described above, the case where data or data structures are placed in cache lines was described, but areas used as buffers for input/output processing can be placed in the same way.

【００９１】図１４はプログラム解析部及びコード生成
部から構成されるコンパイラによって行われるデータ配
置の様子を示す図である。図１４（ａ）に示すプログラ
ムは、「ｂｕｆｆｅｒ１」及び「ｂｕｆｆｅｒ２」を入
出力処理のバッファとして使用するものである。プログ
ラム解析部は、コンパイルの対象とするプログラムを解
析して、バッファの定義を検索し、コード生成部は、検
索された各バッファについて、図１４（ｂ）に示すよう
に、１つのキャッシュライン中に１個以下の入出力バッ
ファしか配置されないようにする。すなわち、あるキャ
ッシュラインを配置した後、キャッシュライン中に空き
の領域があっても他のキャッシュラインの配置を行わな
いようにする。FIG. 14 is a diagram showing data placement performed by a compiler composed of a program analysis unit and a code generation unit. The program shown in FIG. 14(a) uses "buffer1" and "buffer2" as buffers for input/output processing. The program analysis unit analyzes the program to be compiled and searches for buffer definitions, and the code generation unit places each buffer found so that, as shown in FIG. 14(b), at most one input/output buffer is placed in any one cache line. That is, once a buffer has been placed in a cache line, no other buffer is placed in that line even if free space remains in it.

【００９２】入出力処理用のバッファを１つのキャッシ
ュライン中に１個以下しかアロケーションしないように
することにより、例えば異なるバッファを用いるたびに
キャッシュラインのデータがキャッシュメモリ間で転送
されるようなことが発生しない。By allocating at most one input/output buffer in any one cache line, situations such as the cache line's data being transferred between cache memories every time a different buffer is used no longer occur.
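The buffer placement rule can be checked with a small Python sketch. It is an illustration only: the buffer sizes, the 64-byte cache line, and the function names `allocate_buffers` and `lines_touched` are assumptions, not part of the patent.

```python
def allocate_buffers(buffers, line_size=64):
    """Give each (name, size) buffer a line-aligned offset and a size padded to whole lines."""
    layout = {}
    offset = 0
    for name, size in buffers:
        padded = -(-size // line_size) * line_size   # round up to a whole number of lines
        layout[name] = (offset, padded)
        offset += padded                             # leftover space in the last line stays unused
    return layout


def lines_touched(offset, size, line_size=64):
    """Return the set of cache line indices covered by [offset, offset + size)."""
    first = offset // line_size
    last = (offset + size - 1) // line_size
    return set(range(first, last + 1))
```

For example, with assumed sizes of 100 and 40 bytes, `allocate_buffers([("buffer1", 100), ("buffer2", 40)])` places `buffer1` at offset 0 (lines 0 and 1) and `buffer2` at offset 128 (line 2), so accesses to one buffer never touch a line that holds the other.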

【００９３】[0093]

【発明の効果】以上詳述したように本発明によれば、そ
れぞれのプロセッサに対応してキャッシュが設けられた
マルチプロセッサシステムにおいて、キャッシュの能力
を活かして性能を上げることが可能となるものである。As described in detail above, according to the present invention, in a multiprocessor system in which a cache is provided for each processor, performance can be improved by making full use of the cache's capabilities.

[Brief description of drawings]

【図１】本発明の一実施例に係わるコンピュータシステ
ムの概略構成を示すブロック図。FIG. 1 is a block diagram showing a schematic configuration of a computer system according to an embodiment of the present invention.

【図２】本発明の第１実施例における最適化手段の機能
構成を示すブロック図。FIG. 2 is a block diagram showing a functional configuration of an optimizing unit according to the first exemplary embodiment of the present invention.

【図３】本発明の第１実施例の最適化手段５ａにおける
データ配置手段１５の処理の流れを示すフローチャー
ト。FIG. 3 is a flowchart showing a processing flow of a data arranging means 15 in the optimizing means 5a according to the first embodiment of the present invention.

【図４】本発明の第１実施例におけるキャッシュライン
にデータが配置される様子を具体的に説明するための
図。FIG. 4 is a diagram for specifically explaining how data is arranged in a cache line in the first embodiment of the present invention.

【図５】本発明の第２実施例における最適化手段の機能
構成を示すブロック図。FIG. 5 is a block diagram showing a functional configuration of an optimizing unit according to a second exemplary embodiment of the present invention.

【図６】本発明の第２実施例の最適化手段４０ａにおけ
るデータ配置手段４５及びコピーデータ配置手段４６の
処理の流れを示すフローチャート。FIG. 6 is a flowchart showing a processing flow of a data arranging unit 45 and a copy data arranging unit 46 in the optimizing unit 40a according to the second embodiment of the present invention.

【図７】本発明の第２実施例におけるキャッシュライン
にデータが配置される様子を具体的に説明するための
図。FIG. 7 is a diagram for specifically explaining how data is arranged in a cache line according to the second embodiment of the present invention.

【図８】本発明の第３実施例における最適化手段の機能
構成を示すブロック図。FIG. 8 is a block diagram showing a functional configuration of an optimizing unit according to a third exemplary embodiment of the present invention.

【図９】本発明の第３実施例の動作を説明するためのフ
ローチャート。FIG. 9 is a flowchart for explaining the operation of the third embodiment of the present invention.

【図１０】本発明の第３実施例におけるキャッシュライ
ンにデータが配置される様子を具体的に説明するための
図。FIG. 10 is a diagram for specifically explaining how data is arranged in a cache line according to the third embodiment of the present invention.

【図１１】本発明の第４実施例における最適化手段の機
能構成を示すブロック図。FIG. 11 is a block diagram showing a functional configuration of an optimizing unit according to a fourth exemplary embodiment of the present invention.

【図１２】本発明の第４実施例の動作を説明するための
フローチャート。FIG. 12 is a flowchart for explaining the operation of the fourth embodiment of the present invention.

【図１３】本発明の第４実施例におけるキャッシュライ
ンにデータが配置される様子を具体的に説明するための
図。FIG. 13 is a diagram for specifically explaining how data is arranged in a cache line according to the fourth embodiment of the present invention.

【図１４】本発明における入出力処理用のバッファとし
て用いる領域がキャッシュラインに配置される様子を具
体的に説明するための図。FIG. 14 is a diagram for specifically explaining how an area used as a buffer for input / output processing according to the present invention is arranged in a cache line.

[Explanation of symbols]

１１，４１，７１，１０１…プログラム解析部、１２，
４２…データ認識手段、１３，４３，７４，１０４…コ
ード生成部、１４，４４…キャッシュラインサイズ認識
手段、１５，４５…データ配置手段、４６，１０６…コ
ピーデータ配置手段、７２，，１０２…ロック変数検索
手段、７３，１０３…データ構造検索手段、７５，１０
５…データ構造アロケーション手段。11, 41, 71, 101 ... program analysis unit; 12, 42 ... data recognition means; 13, 43, 74, 104 ... code generation unit; 14, 44 ... cache line size recognition means; 15, 45 ... data arranging means; 46, 106 ... copy data arranging means; 72, 102 ... lock variable search means; 73, 103 ... data structure search means; 75, 105 ... data structure allocation means.

Claims

[Claims]

1. A multiprocessor system in which a plurality of processors each having a cache are connected and data in a memory is handled in units of a predetermined cache line, the system comprising: data recognition means for recognizing read-only data when analysis is performed to create an executable target program from a source program; cache size recognition means for recognizing the size of the cache line in the local system; and data arranging means for, when code is generated according to the analysis result, arranging the read-only data, based on the recognition results of the data recognition means and the cache size recognition means, on the same cache line as the read/write data accessed at the same time as the read-only data, so that a plurality of data accessed simultaneously by the program are stored from the memory into the cache at the same time.

2. The multiprocessor system according to claim 1, further comprising copy data arranging means for, when the data arranging means arranges data and there are a plurality of cache lines on which the read-only data should be arranged, copying the read-only data and arranging it on each of the plurality of cache lines.

3. A multiprocessor system that performs exclusive control of a data structure shared by a plurality of processors by attaching a lock variable to the data structure, setting the lock variable when the data structure is to be accessed, accessing the data structure, and resetting the lock variable after the access to the data structure is completed, the system comprising: lock variable search means for searching for a lock variable that exclusively controls a data structure used in a program when analysis is performed to create an executable target program from a source program; data structure search means for searching for the data structure protected by the lock variable found by the lock variable search means; and data structure allocation means for, when code is generated according to the analysis result, allocating no more than one data structure protected by a lock variable in one cache line, based on the search results of the lock variable search means and the data structure search means.

4. The multiprocessor system according to claim 3, further comprising copy data arranging means for allocating, in a portion of the cache line other than the allocated data structure, a copy of read-only data that is accessed between setting and resetting the lock that protects the data structure.

5. A multiprocessor system in which a plurality of processors each having a cache are connected and data in a memory is handled in units of a predetermined cache line, the system comprising: buffer recognition means for recognizing an input/output processing buffer when analysis is performed to create an executable target program from a source program; and allocation means for, when code is generated according to the analysis result, allocating no more than one input/output processing buffer in one cache line, based on the recognition result of the buffer recognition means.

6. A method for optimizing memory allocation in a multiprocessor system in which a plurality of processors each having a cache are connected and data in a memory is handled in units of a predetermined cache line, the method comprising: recognizing read-only data when analysis is performed to create an executable target program from a source program; recognizing the size of the cache line in the local system; and, when code is generated according to the analysis result, arranging the previously recognized read-only data, according to the recognized cache line size, on the same cache line as the read/write data accessed at the same time as the read-only data, so that a plurality of data accessed simultaneously by the program are stored from the memory into the cache at the same time.

7. The method for optimizing memory allocation according to claim 6, wherein, when the read-only data is arranged on the same cache line as the read/write data accessed at the same time as the read-only data and there are a plurality of cache lines on which the read-only data should be arranged, the read-only data is copied and arranged on each of the plurality of cache lines.

8. A method for optimizing memory allocation in a multiprocessor system that performs exclusive control of a data structure shared by a plurality of processors by attaching a lock variable to the data structure, setting the lock variable when the data structure is to be accessed, accessing the data structure, and resetting the lock variable after the access to the data structure is completed, the method comprising: searching for a lock variable that exclusively controls a data structure used in a program when analysis is performed to create an executable target program from a source program; searching for the data structure protected by the lock variable found; and, when code is generated according to the analysis result, allocating no more than one data structure protected by a lock variable in one cache line.

9. The method for optimizing memory allocation according to claim 8, wherein a copy of read-only data that is accessed between setting and resetting the lock protecting the data structure is allocated in a portion of the cache line other than the allocated data structure.

10. A method for optimizing memory allocation in a multiprocessor system in which a plurality of processors each having a cache are connected and data in a memory is handled in units of a predetermined cache line, the method comprising: recognizing an input/output processing buffer when analysis is performed to create an executable target program from a source program; and, when code is generated according to the analysis result, allocating no more than one of the recognized input/output processing buffers in one cache line.