JP6303632B2

JP6303632B2 - Arithmetic processing device and control method of arithmetic processing device

Info

Publication number: JP6303632B2
Application number: JP2014046912A
Authority: JP
Inventors: 石井　寛之
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2014-03-10
Filing date: 2014-03-10
Publication date: 2018-04-04
Anticipated expiration: 2034-03-10
Also published as: JP2015170313A

Description

The present invention relates to an arithmetic processing device and a control method for an arithmetic processing device.

Some arithmetic processing devices of information processing apparatuses, such as mainframes, hold data for main memory management, called KEY data, for each fixed-size storage area of the main memory, for example every 4 KB. The KEY data for each area represents, for example, whether the corresponding area has been referenced (R bit) and updated (C bit). When a CPU (Central Processing Unit) of the arithmetic processing device, or a core included in the CPU, issues a store (ST) instruction to a storage area in the main memory, the following processing accompanies the store: an update request for the R bit and C bit (hereinafter called SET-RC) is issued to the KEY control unit that manages the KEY data. Hereinafter, a CPU or a core included in a CPU is also referred to as a processor or an arithmetic processing device. Note that the arithmetic processing device may have a plurality of CPUs, and each CPU may have a plurality of cores. Hereinafter, the KEY data is also referred to as management data.
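
A minimal Python sketch of the KEY data layout described above, assuming one KEY entry per 4 KB block that holds only the R and C bits. The names `KeyStore`, `KeyEntry`, and `set_rc` are illustrative and do not come from the source; real KEY data may carry further fields.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4 * 1024  # one KEY entry per 4 KB area, as in the example in the text


@dataclass
class KeyEntry:
    r_bit: bool = False  # area has been referenced
    c_bit: bool = False  # area has been changed (updated)


class KeyStore:
    """Hypothetical model of the KEY data for a block of main memory."""

    def __init__(self, memory_bytes: int) -> None:
        blocks = (memory_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE
        self.entries = [KeyEntry() for _ in range(blocks)]

    def index_of(self, address: int) -> int:
        # Every address inside the same 4 KB area maps to the same entry.
        return address // BLOCK_SIZE

    def set_rc(self, address: int) -> None:
        # SET-RC: issued along with a store, marks the area referenced and changed.
        entry = self.entries[self.index_of(address)]
        entry.r_bit = True
        entry.c_bit = True


keys = KeyStore(16 * 1024)
keys.set_rc(0x1234)  # a store to address 0x1234 touches the second 4 KB area
```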

Consistency must be maintained between the data actually stored in a storage area and the management data of that storage area. For example, system management requires that, when another processor references the stored data, the management data update request issued along with the newly written data has already been processed. In the arithmetic processing device, the ISK (Insert Storage Key) instruction is an example of a reference instruction that refers to the management data (KEY data). For example, the ISK instruction reads the KEY data corresponding to the specified main memory address from the main memory and sets it in the register specified by the operand. An OS (Operating System), for example, uses the ISK instruction to read the R bit, C bit, and so on of the main memory, and uses them to decide which page to evict first when replacing pages in virtual memory. The OS may, for example, manage pages so that frequently updated and referenced pages are evicted as rarely as possible.
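
As an illustration of how an OS might use the R and C bits read with ISK, the sketch below ranks pages by the classic not-recently-used ordering (unreferenced and unchanged pages are evicted first). This policy is a standard textbook technique chosen for illustration, not one prescribed by the source; `choose_victim` and its input format are hypothetical.

```python
from typing import Dict, Tuple


def choose_victim(pages: Dict[int, Tuple[bool, bool]]) -> int:
    """Return the page number to evict first.

    pages maps a page number to its (r_bit, c_bit) pair as read from the KEY data.
    """
    if not pages:
        raise ValueError("no resident pages")
    # Class 0 is R=0,C=0 and class 3 is R=1,C=1; the lowest class is evicted
    # first, and ties are broken by the lower page number.
    return min(pages, key=lambda p: (2 * pages[p][0] + pages[p][1], p))


resident = {0: (True, True), 1: (False, True), 2: (True, False)}
victim = choose_victim(resident)
```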

In the arithmetic processing device, control is performed so that a management data reference instruction issued after new data has been referenced by a load (LD) instruction cannot refer to the management data as it was before the store (ST) instruction was issued.

For example, when the arithmetic processing device has CPU0 and CPU1 as a plurality of processors, store and load processing on the main memory proceeds as follows.
1. CPU0 ST(A): CPU0 stores data at address A. SET-RC(A) is issued along with the store.
2. CPU1 LD(A): CPU1 reads the data at address A.
3. CPU1 ISK(A): CPU1 issues an instruction to read the KEY data corresponding to the fixed-range storage area that includes address A.

If the load in step 2 observes the data stored in step 1, the relationship between setting and referencing the KEY data is controlled so that the ISK (management data reference instruction) issued in step 3 refers to the KEY data (management data) as it is after the SET-RC (management data update request) of step 1.
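
The ordering requirement for the three steps can be written as a small check over a trace of events. This is a sketch under the assumption that each of the four operations appears exactly once and that the list order is the order in which the operations take effect; the function name `ordering_holds` and the trace format are hypothetical.

```python
from typing import List


def ordering_holds(trace: List[str]) -> bool:
    """Check the KEY ordering rule on a trace of "ST", "SET-RC", "LD", "ISK".

    The rule only constrains traces in which the load observes the stored data,
    that is, traces in which ST takes effect before LD.
    """
    position = {op: i for i, op in enumerate(trace)}
    load_sees_store = position["ST"] < position["LD"]
    if not load_sees_store:
        return True  # LD read the old data, so the rule does not apply
    # ISK must observe the KEY data as it is after SET-RC.
    return position["SET-RC"] < position["ISK"]


good = ordering_holds(["ST", "SET-RC", "LD", "ISK"])
bad = ordering_holds(["ST", "LD", "ISK", "SET-RC"])
```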

FIG. 1 illustrates the configuration of an arithmetic processing device with a large-scale SMP (Symmetric Multiprocessing) configuration of the UMA (Uniform Memory Access) type, in which the access latency to the main memory is equal from every CPU. The arithmetic processing device of FIG. 1 has a plurality of system boards. Each system board has a plurality of CPUs (CPU1-LSI, CPU2-LSI, CPU3-LSI, etc.), a plurality of SCs (System Controllers; SC1-LSI, SC2-LSI, SC3-LSI, etc.), and a main memory (MEM, main storage) divided into a plurality of sections. Each CPU has, for example, a plurality of arithmetic cores, a cache control unit, and an external interface unit. Each SC, in turn, has a CPU port unit, a cache TAG information unit, a memory controller, and a KEY control unit. The external interface unit of each CPU is connected to the CPU port unit of the SC corresponding to that CPU. The SCs of each system board are connected to the SCs of the other system boards through a transmission path called, for example, an interconnect.

Access instructions to the main memory (MEM in the figure) are issued from the arithmetic cores. When the target data of an access instruction does not hit in the L1 cache located in the arithmetic core, the access instruction is forwarded as an access to the L2 cache. When the target data does not hit in the L2 cache either, the request is passed to the SC, which resides on a separate chip outside the CPU.

The external interface unit sends requests from the arithmetic core to the SC; requests reach the SC by way of the external interface unit. The SC holds information called cache TAG information, which includes information indicating which SC manages the data being accessed. The SC refers to the cache TAG information by, for example, pipeline processing called the local pipeline. While communicating with other SCs mounted on other system boards, the SC ultimately passes the request to the memory controller that manages the memory storing the requested data, and to the KEY control unit. The SC sends the access instruction to the memory controller that manages the address indicated by the cache TAG information. Access requests to a memory controller and communication with other SCs are performed by pipeline processing called the global pipeline. The global pipelines of the SCs execute in synchronization with one another.

The memory controller is the unit that exchanges data with the main storage, and the KEY control unit is the unit that manages the KEY data. For requests related to KEY data, a subsequent request never overtakes a preceding one inside the SC; once accepted, requests are processed in arrival order because all SCs operate in synchronization.
As described above, once a request has entered the SC, the execution order of requests related to KEY data is guaranteed. A mechanism that guarantees the execution order of KEY-related requests is also required on the CPU side, where the requests originate. In the external interface unit of the CPU, however, the queues are separated by request type. That is, the external interface unit of the CPU issues requests to the SC evenly across request types, regardless of the order in which they were requested.

FIG. 2 illustrates the flow of requests in the arithmetic processing device. In FIG. 2, the two rectangles inside the external interface unit of the CPU represent the queues that exist for each request type. FIG. 2 illustrates the case where a load instruction (LD) is issued from CPU1 after the store instruction (ST) described above has been issued by CPU0. In S1 of FIG. 2, the arithmetic core of CPU0 executes a store instruction (ST), and accordingly a SET-RC request, which is a request to update the R and C bits of the management data (KEY), is issued. The SET-RC request is placed in a queue in the external interface unit. Depending on how congested the SET-RC queue in the external interface unit is, however, the SET-RC request is not necessarily issued to the SC immediately. Suppose that, in the meantime, another CPU, CPU1, issues a request for a load (LD) instruction (LD request) to the address stored (ST) by CPU0 (S2). The LD request passes through the SC (S3), and a data read request (also called a discharge request) is issued to CPU0, which actually holds the data (S4). The data read out by the arithmetic core of CPU0 in response to the discharge request is returned to the SC via the external interface unit. If the read of the stored (ST) and updated data overtook the SET-RC at this point, the data update would become visible to other CPUs while the management data (KEY) had not yet been updated. Therefore, the queue extraction unit in the external interface unit does not issue the data response for the read request to the SC until no management data (KEY) setting request (SET-RC) remains in the queue.
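
The hold-back rule of the queue extraction unit can be sketched as follows. This is a simplified model, assuming only two queues (SET-RC requests and data responses) and ignoring the even-handed scheduling among other request types; the class `ExternalInterface` and its method names are hypothetical.

```python
from collections import deque
from typing import Optional, Tuple


class ExternalInterface:
    """Toy model of the per-type queues in the CPU's external interface unit."""

    def __init__(self) -> None:
        self.set_rc_queue: deque = deque()
        self.response_queue: deque = deque()

    def enqueue_set_rc(self, address: int) -> None:
        self.set_rc_queue.append(address)

    def enqueue_data_response(self, address: int) -> None:
        self.response_queue.append(address)

    def issue_next(self) -> Optional[Tuple[str, int]]:
        """Return the next message sent to the SC, or None if nothing is queued."""
        # A data response is held back while any SET-RC is still queued, so the
        # updated data never reaches another CPU ahead of the KEY update.
        if self.set_rc_queue:
            return ("SET-RC", self.set_rc_queue.popleft())
        if self.response_queue:
            return ("DATA", self.response_queue.popleft())
        return None


iface = ExternalInterface()
iface.enqueue_data_response(0xA000)  # response to CPU1's LD(A) is ready first
iface.enqueue_set_rc(0xA000)         # SET-RC(A) from CPU0's ST(A) is still queued
issued = [iface.issue_next(), iface.issue_next(), iface.issue_next()]
```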

Japanese Translation of PCT Publication No. 2003-512673
JP 2003-67357 A
JP-A-11-167557
JP 2003-323415 A

As described above, in the conventional technology, in an arithmetic processing device with a UMA-type SMP configuration, once a request has entered a control unit outside the CPU, such as an SC, that control unit cooperates with the other control units so that a subsequent request does not overtake a preceding one. The ordering before a request enters a control unit such as an SC, on the other hand, has been guaranteed by control on the CPU side that prevents overtaking. However, in a structure in which memory is equidistant from every CPU (UMA), as in FIG. 2, access to the main memory involves cooperative communication between control units (SCs), so latency becomes a problem. With the increasing semiconductor integration density of CPUs, the following issue arises from the standpoint of this main memory latency problem and of reducing the component count. It is desirable, for example, to eliminate the control unit (SC) that controls the management data (KEY) outside the CPU, and to adopt a NUMA (Non-Uniform Memory Access) structure, in which the access time from the CPUs is not necessarily uniform across the areas of the main memory. However, eliminating the control unit (SC) that controls the order of management data among a plurality of cores or a plurality of processors raises the question of how to realize the ordering guarantee specific to management data (KEY), for example control that prevents management data from being read before the management data corresponding to a main memory access has been set.

One aspect of the disclosed technology is exemplified by an arithmetic processing device that manages a main storage device corresponding to itself and can access another main storage device, managed by another arithmetic processing device, through that other arithmetic processing device. The arithmetic processing device includes: a management data control unit that, for each storage area of the main storage device corresponding to the device itself and of the other main storage device, sets or reads management data that is set when the storage area is accessed; and a request processing unit that, when a storage area of either the main storage device corresponding to the device itself or the other main storage device is accessed, causes the setting of the management data corresponding to the accessed storage area to be executed in priority over the reading of that management data.

According to this arithmetic processing device, the order of management data can be guaranteed among a plurality of arithmetic processing devices, for example by control that prevents management data from being read before the management data corresponding to a main memory access has been set.

FIG. 1 is a diagram illustrating the configuration of an arithmetic processing device with a UMA-type large-scale SMP configuration.
FIG. 2 is a diagram illustrating the flow of requests in the arithmetic processing device.
FIG. 3 is a diagram illustrating the configuration of the NUMA-type arithmetic processing device 10 according to Example 1.
FIG. 4 is a diagram illustrating the configuration of the KEY request processing unit.
FIG. 5 is a diagram illustrating a control sequence chart of an arithmetic processing device according to a comparative example.
FIG. 6 is a diagram illustrating a control sequence chart of an arithmetic processing device according to a comparative example.
FIG. 7 is a diagram illustrating a sequence chart to which the method of Example 1 is applied.
FIG. 8 is a diagram illustrating the configuration of the KEY request processing unit according to Example 2.

Hereinafter, an arithmetic processing device according to an embodiment is described with reference to the drawings. The configurations of the following embodiments are examples, and the arithmetic processing device is not limited to them. In Examples 1 and 2 below, functions such as the memory controller and the KEY control unit are built into the CPU and the memory is connected directly to the CPU, which reduces memory latency and realizes a NUMA (Non-Uniform Memory Access) configuration. In the NUMA configuration, however, there is no SC. The mechanism illustrated in FIG. 2 therefore cannot guarantee the order of requests to the KEY data (corresponding to management data). Examples 1 and 2 below accordingly propose a new circuit that controls the order of requests to the KEY data.

[Example 1]
The arithmetic processing device according to Example 1 is described with reference to FIGS. 3 to 7. FIG. 3 illustrates the configuration of the NUMA-type arithmetic processing device 10 according to Example 1. The arithmetic processing device 10 has a plurality of CPUs (CPU0, CPU1, etc.) and a main memory (MEM) 2 accessed from CPU0, CPU1, etc. In FIG. 3, the portion of the main memory 2 under the management of CPU0 is called main memory 2A, and the portion under the management of CPU1 is called main memory 2B. The main memories 2A and 2B are collectively called main memory 2. The main memories 2A and 2B are examples of a main storage device. Here, saying that main memory 2A (2B) is under the management of CPU0 (CPU1) means, for example, that CPU0 (CPU1) writes data to and reads data from main memory 2A (2B) and manages the state of main memory 2A (2B). An example of a state of main memory 2A (2B) is one in which data of main memory 2A (2B) has been handed over, via CPU0 (CPU1), to a CPU other than CPU0 (CPU1) and is being read. CPU0 (CPU1) accesses main memory 2A (2B), which it manages itself, and also accesses main memory 2B (2A), which is managed by the other CPU1 (CPU0), via that CPU1 (CPU0). The arithmetic processing device 10 can therefore be said to be a NUMA-type system, that is, a system in which the access times from CPU0, CPU1, etc. to the main memories 2A and 2B are not uniform. CPU0 and CPU1 are examples of an arithmetic processing unit. CPU0 and CPU1 can, however, also be regarded as examples of an arithmetic processing device.
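
The non-uniform access paths can be made concrete with a small sketch. It assumes, purely for illustration, that each CPU manages one contiguous 1 GiB region of the address space; the source does not specify how addresses are divided, and `home_cpu` and `access_path` are hypothetical names.

```python
from typing import List

REGION_SIZE = 1 << 30  # assumption: each CPU manages one contiguous 1 GiB region


def home_cpu(address: int, num_cpus: int = 2) -> int:
    """Return the number of the CPU that manages the given address."""
    cpu = address // REGION_SIZE
    if not 0 <= cpu < num_cpus:
        raise ValueError("address outside the modeled main memory")
    return cpu


def access_path(requester: int, address: int) -> List[str]:
    """List the components a memory access passes through."""
    home = home_cpu(address)
    if home == requester:
        # Local access: the memory is attached directly to the requesting CPU.
        return [f"CPU{requester}", f"MEM(CPU{home})"]
    # Remote access: the request goes over the interconnect to the managing CPU.
    return [f"CPU{requester}", "interconnect", f"CPU{home}", f"MEM(CPU{home})"]


local = access_path(0, 0x100)
remote = access_path(0, REGION_SIZE + 4)
```

The remote path has more hops than the local one, which is why access times differ between main memories 2A and 2B in a NUMA system.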

<Configuration of CPU>
The CPUs in the arithmetic processing device 10, for example CPU0 and CPU1, are connected by a transmission path 3 called an interconnect.

Further, CPU0, for example, has an arithmetic core 11A, a memory controller 12A, a cache control unit 13A, a KEY control unit 14A, and a KEY request processing unit 15A. The other CPUs, for example CPU1, likewise have an arithmetic core 11B, a memory controller 12B, a cache control unit 13B, a KEY control unit 14B, and a KEY request processing unit 15B. Below, the configuration and operation of each CPU are illustrated using CPU0 as the example. When referred to collectively, CPU0 and CPU1 are called the CPU, the arithmetic cores 11A and 11B are called the arithmetic core 11, and the memory controllers 12A and 12B are called the memory controller 12. Similarly, the cache control units 13A and 13B are collectively called the cache control unit 13, the KEY control units 14A and 14B are called the KEY control unit 14, and the KEY request processing units 15A and 15B are called the KEY request processing unit 15.

The arithmetic core 11A executes instructions of a computer program loaded in executable form in the main memory 2A, and thereby processes, via the memory controller 12A, data stored in the main memory 2A managed by CPU0 or in the main memory 2B or the like managed by CPU1 or the like. When the arithmetic core 11A processes data stored in the main memory 2B or the like managed by CPU1 or the like, it acquires the data to be processed from CPU1 via the transmission path 3 at an appropriate time and hands the processed data back to CPU1 at an appropriate time.

The memory controller 12A manages the data in the main memory 2A. For example, in response to a request from CPU0, the memory controller 12A acquires data from the main memory 2A and hands it to CPU0.

The cache control unit 13A stores data in and reads data from a cache memory (not shown). The cache control unit 13A also holds cache TAG information and determines where the data requested by the arithmetic core 11A is managed. For example, when the requested data is managed by CPU1, the cache control unit 13A acquires the data in the main memory 2B managed by CPU1 via the cache control unit 13B of CPU1. In response to a request from the cache control unit 13B of CPU1, the cache control unit 13A hands data stored in the main memory 2A managed by CPU0 to the cache control unit 13B. Further, at an appropriate time, the cache control unit 13A receives data processed by CPU1 via the cache control unit 13B and stores it in the main memory 2A managed by CPU0. The cache control unit 13B of CPU1 is controlled in the same way as the cache control unit 13A. When there are three or more CPUs, each cache control unit 13 is likewise controlled in the same way as the cache control unit 13A.

The KEY control unit 14A sets KEY data in the main memory 2A and reads KEY data from the main memory 2A in accordance with requests from the arithmetic core 11A of CPU0, the cache control unit 13A, or other CPUs. When management data such as KEY data is written to the main memory, a so-called posted write instruction is used, for which no response indicating that the write request has actually completed is returned. The KEY control units 14A and 14B are examples of a management data control unit.

<Configuration of KEY request processing unit>
In Example 1, each CPU chip has a module called a KEY request processing unit. FIG. 4 illustrates the configuration of the KEY request processing unit 15A of CPU0. The KEY request processing unit 15A has a port unit 151 that accepts requests for setting, reading, and so on of KEY data (hereinafter simply called requests), a priority unit 152 that decides which request to accept, and an output buffer 153, which is an output unit for issuing the selected request to the KEY control unit 14A.

The port unit 151 and the output buffer 153 hold requests in FIFO (First In First Out) order. Within the port unit 151 and the output buffer 153, therefore, a subsequent request never overtakes a preceding one. Requests are either local requests, generated by a core in the same chip, or remote requests, issued by a core in another chip. These requests are exchanged between CPUs via the inter-CPU-chip connection interface and the transmission path 3 of FIG. 3. Requests are further classified into two groups, called MO requests (Move-Out: KEY write requests) and MI requests (Move-In: KEY reference requests). An MO request is an example of a setting request, and an MI request is an example of a read request.

Accordingly, the port unit 151 has a local MO port (LMOPT) and a remote MO port (RMOPT), which store MO requests, and a local MI port (LMIPT) and a remote MI port (RMIPT), which store MI requests. The local MO port (LMOPT) and the local MI port (LMIPT) store requests generated in the device itself, that is, by the arithmetic core 11A or the cache control unit 13A of CPU0. The remote MO port (RMOPT) and the remote MI port (RMIPT) store requests generated by CPUs other than CPU0 (CPU1, etc.). The local MO port (LMOPT) and remote MO port (RMOPT), which store MO requests, are examples of a first first-in first-out storage unit. The local MI port (LMIPT) and remote MI port (RMIPT), which store MI requests, are examples of a second first-in first-out storage unit.
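
A minimal sketch of how a port unit could sort incoming requests into the four FIFO ports LMOPT, RMOPT, LMIPT, and RMIPT. The port names follow the text; the `KeyRequest` fields and the `PortUnit` class are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyRequest:
    kind: str        # "MO" (KEY write) or "MI" (KEY reference)
    source_cpu: int  # CPU whose core or cache control unit generated the request
    address: int


class PortUnit:
    """Toy model of port unit 151: one FIFO per (origin, kind) pair."""

    def __init__(self, own_cpu: int) -> None:
        self.own_cpu = own_cpu
        self.ports = {name: deque() for name in ("LMOPT", "RMOPT", "LMIPT", "RMIPT")}

    def port_name(self, request: KeyRequest) -> str:
        # "L" for requests generated on this CPU, "R" for requests from other CPUs.
        origin = "L" if request.source_cpu == self.own_cpu else "R"
        return f"{origin}{request.kind}PT"

    def accept(self, request: KeyRequest) -> None:
        if request.kind not in ("MO", "MI"):
            raise ValueError(f"unknown request kind: {request.kind}")
        # deque.append keeps arrival order, so no overtaking occurs within a port.
        self.ports[self.port_name(request)].append(request)


unit = PortUnit(own_cpu=0)
unit.accept(KeyRequest("MO", source_cpu=1, address=0x1000))  # remote KEY write
unit.accept(KeyRequest("MI", source_cpu=0, address=0x1000))  # local KEY reference
```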

In FIG. 4, the priority unit 152 has control circuits in two stages. The first-stage control circuits 1521 and 1522 receive requests separately for the MO group and the MI group: local and remote MO requests are input to control circuit 1521, and local and remote MI requests are input to control circuit 1522. Each of the control circuits 1521 and 1522 decides which port takes priority within its group, for example by an LRU (Least Recently Used) procedure. That is, the control circuits operate with logic that selects, among the ports holding a request, the port that has gone unselected the longest. With LRU, the control circuits 1521 and 1522 can issue requests to the KEY control unit without bias among ports. In none of the embodiments, including Example 1, is the port decision procedure of the first-stage control circuits limited to LRU.
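
The first-stage LRU choice between ports can be sketched in software as follows. The hardware would use gates, counters, and flags rather than a list; `LruArbiter` and its interface are hypothetical and only illustrate the selection rule.

```python
from typing import Dict, List, Optional


class LruArbiter:
    """Pick, among ports holding a request, the one unselected for the longest time."""

    def __init__(self, port_names: List[str]) -> None:
        # Front of the list is the least recently selected port.
        self.order = list(port_names)

    def select(self, has_request: Dict[str, bool]) -> Optional[str]:
        for name in self.order:
            if has_request.get(name, False):
                # Move the winner to the back: it is now the most recently selected.
                self.order.remove(name)
                self.order.append(name)
                return name
        return None  # no port in this group holds a request


arbiter = LruArbiter(["LMOPT", "RMOPT"])
both = {"LMOPT": True, "RMOPT": True}
picks = [
    arbiter.select(both),                             # LMOPT goes first
    arbiter.select(both),                             # then RMOPT
    arbiter.select({"LMOPT": True, "RMOPT": False}),  # only LMOPT has a request
    arbiter.select(both),                             # RMOPT has waited longest
]
```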

Next, the second-stage control circuit 1523 selects between the MO group and the MI group. The second-stage control circuit 1523 controls the selection so that MO requests are chosen in priority over MI requests. This configuration of the priority unit 152 guarantees the order between KEY data update requests (MO requests) and KEY reference requests (MI requests). The first-stage control circuits 1521 and 1522 are realized by, for example, logic gates together with counters, flags, and the like that manage the processing history for LRU. The second-stage control circuit 1523 is realized by logic gates and the like. The KEY request processing unit 15 is an example of a request processing unit.
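
The second-stage rule, MO before MI, can be sketched as below. The two deques stand in for the winners of the first-stage MO and MI selections, and the list returned by `drain` plays the role of the output buffer; these names and the request tuple format are assumptions for illustration.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Request = Tuple[str, int]  # (operation name, address), e.g. ("SET-RC", 0xA000)


def second_stage_select(mo: Deque[Request], mi: Deque[Request]) -> Optional[Request]:
    """Choose the next request to issue to the KEY control unit."""
    if mo:
        return mo.popleft()  # a KEY update always goes ahead of a KEY reference
    if mi:
        return mi.popleft()
    return None


def drain(mo: Deque[Request], mi: Deque[Request]) -> List[Request]:
    """Issue every pending request in the order the second stage selects them."""
    issued: List[Request] = []
    while True:
        request = second_stage_select(mo, mi)
        if request is None:
            return issued
        issued.append(request)


mi_requests: Deque[Request] = deque([("ISK", 0xA000)])     # reference is already waiting
mo_requests: Deque[Request] = deque([("SET-RC", 0xA000)])  # update for the same area
order = drain(mo_requests, mi_requests)
```

Even though the ISK reference was waiting, the SET-RC update for the same area is issued first, so the reference sees the updated KEY data.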

<Sequence of Comparative Example>
FIGS. 5 and 6 illustrate control sequence charts of an arithmetic processing device according to a comparative example. FIG. 5 illustrates the operation when CPU1 issues a data load (LD) instruction to address (A) while the SET-RC instruction issued as a result of a store (ST) instruction executed by CPU0 still remains in the transmission buffer in the external interface unit of CPU0. As shown in the figure, the response carrying the requested data is held at the external interface unit in CPU0, and the load (LD) instruction is not processed ahead of SET-RC(A); it waits until SET-RC(A) has been processed and issued to the SC.

FIG. 6 illustrates the sequence that continues from the state of FIG. 5, after SET-RC (A) has been processed. The external interface unit of CPU0 can now return the response to the load (LD) instruction to the SC, and the requested data is returned to CPU1, the request source. Upon arrival of the response data, the ISK (A) instruction, which follows the LD (A) instruction in the program that issued it on CPU1, is issued. The ISK (A) instruction reads the KEY data in main storage and sets it in a CPU register. In the sequence of the comparative example, SET-RC (A) has already been processed by the time the ISK (A) instruction is issued, so operation that guarantees the order of the KEY data is possible even when the KEY data is set by a posted write instruction.

<Sequence of Example 1>
FIG. 7 illustrates a sequence chart to which the method of the first embodiment is applied. As in the comparative example, CPU0 executes a store (ST) instruction, whereby a SET-RC (A) instruction is sent to the KEY request processing unit 15. The KEY request processing unit 15 in charge of processing the SET-RC (A) instruction may be on the CPU0 side or on the CPU1 side. For example, for an address not included in the portion of main memory managed by the issuing CPU, the request is sent over the inter-chip interconnect to the KEY request processing unit of the CPU that manages the address (A) of the SET-RC (A) instruction. Note that the CPU that manages a portion of main memory, for example the main memory 2A, need not be the same as the CPU that manages the KEY data of the main memory 2A. For example, CPU0 may manage the main memory 2A while CPU1 manages the KEY data of the main memory 2A.

FIG. 7 shows the case where the KEY request processing unit 15A of CPU0 is in charge of address (A). At this point, the request has been set in the port unit 151 of the KEY request processing unit 15A, but the actual SET-RC operation has not yet been executed. Assume here that other KEY requests are concentrated and that this SET-RC (A) is held up in the port unit 151. If CPU1 issues a load (LD (A)) instruction at this time, then, unlike the processing of FIGS. 5 and 6, CPU0 can return the response to the load (LD (A)) instruction to the requesting CPU1 without waiting for SET-RC (A) to be processed. The program that issued the load (LD (A)) instruction on CPU1 issues an ISK (A) instruction upon receiving the response to the load (LD (A)) instruction. The ISK (A) instruction is delivered, via the transmission path 3 serving as the inter-chip interconnect, to the KEY request processing unit 15A of CPU0, which is in charge of address (A).

Therefore, in the example of FIG. 7, SET-RC (A) and ISK (A) are present in the ports of the KEY request processing unit 15A at the same time. However, because of the control circuits 1521, 1522, and 1523 of the priority unit 152, which control the priority order in the first embodiment, an MI request cannot overtake an MO request. Consequently, in the example of FIG. 7, SET-RC (A) is processed first. Once the SET-RC (A) instruction, which is an MO request, has been set in the port unit 151 of the KEY request processing unit 15A, it is certain to be processed before MI requests. For this reason, CPU0, which issued the SET-RC (A) instruction, can proceed with subsequent processing as soon as the instruction has been set in the port unit 151 of the KEY request processing unit 15A, without waiting for the SET-RC (A) instruction to complete; for example, it can return to CPU1 the response to the load (LD (A)) instruction from CPU1. The priority unit 152 is an example of a selection circuit.
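
The ordering argument of FIG. 7 can be illustrated with a small simulation. It is a sketch under stated assumptions, not the embodiment itself: the dictionary `key_data`, the queues `mo_port` and `mi_port`, the function `drain_one`, and the address value are hypothetical, and SET-RC is modelled simply as setting both the R bit and the C bit of the 4 KB area containing the address.

```python
from collections import deque

PAGE_SIZE = 4096   # KEY data is kept per fixed area, for example every 4 KB

key_data = {}      # area number -> {"R": 0 or 1, "C": 0 or 1}
mo_port = deque()  # update-type requests such as SET-RC
mi_port = deque()  # reference-type requests such as ISK


def area_of(addr):
    return addr // PAGE_SIZE


def drain_one():
    """Process one queued request, always serving the MO port first.
    Returns the KEY data read by an ISK request, otherwise None."""
    if mo_port:
        _op, addr = mo_port.popleft()
        entry = key_data.setdefault(area_of(addr), {"R": 0, "C": 0})
        entry["R"] = 1   # area was referenced
        entry["C"] = 1   # area was changed
        return None
    if mi_port:
        _op, addr = mi_port.popleft()
        return dict(key_data.get(area_of(addr), {"R": 0, "C": 0}))
    return None


A = 0x12000  # hypothetical address

# CPU0 stores to A: SET-RC(A) is set in the port and CPU0 carries on at once.
mo_port.append(("SET-RC", A))
# CPU1 has received its LD(A) response and now issues ISK(A), while
# SET-RC(A) is still waiting in the port.
mi_port.append(("ISK", A))

results = [drain_one(), drain_one()]
print(results)
```

Because the MO port is always drained first, the ISK request in this model observes the R and C bits already set, even though CPU0 never waited for SET-RC (A) to complete.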

<Effect of Example 1>
According to the KEY request processing unit 15 of the first embodiment, once a request has been set in the port unit 151 of the KEY request processing unit 15, it is guaranteed that the request will be processed in the proper order. Therefore, even when the KEY data is set by a posted write instruction, for example, the CPU can regard the request as complete at the point it is set in the port unit 151 and can issue the next request, continue processing, and so on.

Further, as illustrated in FIG. 3, the arithmetic processing device 10 of the first embodiment incorporates the SC function into the CPU and eliminates the SC external to the CPU. This reduces inter-LSI communication and therefore latency. The number of parts is also reduced.

In the method of the comparative example, the CPU external interface unit guaranteed the order when CPU1 loaded (LD) data that had been stored (ST) by CPU0. In the method of the first embodiment, by contrast, a subsequent KEY data reference request is guaranteed to be processed after the KEY data write request. Therefore, as illustrated in FIG. 7, for example, the CPU of the first embodiment can, at least for a read request for data in the portion of main memory that it manages, return the requested data independently of the writing of the KEY data corresponding to that data.

[Example 2]
With reference to FIG. 8, a KEY request processing unit 15C of an arithmetic processing device according to a second embodiment will be described. The first embodiment described the KEY request processing unit 15, in which the priority unit 152 processes requests input to the port unit 151 so that MO requests take priority over MI requests. As shown in FIG. 4, the port unit 151 of the first embodiment has a local MO port, a remote MO port, a local MI port, and a remote MI port. The second embodiment illustrates a KEY request processing unit having a more complex port unit than that of the first embodiment. The other components of the arithmetic processing device of the second embodiment are the same as those of the arithmetic processing device 10 of the first embodiment. The same components are therefore denoted by the same reference numerals, and their description is omitted.

FIG. 8 illustrates the configuration of the KEY request processing unit 15C according to the second embodiment. Like that of the first embodiment, the KEY request processing unit 15C of the second embodiment has a port unit 151C, a priority unit 152C, and output buffers 153C and 153D.

As shown in FIG. 8, the port unit 151C of the KEY request processing unit 15C of the second embodiment is more complex than the port unit of the KEY request processing unit 15 of the first embodiment. Specifically, in addition to the local MO/MI ports (LMOPT, LMIPT) and the remote MO/MI ports (RMOPT, RMIPT), the port unit 151C has four additional LKRPT ports (LKRPT00, LKRPT01, LKRPT10, LKRPT11). As described in the first embodiment, LMOPT, LMIPT, RMOPT, and RMIPT receive, respectively, MO requests issued from cores of the local CPU, MI requests issued from cores of the local CPU, MO requests issued from cores of the remote CPU, and MI requests issued from cores of the remote CPU.

Also in FIG. 8, the L2 cache is address-interleaved four ways, shown as SX00, SX01, SX10, and SX11, and is connected to the KEY request processing unit 15C. The four ports LKRPT00, LKRPT01, LKRPT10, and LKRPT11 receive KEY access requests that accompany memory accesses issued from the L2 cache. That is, an access request from the CPU to main memory is set in one of these ports (LKRPT00/01/10/11) through the interleaved L2 cache. The local MO port (LMOPT) and the remote MO port (RMOPT) are examples of a first first-in first-out storage unit. The local MI port (LMIPT), the remote MI port (RMIPT), and the four LKRPT ports are examples of a second first-in first-out storage unit. The KEY request processing unit 15C is an example of a request processing unit.

In FIG. 8, a router (RT) of a remote LSI (Large Scale Integrated circuit) is also connected to the KEY request processing unit 15C. An MO KEY request is guaranteed not to be overtaken by subsequent packets at RMOPT, which serves as the receiving unit for the RT (router). That no packet overtaking occurs at RMOPT is a requirement for guaranteeing the ordering of data and KEY. Control is therefore required so that a packet sent from the router (RT) of the transmitting LSI to RMOPT of the receiving LSI is never made to wait inside the RT because RMOPT is busy.

For this purpose, RMOPT is controlled, for example, so that it can accept every arriving request. To make this possible, the request issuing unit on the transmitting side (for example, the L2 cache control unit) performs credit management from the sending of a request until the reception of its CPLT (Complete, that is, the completion response). The KEY request processing unit 15C on the receiving side, in turn, need only have an RMOPT with as many entries as there are credits.
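
The credit scheme can be sketched as follows. This is a minimal model under assumptions: the class names `RmoptReceiver` and `CreditedSender`, their methods, and the credit count of 2 are hypothetical and chosen only to show why an RMOPT with as many entries as credits never has to refuse a request.

```python
from collections import deque


class RmoptReceiver:
    """Receiving-side FIFO with a fixed number of entries."""

    def __init__(self, entries):
        self.entries = entries
        self.queue = deque()

    def accept(self, request):
        # With entries == sender credits, this branch is never reached.
        if len(self.queue) >= self.entries:
            raise OverflowError("RMOPT busy: request would wait in the router")
        self.queue.append(request)

    def process_one(self):
        # Requests leave in arrival order, so no overtaking occurs.
        return self.queue.popleft()


class CreditedSender:
    """Transmitting-side request issuing unit that tracks credits."""

    def __init__(self, credits, receiver):
        self.credits = credits
        self.receiver = receiver

    def try_send(self, request):
        if self.credits == 0:
            return False          # hold the request at the source
        self.credits -= 1         # credit is held from send ...
        self.receiver.accept(request)
        return True

    def on_complete(self):
        self.credits += 1         # ... until the completion response arrives


rx = RmoptReceiver(entries=2)
tx = CreditedSender(credits=2, receiver=rx)

sent = [tx.try_send(f"SET-RC({n})") for n in range(3)]
done = rx.process_one()   # receiver finishes the oldest request
tx.on_complete()          # completion response returns one credit
sent_after = tx.try_send("SET-RC(2)")
print(sent, done, sent_after)
```

The third request is held back at the sender rather than in the router, and it is sent only after a completion response frees a credit.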

Like the first embodiment, the priority unit 152C of the second embodiment has three control circuits 1521C, 1522C, and 1523C. The ports of the port unit 151C that hold MO requests, LMOPT and RMOPT, are connected to the control circuit 1521C. The ports other than LMOPT and RMOPT are connected to the control circuit 1522C. As in the first embodiment, requests are processed within each of the control circuits 1521C and 1522C according to, for example, LRU. The control circuit 1523C then outputs requests to the output buffers 153C and 153D, giving MO requests from the control circuit 1521C priority over requests from the control circuit 1522C (MI requests and requests from the L2 cache). The priority unit 152C is an example of a selection circuit.

The output buffers 153C and 153D correspond to the address-interleaved main memory. The output buffers 153C and 153D issue the requests they receive to the KEY control units KX0 and KX1, respectively, each of which corresponds to one part of the address-interleaved main memory. KX0 and KX1 in the figure are KEY control units, and each may contain a cache capable of holding recently used KEY data. KX0 and KX1 are connected to, for example, a memory controller (MAC) and issue requests to the main memory (not shown) via the MAC.
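
Routing of a request to one of the two KEY control units can be pictured as below. The description does not state which address bit selects between KX0 and KX1, so the choice here (the lowest bit of the 4 KB area number) is purely an assumption, as are the names `key_unit_for`, `issue`, and `output_buffers`.

```python
PAGE_SHIFT = 12  # 4 KB areas; the interleave granularity is an assumption

# One list per output buffer, keyed by the KEY control unit it feeds.
output_buffers = {"KX0": [], "KX1": []}


def key_unit_for(addr):
    """Pick a KEY control unit from the lowest bit of the area number."""
    return "KX1" if (addr >> PAGE_SHIFT) & 1 else "KX0"


def issue(request, addr):
    """Place a selected request in the output buffer for its address."""
    unit = key_unit_for(addr)
    output_buffers[unit].append(request)
    return unit


routes = [issue("SET-RC", a) for a in (0x0000, 0x1000, 0x1FFF, 0x2000)]
print(routes)
```

Under this assumed mapping, every address within one 4 KB area reaches the same KEY control unit, so requests for a given area stay in a single queue.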

As described above, the KEY request processing unit 15C shown in FIG. 8 processes MO requests in preference to other requests for the KEY data. Therefore, the CPU of an arithmetic processing device having the KEY request processing unit 15C of the second embodiment can regard a request as complete at the point it is set in the port unit 151C and can issue the next request, continue processing, and so on. This holds even when, for example, the CPU sets the KEY data by a posted write instruction.

CPU0, CPU1  CPU
2A, 2B  Main memory
3  Transmission path
11A, 11B  Operation core
12A, 12B  Memory controller
13A, 13B  Cache control unit
14A, 14B  KEY control unit
15A, 15B  KEY request processing unit
151, 151C  Port unit
152, 152C  Priority unit
153, 153C  Output buffer

Claims

An arithmetic processing device that manages a main storage device corresponding to its own device and that can access another main storage device managed by another arithmetic processing device through the other arithmetic processing device, the arithmetic processing device comprising:
a control unit that executes writing or reading of stored data with respect to either the main storage device corresponding to the own device or the other main storage device;
a management data control unit that, when the control unit of the own device or the control unit of the other arithmetic processing device has written stored data to a storage area of the main storage device corresponding to the own device, sets, for each storage area, management data for managing at least the frequency of writing to the storage area in the main storage device corresponding to the own device, and that reads the set management data in response to a management data read request; and
a request processing unit that, when the control unit of the own device or the control unit of the other arithmetic processing device has performed writing of stored data to a storage area of the main storage device corresponding to the own device and reading of the stored data thus written, causes the management data control unit to execute, of the setting of the management data and the reading of the management data corresponding to the storage area that was written and read, the setting of the management data in preference to the reading of the management data.

The arithmetic processing device according to claim 1, wherein the request processing unit includes:
a first first-in first-out storage unit that stores a setting request for requesting the management data control unit to set the management data;
a second first-in first-out storage unit that stores a read request for requesting the management data control unit to read the management data; and
a selection circuit that selects a setting request from the first first-in first-out storage unit in preference to a read request from the second first-in first-out storage unit.

The arithmetic processing unit according to claim 1, wherein the management data control unit sets the management data in the main storage device according to an instruction that does not return a response indicating that the write request is completed.

A control method of an arithmetic processing device that manages a main storage device corresponding to its own device and that can access another main storage device managed by another arithmetic processing device through the other arithmetic processing device, the control method comprising:
a step of executing writing or reading of stored data with respect to either the main storage device corresponding to the own device or the other main storage device;
a setting step of setting, when the own device or the other arithmetic processing device has written stored data to a storage area of the main storage device corresponding to the own device, management data for managing, for each storage area, at least the frequency of writing to the storage area in the main storage device corresponding to the own device;
a reading step of reading the set management data from the main storage device corresponding to the own device in response to a management data read request; and
a processing step of executing, when the own device or the other arithmetic processing device has performed writing of stored data to a storage area of the main storage device corresponding to the own device and reading of the stored data thus written, of the setting of the management data and the reading of the management data corresponding to the storage area that was written and read, the setting of the management data in preference to the reading of the management data.

The control method of the arithmetic processing device according to claim 4, wherein the processing step includes:
storing a setting request for requesting setting of the management data in the setting step in a first first-in first-out storage unit;
storing a read request for requesting reading of the management data in the reading step in a second first-in first-out storage unit; and
selecting a setting request from the first first-in first-out storage unit in preference to a read request from the second first-in first-out storage unit.

The control method of the arithmetic processing device according to claim 4 or 5, wherein the setting step includes a step of writing the management data to the main storage device by an instruction for which no response indicating completion of the write request is returned.

An arithmetic processing device having a plurality of arithmetic processing units, each of which manages a main storage device corresponding to its own arithmetic processing unit and can access another main storage device managed by another arithmetic processing unit through the other arithmetic processing unit, each of the arithmetic processing units comprising:
a control unit that executes writing or reading of stored data with respect to either the main storage device corresponding to the own arithmetic processing unit or the other main storage device;
a management data control unit that, when the control unit of the own arithmetic processing unit or the control unit of the other arithmetic processing unit has written stored data to a storage area of the main storage device corresponding to the own arithmetic processing unit, sets, for each storage area, management data for managing at least the frequency of writing to the storage area in the main storage device corresponding to the own arithmetic processing unit, and that reads the set management data in response to a management data read request; and
a request processing unit that, when the control unit of the own arithmetic processing unit or the control unit of the other arithmetic processing unit has performed writing of stored data to a storage area of the main storage device corresponding to the own arithmetic processing unit and reading of the stored data thus written, causes the management data control unit to execute, of the setting of the management data and the reading of the management data corresponding to the storage area that was written and read, the setting of the management data in preference to the reading of the management data.