JP6167193B1

JP6167193B1 - Processor

Info

Publication number: JP6167193B1
Application number: JP2016011234A
Authority: JP
Inventors: 茂幸高野
Original assignee: Dwango Co Ltd
Current assignee: Dwango Co Ltd
Priority date: 2016-01-25
Filing date: 2016-01-25
Publication date: 2017-07-19
Anticipated expiration: 2036-01-25
Also published as: JP2017134443A

Abstract

[Problem] To provide a processor capable of high-speed non-numerical computation. [Solution] A processor 1 includes a plurality of processing elements 10-1 to 10-N and an execution management unit 40. Each processing element has a temporary storage unit 11 that temporarily stores an assigned unit instruction sequence, and can execute the instructions contained in the unit instruction sequence stored in its temporary storage unit. The execution management unit divides a program, which is an instruction sequence at or below the assembly-language level, into unit instruction sequences, each of which contains no branch instruction in its middle, begins with the first instruction of a branch target, and ends with a branch instruction. It sequentially assigns the resulting unit instruction sequences to the respective processing elements and causes the processing elements to execute the assigned unit instruction sequences in parallel. [Selected drawing] FIG. 1

Description

The present invention relates to a processor.

In recent years, techniques have become known that speed up a processor by executing a program in parallel at the instruction level (see, for example, Non-Patent Document 1). To process more instructions in parallel, conventional processors using such techniques employ, for example, static optimization, in which a compiler optimizes the program, and dynamic techniques such as branch prediction and speculative execution.

G. S. Sohi, S. Breach, and T. N. Vijaykumar, Multiscalar Processors, 22nd International Symposium on Computer Architecture (ISCA-22), 1995

However, the conventional processors described above may fail to achieve sufficiently fast processing for, for example, non-numerical computation, which is a common application of processors. A processor capable of high-speed non-numerical computation is therefore needed.

The present invention has been made to solve the above problem, and its object is to provide a processor capable of high-speed non-numerical computation.

To solve the above problem, one aspect of the present invention is a processor comprising: a plurality of processing elements, each having a temporary storage unit that temporarily stores an assigned unit instruction sequence and each capable of executing the instructions contained in the unit instruction sequence stored in its temporary storage unit; an execution management unit that divides a program, which is an instruction sequence at or below the assembly-language level, into unit instruction sequences, each being an instruction sequence that contains no branch instruction in its middle, begins with the first instruction of a branch target, and ends with a branch instruction, sequentially assigns the divided unit instruction sequences to the respective processing elements, and causes the processing elements to execute the assigned unit instruction sequences in parallel; and a management table storage unit that stores a management table holding a plurality of pieces of management information, each associating head position information indicating the position in the program of the first instruction of a unit instruction sequence, valid information indicating whether that unit instruction sequence is valid, and next execution information indicating the processing element to execute next. When a divided unit instruction sequence is not stored in the temporary storage unit of any of the processing elements, the execution management unit assigns that unit instruction sequence to one of the processing elements. When a divided unit instruction sequence is already stored in the temporary storage unit of one of the processing elements, the execution management unit causes that processing element to execute, in parallel, the unit instruction sequence already stored in its temporary storage unit. When a branch instruction is detected in the program, the execution management unit determines, based on the management table stored in the management table storage unit, whether the unit instruction sequence to be executed next is already stored in the temporary storage unit of one of the processing elements and is in a valid state.

In one aspect of the present invention, in the above processor, when a branch instruction is detected in the program, the execution management unit obtains from the management table stored in the management table storage unit, based on the next execution information, the head position information and the valid information corresponding to the unit instruction sequence to be executed after that branch instruction. When the position indicated by the obtained head position information matches the execution position of the program to be executed next, and the obtained valid information indicates valid, the execution management unit determines that the processing element matching the next execution information holds the unit instruction sequence in its temporary storage unit in a valid state.

In one aspect of the present invention, in the above processor, when the position indicated by the obtained head position information does not match the execution position of the program to be executed next, or when the obtained valid information indicates invalid, the execution management unit assigns the unit instruction sequence to be executed next to one of the processing elements and updates the management table to reflect the assigned unit instruction sequence.

In one aspect of the present invention, in the above processor, when assigning the unit instruction sequence to be executed next to one of the processing elements, the execution management unit stores the head position information corresponding to that unit instruction sequence in the management table and updates the valid information to a value indicating invalid. When the processing element completes execution of the unit instruction sequence, the execution management unit updates the valid information corresponding to that unit instruction sequence to a value indicating valid.

In one aspect of the present invention, in the above processor, the processing elements execute the instructions by pipeline processing.

In one aspect of the present invention, in the above processor, each of the processing elements has registers used for executing the instructions, and the processor further includes a register management unit. When a RAW hazard, that is, a data dependency caused by a read after a write to a register, occurs between processing elements, the register management unit suspends instruction execution in the processing element that reads the register until the processing element that most recently wrote to the register has completed the writes to that register in its unit instruction sequence.

Another aspect of the present invention is a processor comprising: a plurality of processing elements, each having a temporary storage unit that temporarily stores an assigned unit instruction sequence and each capable of executing the instructions contained in the unit instruction sequence stored in its temporary storage unit; and an execution management unit that divides a program, which is an instruction sequence at or below the assembly-language level, into unit instruction sequences, each being an instruction sequence that contains no branch instruction in its middle, begins with the first instruction of a branch target, and ends with a branch instruction, sequentially assigns the divided unit instruction sequences to the respective processing elements, and causes the processing elements to execute the assigned unit instruction sequences in parallel. Each of the processing elements has registers used for executing the instructions, and the processor further includes a register management unit that, when a RAW hazard (a data dependency caused by a read after a write to a register) occurs between processing elements, suspends instruction execution in the processing element that reads the register until the processing element that most recently wrote to the register has completed the writes to that register in its unit instruction sequence.

In one aspect of the present invention, the above processor includes an update history storage unit that stores an update history table containing, for each of the processing elements, update history information that associates PE identification information identifying the processing element, update information indicating that the register was updated during execution of the unit instruction sequence, completion information indicating that the last write to the register in the unit instruction sequence has completed, and execution information indicating that a write to the register has been executed. Based on the update history table stored in the update history storage unit, the register management unit determines whether a RAW hazard has occurred. When no RAW hazard within the processing element occurs in the processing element that reads the register, and a RAW hazard between processing elements occurs between the processing element that reads the register and the processing element that most recently wrote to it, the register management unit controls whether to suspend instruction execution in the reading processing element based on the completion information, for the register being written, of the processing element that most recently wrote to it.

In one aspect of the present invention, in the above processor, the register management unit determines whether a RAW hazard within the processing element has occurred in the processing element that reads the register, based on the execution information, for the register being read, of that processing element.
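The stall decision described in the preceding aspects can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `History` and `must_stall` are hypothetical, the caller is assumed to have already identified the most recent writer PE, and a hazard inside the reading PE is assumed to be resolved within that PE without stalling on other PEs, which the text does not spell out here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class History:
    """One row of the update history table for a single register (hypothetical layout)."""
    pe_id: int               # PE identification information
    updated: bool = False    # update information: this PE's block writes the register
    completed: bool = False  # completion information: the block's last write has finished
    executed: bool = False   # execution information: a write has already been executed


def must_stall(reader: History, nearest_writer: Optional[History]) -> bool:
    """Decide whether the reading PE should hold its instruction."""
    if reader.executed:
        # RAW hazard inside the reading PE itself; assumed to be handled locally.
        return False
    if nearest_writer is not None and nearest_writer.updated:
        # RAW hazard between PEs: wait until the writer's last write has completed.
        return not nearest_writer.completed
    return False
```

In this sketch the reader stalls only while the most recent writer has marked the register as updated but has not yet set its completion information.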

In one aspect of the present invention, in the above processor, each of the processing elements has a plurality of the registers, and the update history storage unit stores an update history table for each of the registers.

In one aspect of the present invention, the above processor includes: a store buffer storage unit that stores, for each of the processing elements, a store buffer table associating local position information corresponding to each instruction in the unit instruction sequence with access information indicating the history of accesses to an external storage unit; and a store buffer control unit that, when a processing element issues an access request to the external storage unit, controls access to the external storage unit based on the store buffer table for that processing element stored in the store buffer storage unit.

In one aspect of the present invention, in the above processor, the access information associates type information indicating the type of access with the storage location information accessed in the external storage unit and the accessed data. For an access request that reads data from the external storage unit, the store buffer control unit outputs the data to the processing element from the store buffer table if data corresponding to the request exists there, and otherwise outputs to the processing element the data read from the external storage unit.

In one aspect of the present invention, in the above processor, for an access request that writes data to the external storage unit, the store buffer control unit stores the local position information and the access information in association with each other in the store buffer table for the processing element, and also stores the data in the external storage unit.
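The read and write handling of the store buffer control unit can be sketched as below. This is a minimal illustration, not the patented implementation: the class name `StoreBuffer`, the tuple layout of the table rows, the use of a plain dictionary as the external storage unit, and the rule that the most recent matching store wins are all assumptions.

```python
class StoreBuffer:
    """Per-PE store buffer table in front of an external memory (a dict here)."""

    def __init__(self, memory):
        self.memory = memory  # stands in for the external storage unit
        self.table = []       # rows of (local_pos, kind, address, data)

    def store(self, local_pos, address, data):
        # Record the write with its local position, then write through to memory.
        self.table.append((local_pos, "store", address, data))
        self.memory[address] = data

    def load(self, address):
        # Serve the read from the table when a matching store exists
        # (assumption: the most recent matching store is used).
        for _, kind, addr, data in reversed(self.table):
            if kind == "store" and addr == address:
                return data
        # Otherwise read the external memory.
        return self.memory[address]
```

For example, after `sb = StoreBuffer({0x10: 7})`, a call to `sb.load(0x10)` is served from memory, while the same call after `sb.store(3, 0x10, 9)` is served from the table.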

According to the present invention, non-numerical computation can be performed at high speed.

A block diagram showing an example of the processor according to the present embodiment.
A diagram explaining an example of atomic blocks in the present embodiment.
A block diagram showing a configuration example of a PE in the present embodiment.
A block diagram showing a configuration example of the execution management unit in the present embodiment.
A diagram showing a configuration example of the update history storage unit in the present embodiment.
A block diagram showing an example of the register management unit in the present embodiment.
A diagram showing a configuration example of the store buffer storage unit in the present embodiment.
A diagram showing a configuration example of the entry table storage unit in the present embodiment.
A diagram showing an example of the state transitions of a PE in the present embodiment.
A diagram showing an example of the pipeline processing of the processor according to the present embodiment.
A diagram showing an example of parallel processing by the PEs according to the present embodiment.
A diagram showing an example of RAW hazard handling between PEs in the processor according to the present embodiment.
A diagram showing an example of RAW hazard handling within a single PE in the processor according to the present embodiment.

A processor according to an embodiment of the present invention is described below with reference to the drawings.
FIG. 1 is a block diagram showing an example of the processor 1 according to the present embodiment.
As shown in FIG. 1, the processor 1 includes a plurality of (for example, N) processing elements (10-1, 10-2, ..., 10-N), a PC (program counter) update unit 20, an instruction fetch unit 30, an execution management unit 40, a register management unit 50, a store buffer unit 60, and a memory management unit 70. The processor 1 is connected to a data memory 71 via the memory management unit 70.

In the following description, a "processing element" is written as "PE" (Processing Element). PE 10-1, PE 10-2, ..., PE 10-N all have the same configuration; in this embodiment, when referring to an arbitrary PE in the processor 1 or when no distinction is needed, the term PE 10 is used. In this embodiment, the number of PEs 10 is, for example, eight.

The PC update unit 20 has a PC (program counter) 21 and updates it. The PC 21 is a register that holds the address (position information) of the instruction execution position in a program memory (not shown) that stores the program. The PC update unit 20 adds IL to the PC 21 each time a normal instruction is executed and, when a branch instruction is executed, sets the PC 21 to the specified branch target address. Here, IL denotes the fixed instruction length. The PC update unit 20 also outputs the PC value held in the PC 21 to the program memory as an address.
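The update rule of the PC update unit 20 can be illustrated with a short sketch. The value of IL (4 bytes here) and the names `next_pc`, `branch_executed`, and `branch_target` are assumptions made for illustration; the description does not specify them.

```python
from typing import Optional

IL = 4  # assumed fixed instruction length in bytes; no value is given in the text


def next_pc(pc: int, branch_executed: bool = False,
            branch_target: Optional[int] = None) -> int:
    """Return the next PC value.

    A normal instruction advances the PC by IL; an executed branch
    replaces the PC with the specified branch target address.
    """
    if branch_executed:
        if branch_target is None:
            raise ValueError("an executed branch needs a target address")
        return branch_target
    return pc + IL
```

With these assumptions, `next_pc(0x100)` yields `0x104`, and `next_pc(0x100, True, 0x40)` yields `0x40`.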

The instruction fetch unit 30 has an IR (instruction register) 31 and stores in the IR 31 the instruction that the program memory outputs in response to the address output from the PC update unit 20. The IR 31 is a register that holds an instruction read from the program memory. The instruction fetch unit 30 outputs the instruction stored in the IR 31 to the execution management unit 40 and supplies it to the PEs 10 via the instruction network NW1.

Each of the independent PEs 10 treats an atomic block, obtained by dividing a program that is an instruction sequence at or below the assembly-language level, as a list entry (input data) of an unrolled linked list, and executes the instructions contained in the atomic block. Here, an atomic block (an example of a unit instruction sequence) is an instruction sequence, obtained by dividing such a program, that contains no branch instruction in its middle, begins with the first instruction of a branch target, and ends with a branch instruction.

FIG. 2 is a diagram explaining an example of atomic blocks in the present embodiment.
FIG. 2(a) is a listing of an example program (PR1) written in assembly language. FIG. 2(b) shows an example in which the program (PR1) of FIG. 2(a) is divided into atomic blocks to form an unrolled linked list.
As shown in FIG. 2(b), the program (PR1) is divided into atomic blocks AB_A to AB_E. For example, atomic block AB_D starts at "add r2, r2, r1" (instruction I1), which is the first instruction of a branch target, and ends at "bneq loop" (instruction I2), which is a branch instruction.

In this way, the processor 1 according to the present embodiment divides the program into atomic blocks, arranges them into an unrolled linked list as shown in FIG. 2(c), and has the PEs 10 execute the individual atomic blocks. The example in FIG. 2(c) is the program of FIG. 2(a) expressed as an unrolled linked list. Here, the processor 1 proceeds from "atomic block A" (AB_A) to "atomic block B" (AB_B), and at "atomic block B" (AB_B) executes a branch to either "atomic block C" (AB_C) or "atomic block D" (AB_D). The processor 1 also executes a conditional loop formed by "atomic block B" (AB_B) and the following "atomic block D" (AB_D). In FIG. 2(c), the hatched portions indicate branch instructions.
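The division into atomic blocks can be sketched in software as follows. This is only an illustration under assumptions: the tuple format `(label, op, args)`, the set of branch mnemonics, and the sample program are hypothetical (the sample is not the program of FIG. 2(a)), and the sketch starts a new block at the program entry, at every branch target, and right after every branch instruction.

```python
BRANCH_OPS = {"beq", "bneq", "jmp"}  # assumed branch mnemonics


def split_into_atomic_blocks(program):
    """Split a list of (label, op, args) tuples into atomic blocks.

    No block contains a branch instruction in its middle: a block is closed
    after each branch and before each instruction that is a branch target.
    """
    # The last operand of a branch is taken to be its target label.
    targets = {args.split(",")[-1].strip()
               for _, op, args in program if op in BRANCH_OPS}
    blocks, current = [], []
    for label, op, args in program:
        if current and label in targets:
            blocks.append(current)   # a branch target starts a new block
            current = []
        current.append((label, op, args))
        if op in BRANCH_OPS:
            blocks.append(current)   # a branch instruction ends the block
            current = []
    if current:
        blocks.append(current)
    return blocks


# Hypothetical sample program.
program = [
    (None,   "mov",  "r1, 1"),
    (None,   "mov",  "r2, 0"),
    ("loop", "cmp",  "r2, r3"),
    (None,   "beq",  "done"),
    (None,   "add",  "r2, r2, r1"),
    (None,   "bneq", "loop"),
    ("done", "mov",  "r4, r2"),
]
blocks = split_into_atomic_blocks(program)
```

For this sample, the third block begins with `add r2, r2, r1` and ends with `bneq loop`, mirroring the shape of atomic block AB_D described above.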

Returning to FIG. 1, each of the PEs 10 has an instruction cache unit 11 that temporarily stores an assigned atomic block, and can execute the instructions contained in the atomic block stored in its instruction cache unit 11.
The configuration of the PE 10 is now described with reference to FIG. 3.

FIG. 3 is a block diagram showing a configuration example of the PE 10 in the present embodiment.
As shown in FIG. 3, the PE 10 includes an instruction cache unit 11, a register file unit 12, and an instruction execution unit 13.
The instruction cache unit 11 (an example of a temporary storage unit) is a group of registers that caches the instructions output from the instruction fetch unit 30. The instruction cache unit 11 has, for example, M registers, cache register (C1) to cache register (CM), and stores an atomic block. The number M of cache registers is chosen so that one atomic block can be stored, and is, for example, 16.

The register file unit 12 is a group of registers that holds, for example, results computed by the PE 10 and data read from the data memory, and is used for executing instructions. The register file unit 12 has R general-purpose registers, general-purpose register (Reg1) to general-purpose register (RegR), and a control register. Data communication between the register file units 12 of the individual PEs 10 is carried out via the operand data network NW2. In the following description, the general-purpose registers and the control register are referred to simply as "registers".

The instruction execution unit 13 has, for example, an instruction decoder and an ALU (Arithmetic Logic Unit), and executes an instruction output from the instruction fetch unit 30 or an instruction stored in the instruction cache unit 11.
In the first execution of an atomic block, the PE 10 may have the instruction execution unit 13 execute the instructions output from the instruction fetch unit 30 while storing them in order in the instruction cache unit 11, or it may first store the instructions output from the instruction fetch unit 30 in order in the instruction cache unit 11 and then read them from the instruction cache unit 11 for execution by the instruction execution unit 13. In the second execution of the atomic block, the PE 10 reads the instructions in order from the instruction cache unit 11 and has the instruction execution unit 13 execute them.
The PE 10 is also configured so that it can execute the instructions stored in the instruction cache unit 11 in parallel by pipeline processing.
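The difference between the first execution and a later execution of an atomic block can be sketched as follows. This is an illustrative model only: the class and method names are hypothetical, execution is reduced to appending to a trace list, and the sketch models the variant in which instructions are executed while being cached.

```python
M = 16  # number of cache registers per PE, the example value given in the text


class ProcessingElement:
    def __init__(self):
        self.icache = []    # cached instructions of one atomic block
        self.executed = []  # trace standing in for the instruction execution unit

    def first_run(self, fetched):
        """First execution: execute each fetched instruction and cache it in order."""
        if len(fetched) > M:
            raise ValueError("atomic block does not fit in the instruction cache")
        self.icache = []
        for insn in fetched:
            self.executed.append(insn)
            self.icache.append(insn)

    def rerun(self):
        """Later execution: replay the block from the instruction cache, with no fetch."""
        for insn in self.icache:
            self.executed.append(insn)
```

A loop body assigned once with `first_run` can then be repeated with `rerun` without going back to the instruction fetch unit, which is the reuse the description relies on.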

Returning again to FIG. 1, the execution management unit 40 (an example of an execution management unit) divides the program into atomic blocks, sequentially assigns the atomic blocks to the respective PEs 10, and causes the PEs 10 to execute the assigned atomic blocks in parallel. When an atomic block is not stored in the instruction cache unit 11 of any of the PEs 10 (a cache miss), the execution management unit 40 assigns that atomic block to one of the PEs 10. When an atomic block is already stored in the instruction cache unit 11 of one of the PEs 10 (a cache hit), the execution management unit 40 causes that PE 10 to execute, in parallel, the atomic block already stored in its instruction cache unit 11.

When a branch instruction is detected in the program, the execution management unit 40 determines, based on the management table stored in the management table storage unit 41 described later, whether the atomic block to be executed next is already stored in the instruction cache unit 11 of one of the PEs 10 and is in a valid state. When the execution management unit 40 determines that the atomic block to be executed next is already stored in the instruction cache unit 11 of one of the PEs 10 and is in a valid state (a cache hit), it causes that PE 10 to execute the atomic block stored in its instruction cache unit 11.
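The cache hit and cache miss handling can be sketched as a small dispatch routine. This is a simplified illustration: the names `Slot` and `dispatch` are hypothetical, the routine scans all PEs rather than using the management table, and the choice of which PE to overwrite on a miss is left to the caller because the passage above does not state a replacement policy.

```python
class Slot:
    """What the manager knows about one PE's instruction cache (hypothetical layout)."""

    def __init__(self):
        self.start_addr = None  # address of the cached block's first instruction
        self.valid = False      # True once the cached block is usable


def dispatch(slots, next_addr, victim):
    """Return (pe_index, hit) for the atomic block starting at next_addr.

    victim is the PE to overwrite on a miss; how it is chosen is an assumption.
    """
    for i, slot in enumerate(slots):
        if slot.valid and slot.start_addr == next_addr:
            return i, True                  # cache hit: reuse the stored block
    slots[victim].start_addr = next_addr    # cache miss: assign the block to a PE
    slots[victim].valid = False             # marked valid only after the PE finishes
    return victim, False
```

Marking the slot invalid on assignment and valid on completion follows the aspect described earlier, in which the valid information is set to invalid at assignment and to valid once the block has been executed.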

The configuration of the execution management unit 40 in the present embodiment is now described with reference to FIG. 4.
FIG. 4 is a block diagram showing a configuration example of the execution management unit 40 in the present embodiment.
As shown in FIG. 4, the execution management unit 40 includes a management table storage unit 41, a cache miss handler unit 42, and a predecoder unit 43.

The management table storage unit 41 is, for example, a storage made up of registers, and stores a management table holding a plurality of pieces of management information, each of which associates “PE_NO”, “BTA”, “valid flag”, “LGT”, “PTR1”, and “PTR2”. Here, “PE_NO” indicates the number of a PE 10, and the number of pieces of management information is N, the number of PEs 10. “BTA” is the branch target address, that is, the address of the branch target instruction. In other words, “BTA” is head position information (the head address) indicating the position in the program of the first instruction of the atomic block.

The “valid flag” is flag information indicating whether the management information (the PE 10) is valid. In other words, the “valid flag” is validity information indicating whether the atomic block is valid. For example, a “valid flag” of “1” indicates valid and “0” indicates invalid. “LGT” indicates the block length of the atomic block.
“PTR1” and “PTR2” are row numbers (pointers) in the management table of the atomic blocks to be executed next. In other words, they are next-execution information indicating the PE 10 that executes next. In the present embodiment a branch instruction has, for example, two branch destinations, so there are two pointers, “PTR1” and “PTR2”; if a branch instruction has M branch destinations, there are M pointers.

In the example shown in FIG. 4, the PE 10 whose “PE_NO” is “1” has been assigned an atomic block whose “BTA” (head address) is “XXXX”, whose “valid flag” is “1”, and whose “LGT” (block length) is “10”. For this atomic block, “PTR1” is “2” and “PTR2” is “3”; that is, the PE 10 to execute next is the one whose “PE_NO” is “2” or “3”.
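As an illustration only, one row of the management table could be modelled in software as follows. This is a minimal Python sketch, not the register-based hardware described here; the class name `ManagementEntry`, the helper `next_candidates`, and the use of `None` for the address that FIG. 4 shows only as “XXXX” are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ManagementEntry:
    """One row of the management table (hypothetical software model)."""
    pe_no: int            # "PE_NO": number of the PE
    bta: Optional[int]    # "BTA": head address of the assigned atomic block
    valid: int            # "valid flag": 1 = valid, 0 = invalid
    lgt: int              # "LGT": block length of the atomic block
    ptr1: Optional[int]   # "PTR1": row number of one possible successor
    ptr2: Optional[int]   # "PTR2": row number of the other possible successor


def next_candidates(entry: ManagementEntry) -> List[int]:
    """Rows (PEs) that may execute after this atomic block."""
    return [p for p in (entry.ptr1, entry.ptr2) if p is not None]


# Row for PE_NO 1 as described for FIG. 4. The figure gives the address only
# as "XXXX", so it is left unset here rather than guessed.
row1 = ManagementEntry(pe_no=1, bta=None, valid=1, lgt=10, ptr1=2, ptr2=3)
print(next_candidates(row1))
```

With M branch destinations, the two pointer fields would become a list of M row numbers.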

The predecoder unit 43 decodes the instruction output from the instruction fetch unit 30 and determines whether it is a branch instruction.
The cache miss handler unit 42 controls the assignment of atomic blocks to the PEs 10 and their execution. For example, when an atomic block is a cache miss, the cache miss handler unit 42 assigns the atomic block to one of the PEs 10 and causes that PE 10 to execute it. When the atomic block to be executed next is a cache hit, the cache miss handler unit 42 causes the PE 10 that hit to execute the atomic block already stored in its instruction cache unit 11.
The cache miss handler unit 42 includes a comparison circuit 421, an AND circuit 422, an AND circuit 423, an LRU (Least Recently Used) arbiter 424, and a handler unit 425.

The comparison circuit 421 compares the PC value, which is the value of the PC 21, with the “BTA” of the atomic block (PE 10) scheduled to execute next, as stored in the management table storage unit 41, to determine whether they match. For example, the comparison circuit 421 outputs the H state (High, “1”) on a match and the L state (Low, “0”) on a mismatch.

The AND circuit 422 outputs the logical AND of the output of the comparison circuit 421 and the “valid flag” of the atomic block (PE 10) scheduled to execute next.
The AND circuit 423 outputs, as the HIT signal, the logical AND of the output of the AND circuit 422 and the branch instruction signal, output from the predecoder unit 43, that indicates a branch instruction. The HIT signal indicates a cache hit for the PE 10 scheduled to execute next. That is, the AND circuit 423 outputs the H state on a cache hit.
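The combinational logic formed by the comparison circuit 421 and the AND circuits 422 and 423 can be sketched in a few lines. The following Python function is an illustrative model only; the function name `hit_signal` and the representation of the H and L states as the integers 1 and 0 are assumptions.

```python
def hit_signal(pc_value: int, bta: int, valid_flag: int, branch_signal: int) -> int:
    """Model of the HIT signal: 1 corresponds to the H state, 0 to the L state."""
    match = 1 if pc_value == bta else 0   # comparison circuit 421
    and_422 = match & valid_flag          # AND circuit 422
    return and_422 & branch_signal        # AND circuit 423 -> HIT


# Hypothetical addresses: a hit needs a BTA match, a valid entry, and a branch.
print(hit_signal(0x100, 0x100, 1, 1))
print(hit_signal(0x100, 0x104, 1, 1))
```

Any one of the three conditions being false forces the HIT signal to the L state, which the rest of the unit treats as a cache miss.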

The LRU arbiter 424 selects the PE 10 to be used next from among the plurality of PEs 10 when the atomic block (PE 10) scheduled to execute next is a cache miss. For example, the LRU arbiter 424 selects the PE 10 for which the longest time has elapsed since it was last used, or the PE 10 that is referenced least frequently.
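The first of the two selection policies mentioned above (longest time since last use) can be sketched as follows. This is a minimal software model under stated assumptions: the class name `LruArbiter`, its methods, and the list-based bookkeeping are hypothetical, and a hardware arbiter would be implemented differently.

```python
class LruArbiter:
    """Tracks PE usage order; the front of the list is the least recently used PE."""

    def __init__(self, num_pes: int) -> None:
        # PEs are numbered from 1, matching "PE_NO" in the management table.
        self.order = list(range(1, num_pes + 1))

    def touch(self, pe_no: int) -> None:
        """Record that pe_no was just used, making it the most recently used."""
        self.order.remove(pe_no)
        self.order.append(pe_no)

    def select(self) -> int:
        """Return the PE that has gone unused for the longest time."""
        return self.order[0]


arbiter = LruArbiter(3)
arbiter.touch(1)
arbiter.touch(2)
print(arbiter.select())
```

The second policy (least frequently referenced) would replace the ordering list with per-PE reference counters.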

The handler unit 425 controls the cache miss handler unit 42. For example, the handler unit 425 causes the PE 10 that had a cache hit to execute the instructions stored in its instruction cache unit 11. The handler unit 425 also, for example, causes the “LGT” of the PE 10 scheduled to execute next to be output to the PC update unit 20, so that the value of the PC 21 is updated to the value of “LGT”.
In the case of a cache miss, the handler unit 425 updates the management information corresponding to the PE 10 selected by the LRU arbiter 424. That is, it stores management information associating “BTA”, “valid flag”, “LGT”, “PTR1”, and “PTR2” in the row of the management table storage unit 41 for the corresponding “PE_NO”.

In this way, when a branch instruction is detected in the program, the execution management unit 40 obtains from the management table stored in the management table storage unit 41, based on the next-execution information (“PTR1” or “PTR2”), the “BTA” (head position information) and “valid flag” (validity information) corresponding to the atomic block to be executed after the branch instruction. When the position indicated by the obtained “BTA” matches the execution position (PC value) of the program to be executed next and the obtained “valid flag” indicates valid, the execution management unit 40 determines that the PE 10 identified by the next-execution information (“PTR1” or “PTR2”) holds the atomic block in its instruction cache unit 11 and that the block is in a valid state.

When the position indicated by the obtained “BTA” does not match the execution position of the program to be executed next, or when the obtained validity information indicates invalid, the execution management unit 40 assigns the atomic block to be executed next to one of the PEs 10 and updates the management table to reflect the assigned atomic block.
When assigning the atomic block to be executed next to one of the PEs 10, the execution management unit 40 stores the “LGT” corresponding to that atomic block in the management table and sets the “valid flag” to a value indicating invalid (for example, “0”). When the PE 10 completes execution of the atomic block, the execution management unit 40 updates the corresponding “valid flag” to a value indicating valid (for example, “1”).
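The hit and miss paths described in the preceding paragraphs can be put together in a short sketch. The Python below is a simplified model, not the circuit itself: the function names `dispatch` and `complete`, the dictionary layout of the table, and the `choose_victim` callback standing in for the LRU arbiter 424 are all assumptions, and the update of “PTR1” and “PTR2” is omitted.

```python
def dispatch(table, current_row, pc_value, block_length, choose_victim):
    """Return (pe_no, hit) for the atomic block that starts at pc_value."""
    # Follow the next-execution information of the block that just branched.
    for key in ("PTR1", "PTR2"):
        ptr = table[current_row][key]
        if ptr is None:
            continue
        row = table[ptr]
        # Hit: the head address matches the PC value and the entry is valid.
        if row["BTA"] == pc_value and row["valid"] == 1:
            return ptr, True
    # Miss: assign the block to a PE chosen by the replacement policy,
    # record its length, and mark the entry invalid until execution completes.
    victim = choose_victim()
    table[victim].update(BTA=pc_value, LGT=block_length, valid=0)
    return victim, False


def complete(table, pe_no):
    """Mark the entry valid once the PE has finished executing the block."""
    table[pe_no]["valid"] = 1
```

A caller would pass a function such as an LRU selector as `choose_victim`, call `dispatch` at each branch, and call `complete` when the PE reports that the block has finished.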

Returning to FIG. 1, the register management unit 50 (an example of a register management section) manages the registers of the register file unit 12 in each PE 10. When a data dependency on a particular register spans multiple atomic blocks, the register management unit 50 controls the PEs 10 to avoid the data dependency hazard. For example, when a RAW (read-after-write) hazard, a data dependency in which a register is read after being written, occurs between PEs 10, the register management unit 50 suspends (stalls) instruction execution on the PE 10 that reads the register until the atomic block on the PE 10 that most recently wrote the register has completed its write to that register. The register management unit 50 includes an update history storage unit 51.

The update history storage unit 51 is, for example, a storage made up of registers and, as shown in FIG. 5, stores update history information associating “FIFO”, “UPD”, “W/A”, “RAW”, and “ACT”. The update history storage unit 51 stores an update history table holding a plurality of pieces of update history information. The update history storage unit 51 is organized as a FIFO (First In First Out): when an atomic block is assigned to a PE 10, the execution management unit 40 adds the update history information corresponding to that PE 10 and discards old update history information.

FIG. 5 is a diagram illustrating a configuration example of the update history storage unit 51 in the present embodiment.
In FIG. 5, “FIFO” indicates PE identification information identifying a PE 10, stored in execution order. “UPD” is a flag (the UPD flag) indicating that a register was updated during execution of the atomic block; in other words, it is update information. For example, “UPD” of “1” indicates that the register was updated and “0” indicates that it was not.

“W/A” consists of a W flag and an A flag. The W flag is completion information indicating that the last write to the register within the atomic block has completed. The A flag is execution information indicating that a write to the register has been executed. For example, a W flag of “1” indicates that the last write to the register has completed and “0” indicates that it has not. An A flag of “1” indicates that a write to the register has been executed and “0” indicates that it has not.

“RAW” is a RAW flag indicating that a RAW hazard has occurred within the PE 10 itself. For example, “RAW” of “1” indicates that a RAW hazard has occurred within the PE 10 and “0” indicates that it has not. “RAW” also serves as a control flag that switches between handling of a RAW hazard within the PE 10 and handling of a RAW hazard between PEs 10. For example, “RAW” is changed to “1” when the A flag is “1” and a read of that register is requested.
“ACT” is an ACT flag indicating whether the PE 10 is currently executing. For example, “ACT” of “1” indicates that the PE 10 is executing and “0” indicates that it is not.

The update history storage unit 51 stores an update history table holding as many pieces of update history information as there are PEs 10 (for example, N). It also stores an update history table corresponding to each of the plurality of registers. That is, the bit widths of “UPD”, “W/A”, and “RAW” are determined by the number of registers in a PE 10. For example, “UPD” is RN × 1 bits, “W/A” is RN × 2 bits, and “RAW” is RN × 1 bits, where RN is the number of registers (the sum of the R general-purpose registers and the control registers).
In the example shown in FIG. 5, the update history information whose “FIFO” is “PE_1” has “UPD” of “11...”, “W/A” of “0/0...”, “RAW” of “00...”, and “ACT” of “1”.
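The field widths given above follow directly from RN. As a small worked example in Python, assuming purely for illustration a PE with 16 general-purpose registers and 4 control registers (these counts do not come from the embodiment, and the function name `update_history_bits` is hypothetical):

```python
def update_history_bits(num_general: int, num_control: int) -> dict:
    """Bit widths of the per-register fields in one update history entry."""
    rn = num_general + num_control   # RN: total number of registers
    return {
        "UPD": rn * 1,   # one update bit per register
        "W/A": rn * 2,   # one W flag and one A flag per register
        "RAW": rn * 1,   # one RAW flag per register
    }


# Hypothetical register counts: RN = 16 + 4 = 20.
print(update_history_bits(16, 4))
```

Under these assumed counts, one entry carries 20 + 40 + 20 = 80 bits of per-register state in addition to the PE identifier and the “ACT” flag.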

FIG. 6 is a block diagram showing an example of the register management unit 50 in the present embodiment.
As shown in FIG. 6, the register management unit 50 includes the update history storage unit 51, a multiplexer 52, and a multiplexer 53. Of the update history information shown in FIG. 5, the update history storage unit 51 in FIG. 6 shows only “UPD”, “W/A”, and “RAW” for a particular register (for example, RegX).

The update history information EN2 corresponds to the PE 10-3 (PE_3), which has issued a read request to the register management unit 50 and in which no RAW hazard within the PE 10 itself has occurred. The update history information EN1 corresponds to the PE 10-1 (PE_1), which is the PE that most recently wrote to the register RegX before the processing of the PE 10-3 (PE_3).
Based on the value of “RAW”, the multiplexer 52 selects either the W flag of the update history information EN1 or the A flag of the update history information EN2 and outputs it as the STALLB signal that stalls the PE 10-3 (PE_3). The STALLB signal is active-low: a value of “0” causes a stall. When “RAW” is “0”, the multiplexer 52 outputs the value of the W flag of EN1 as the STALLB signal. When “RAW” is “1”, it outputs the value of the A flag of EN2 as the STALLB signal.

Based on the value of “RAW”, the multiplexer 53 selects whether the register RegX from which data is read is that of the PE 10-3 (PE_3) or that of the PE 10-1 (PE_1). When “RAW” is “0”, the multiplexer 53 designates the register RegX of the PE 10-1 (PE_1) as the register to read. When “RAW” is “1”, it designates the register RegX of the PE 10-3 (PE_3).

In this way, the register management unit 50 determines, based on the update history table stored in the update history storage unit 51, whether a RAW hazard within a PE 10 itself has occurred, and sets “RAW” to “1” when it has. The register management unit 50 then suspends instruction execution on the PE 10 that issued a read request based on three values in the update history table: the “RAW” corresponding to the requesting PE 10, the A flag corresponding to that PE 10, and the W flag corresponding to the PE 10 that most recently wrote to the register.
For example, when the “RAW” corresponding to the requesting PE 10 is “1”, the register management unit 50 uses the multiplexer 52 to generate, from the A flag corresponding to that PE 10, the STALLB signal that determines whether instruction entry into that PE 10 is suspended. In this case, the register management unit 50 has the PE 10 read the data from its own register (RegX).

When the “RAW” corresponding to the requesting PE 10 is “0”, the register management unit 50 uses the multiplexer 52 to generate the STALLB signal from the W flag corresponding to the PE 10 that most recently wrote to the register. In this case, the register management unit 50 has the requesting PE 10 read the data from the register (RegX) of the PE 10 that most recently wrote to the register.

For example, in FIG. 6, when the PE 10-3 (PE_3) corresponding to the update history information EN2 requests a read of the register RegX, “RAW” is “0”, so the multiplexer 53 selects the PE 10-1 (PE_1) as the read source. Because the W flag of the PE 10-1 (PE_1) is “0”, the multiplexer 52 outputs “0” on the STALLB signal and suspends instruction execution on the PE 10-3 (PE_3) until the W flag of the PE 10-1 (PE_1) becomes “1”.
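The selections made by the multiplexers 52 and 53 reduce to two small functions. The following Python sketch models them under stated assumptions: the function names `mux52_stallb` and `mux53_source` are hypothetical, flags are represented as the integers 0 and 1, and PEs are identified by strings.

```python
def mux52_stallb(raw: int, w_flag_en1: int, a_flag_en2: int) -> int:
    """STALLB (active-low: 0 means stall), selected by "RAW"."""
    # RAW = 1: hazard inside the reading PE, so use its own A flag (EN2).
    # RAW = 0: hazard between PEs, so use the W flag of the last writer (EN1).
    return a_flag_en2 if raw == 1 else w_flag_en1


def mux53_source(raw: int, reader_pe: str, last_writer_pe: str) -> str:
    """PE whose RegX is read, selected by "RAW"."""
    return reader_pe if raw == 1 else last_writer_pe


# FIG. 6 case: RAW = 0 and the W flag of PE_1 is 0, so PE_3 stalls and will
# read RegX from PE_1 once the W flag becomes 1. The A flag passed here is a
# placeholder, since it is not selected when RAW = 0.
print(mux52_stallb(0, 0, 0), mux53_source(0, "PE_3", "PE_1"))
```

Once the PE 10-1 completes its last write to RegX and its W flag becomes 1, the same call returns 1 for STALLB and the stall on the PE 10-3 is released.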

The register management unit 50 determines whether a RAW hazard has occurred based on the A flag, and selects the PE 10 that most recently wrote to the register based on “UPD”.
Of the update history information stored during the first instruction execution after an atomic block is assigned to a PE 10, the register management unit 50 changes the W flag, the A flag, and “RAW” to “0”, and otherwise maintains the state of the stored update history information for the second and subsequent instruction executions, in which a cache hit occurs.

Returning to FIG. 1, when a PE 10 issues an access request to the data memory 71, the store buffer unit 60 (an example of a store buffer control unit) controls access to the data memory 71 based on the store buffer table for that PE 10 stored in the store buffer storage unit 61 described later. The store buffer unit 60 exchanges data with the register file unit 12 of each PE 10 via the operand data network NW2.

The store buffer unit 60 accesses data in the data memory 71 via the memory management unit 70. For an access request that reads data from the data memory 71, the store buffer unit 60 outputs the data to the PE 10 from the store buffer table of the store buffer storage unit 61 when the table holds data corresponding to the request. When the store buffer table holds no such data, the store buffer unit 60 outputs the data read from the data memory 71 to the PE 10.
The store buffer unit 60 includes a store buffer storage unit 61 and an entry table storage unit 62.
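The read path described above, in which buffered data takes precedence over the data memory, can be sketched as follows. This Python model is illustrative only: the function name `load`, the dictionary-based rows, and the choice to search from the newest row backwards when several rows match are assumptions, and the addresses and values are made up for the example.

```python
def load(addr, store_buffer_rows, data_memory):
    """Return the value for addr, preferring data held in the store buffer table."""
    # Assumption: when several buffered stores target the same address,
    # the most recently recorded one supplies the value.
    for row in reversed(store_buffer_rows):
        if row["St"] and row["Addr"] == addr:
            return row["Data"]
    # No matching buffered data: fall back to the data memory.
    return data_memory[addr]


# Hypothetical contents: one buffered store and two memory locations.
rows = [{"LPC": 0, "St": True, "Addr": 0x02, "Data": 0x0F}]
memory = {0x02: 0x00, 0x03: 0x07}
print(load(0x02, rows, memory), load(0x03, rows, memory))
```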

The store buffer storage unit 61 stores one store buffer table per PE 10 (for example, N tables), each associating local position information for each instruction in an atomic block with access information indicating the history of accesses to the data memory 71.
FIG. 7 is a diagram illustrating a configuration example of the store buffer storage unit 61 in the present embodiment.
As shown in FIG. 7, the store buffer storage unit 61 holds N store buffer tables. Each store buffer table stores, for example, “LPC”, “Ld”, “St”, “Addr”, “Data”, “complete”, and “end” in association with one another. Each store buffer table has as many rows as the size of the instruction cache unit 11 of the PE 10.

Here, “LPC” is the local program counter (local position information) within the PE 10 and corresponds to an address in the instruction cache unit 11 of the PE 10. “Ld” is a flag indicating that a load instruction has been executed, and “St” is a flag indicating that a store instruction has been executed. A load instruction reads data from the data memory 71 into a register, and a store instruction writes data from a register to the data memory 71. “Addr” indicates the address in the data memory 71, and “Data” indicates the value of the accessed data. “Complete” is a completion flag indicating that the access (data write) to the data memory 71 has completed, and “end” is an end flag indicating that execution of the instruction has finished. “Ld”, “St”, “Addr”, “Data”, “complete”, and “end” together constitute the access information indicating the history of accesses to the data memory 71.

For example, in FIG. 7, the instruction whose “LPC” is “0” is a store instruction that writes the “Data” value “0x0f” to the “Addr” value “0x02” of the data memory 71; its execution has finished, but the write to the data memory 71 has not yet completed.
For an access request that writes data to the data memory 71, the store buffer unit 60 stores the “LPC” (local position information) in association with the access information in the store buffer table for the PE 10 and also stores the data in the data memory 71.
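A row of the store buffer table and the two steps of the write path (recording the store, then completing the write to the data memory) might be modelled as below. This is a hedged Python sketch: the names `StoreBufferRow`, `record_store`, and `commit`, and the split of the write into two separate function calls, are assumptions made to show how the “complete” and “end” flags can differ.

```python
from dataclasses import dataclass


@dataclass
class StoreBufferRow:
    lpc: int        # "LPC": local program counter within the PE
    ld: bool        # "Ld": a load instruction was executed
    st: bool        # "St": a store instruction was executed
    addr: int       # "Addr": address in the data memory
    data: int       # "Data": value accessed
    complete: bool  # "complete": write to the data memory has completed
    end: bool       # "end": execution of the instruction has finished


def record_store(table, lpc, addr, data):
    """Record an executed store whose write to the data memory is still pending."""
    row = StoreBufferRow(lpc, False, True, addr, data, complete=False, end=True)
    table.append(row)
    return row


def commit(row, data_memory):
    """Write the buffered data to the data memory and set the completion flag."""
    data_memory[row.addr] = row.data
    row.complete = True


# The row described for FIG. 7: LPC 0 stores 0x0f to address 0x02.
table, memory = [], {}
row = record_store(table, lpc=0, addr=0x02, data=0x0F)
commit(row, memory)
```

Between `record_store` and `commit`, the row is in the state described for FIG. 7: execution finished, write not yet completed.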

Returning to FIG. 1, the entry table storage unit 62 stores the priority order used when different PEs 10 access the same address. The entry table storage unit 62 is organized as a FIFO: when an instruction that accesses data in the data memory 71 is executed, the store buffer unit 60 adds information corresponding to the PE 10 executing that instruction and discards old information whose data access has completed. As shown in FIG. 8, for example, the entry table storage unit 62 stores “FIFO” and “entry NO” in association with each other.

FIG. 8 is a diagram illustrating a configuration example of the entry table storage unit 62 in the present embodiment.
In FIG. 8, “FIFO” indicates PE identification information identifying a PE 10, and “entry NO” indicates the number of the store buffer table described above.
For example, in FIG. 8, the “entry NO” for the “FIFO” value “PE_1” is “1” and occupies the top row, which indicates that access by the PE 10 corresponding to “PE_1” has the highest priority.
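The FIFO ordering of the entry table can be illustrated with a double-ended queue. The sketch below is a software analogy under stated assumptions: the class name `EntryTable` and its methods are hypothetical, and the second entry (“PE_3”, 3) is invented for the example rather than taken from FIG. 8.

```python
from collections import deque


class EntryTable:
    """FIFO of (PE identifier, store buffer table number); the top row has priority."""

    def __init__(self) -> None:
        self.rows = deque()

    def add(self, pe_id: str, entry_no: int) -> None:
        """Append an entry when a PE executes a memory access instruction."""
        self.rows.append((pe_id, entry_no))

    def highest_priority(self):
        """Entry at the top row, whose access takes precedence."""
        return self.rows[0]

    def retire(self):
        """Discard the oldest entry once its data access has completed."""
        return self.rows.popleft()


entries = EntryTable()
entries.add("PE_1", 1)   # top row, as in FIG. 8
entries.add("PE_3", 3)   # hypothetical later entry
print(entries.highest_priority())
```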

Returning to FIG. 1, the memory management unit 70 is placed between the store buffer unit 60 and the data memory 71 and relays data accesses between them. The data memory 71 (an example of an external storage unit) is a memory that stores data.

Next, the operation of the processor 1 according to the present embodiment will be described with reference to the drawings.
<Instruction execution processing>
First, the instruction execution processing of the processor 1 according to the present embodiment will be described.
The processor 1 outputs the PC value to the program memory as an address, and the instruction fetch unit 30 stores the instruction output from the program memory in the IR 31. The processor 1 divides the program into atomic blocks as shown in FIG. 2, assigns each atomic block to one of the PEs 10, and causes that PE 10 to execute instructions in units of atomic blocks. During this process, the execution management unit 40 manages the assignment of the PEs 10 and their parallel processing.

FIG. 9 is a diagram showing the state transitions of a PE 10 of the processor 1 according to the present embodiment.
In this figure, state S1 is the active state in which instructions are being executed, state S2 is the active state in which execution is suspended (stalled), and state S3 is the sleep state in which execution of the atomic block's instructions has finished.
As shown in FIG. 9, the execution management unit 40 first puts a PE 10 into state S1 by assigning an atomic block to it and having it execute instructions. The execution management unit 40 sets the “valid flag” for that PE 10 in the management table stored in the management table storage unit 41 to “1”.

In state S1, when a PE 10 executes a register read instruction and issues an inquiry, the execution management unit 40 determines, based on the management table stored in the management table storage unit 41, whether a stall is required. If so, it sends a stall instruction to the PE 10 and moves it to state S2.
In state S2, when the cause of the stall has been resolved, the execution management unit 40 sends a stall release to the PE 10 and moves it to state S1.

In state S1, when the PE 10 executes a branch instruction, the execution management unit 40 moves the PE 10 to state S3.
In state S3, when the previously executed atomic block is to be executed again (a cache hit), the execution management unit 40 moves the PE 10 back to state S2 and causes it to execute the instructions stored in its instruction cache unit 11.

In state S3, when no PE 10 is free and the LRU arbiter 424 determines that the PE 10 is the LRU replacement target, the execution management unit 40 invalidates the PE 10. That is, it sets the “valid flag” corresponding to the PE 10 to “0” in the management table stored in the management table storage unit 41.
When a cache miss occurs, the execution management unit 40 assigns an atomic block to one of the invalidated PEs 10 and moves it to state S1 again.
When the “valid flag” of every PE 10 is “1”, the execution management unit 40 holds off execution of the new atomic block until a PE 10 enters state S3 (sleep state) or its “valid flag” becomes “0”.
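The state transitions described above can be sketched as a small table-driven state machine. This is only an illustrative model: the event names (`stall`, `stall_release`, `branch`, `cache_hit`) and the `PE` class are hypothetical and do not appear in the specification, and the target state of the re-execution transition is taken literally from the text.

```python
from enum import Enum

class State(Enum):
    S1 = "active: executing"
    S2 = "active: stalled"
    S3 = "sleep"

# (current state, event) -> next state. Event names are illustrative assumptions.
TRANSITIONS = {
    (State.S1, "stall"): State.S2,
    (State.S2, "stall_release"): State.S1,
    (State.S1, "branch"): State.S3,
    (State.S3, "cache_hit"): State.S2,  # target state as stated in the text
}

class PE:
    def __init__(self):
        self.state = None  # no atomic block assigned yet
        self.valid = 0     # "valid flag" in the management table

    def assign(self):
        """Cache miss: a new atomic block is assigned to an invalid PE."""
        if self.valid:
            raise ValueError("PE still holds a valid atomic block")
        self.valid = 1
        self.state = State.S1

    def on_event(self, event):
        self.state = TRANSITIONS[(self.state, event)]

    def invalidate(self):
        """LRU replacement: only a sleeping PE (S3) is invalidated."""
        if self.state is not State.S3:
            raise ValueError("only a PE in S3 can be invalidated")
        self.valid = 0
        self.state = None
```

A PE that is assigned a block, stalls once, resumes, and reaches its branch instruction passes through S1, S2, S1, and S3 in that order.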

When it newly assigns an atomic block to a PE 10, the execution management unit 40 updates the management table stored in the management table storage unit 41. Specifically, it stores the PC value in “BTA” of the PE 10 and the block length of the atomic block in “LGT”. It also stores, in “PTR1” and “PTR2” of the PE 10, the “PE_NO” (row number) of the PEs 10 that correspond to the branch destinations of the atomic block, and sets the “valid flag” to “1”.
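As a rough sketch of this table update, the management table can be modelled as a list of rows keyed by the field names used in the text (“PE_NO”, “BTA”, “LGT”, “PTR1”, “PTR2”, valid flag). The function names and the policy of taking the first invalid row are assumptions made for illustration; LRU replacement is not modelled here.

```python
def new_table(n_pes):
    """Create an empty management table with one row per PE (hypothetical layout)."""
    return [{"PE_NO": i, "BTA": None, "LGT": 0, "PTR1": None, "PTR2": None, "valid": 0}
            for i in range(n_pes)]

def allocate_atomic_block(table, pc, length, ptr1=None, ptr2=None):
    """Assign a new atomic block to the first invalid row and update that row.

    Returns the PE_NO used, or None when every row is valid, in which case
    the new atomic block has to wait.
    """
    for row in table:
        if row["valid"] == 0:
            row.update(BTA=pc, LGT=length, PTR1=ptr1, PTR2=ptr2, valid=1)
            return row["PE_NO"]
    return None
```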

Next, cache hit determination by the execution management unit 40 will be described with reference to FIG. 4.
In FIG. 4, when the PE 10 corresponding to row L1 executes a branch instruction, the comparison circuit 421 compares the PC value with “BTA” of the PE 10 (row L2) corresponding to the atomic block scheduled to be executed next, and outputs the H state if they match. The AND circuit 422 outputs the logical AND of the output of the comparison circuit 421 and the “valid flag” of the PE 10 corresponding to the atomic block scheduled to be executed next. In the example shown in FIG. 4, the “valid flag” of row L2 is “1” (= H state), so the AND circuit 422 outputs the H state. The AND circuit 423 then outputs, as the HIT signal, the logical AND (H state) of the branch instruction signal that the predecoder unit 43 outputs to indicate a branch instruction (here, the H state) and the output of the AND circuit 422 (H state). When “BTA” is statically determined, the execution management unit 40 also outputs “BTA” of row L2 to the PC update unit 20 so that the value of the PC 21 is updated to the value of “BTA”. That is, in the example shown in FIG. 4, the execution management unit 40 outputs the H state on the HIT signal and causes the PE 10 whose “PE_NO” is “6” to execute the instructions stored in its instruction cache unit 11.
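The hit logic described for FIG. 4 reduces to two AND gates fed by a comparator. The following minimal sketch models the signals as the integers 0 (L state) and 1 (H state); the function name and argument names are illustrative and not taken from the specification.

```python
def hit_signal(pc, bta, valid_flag, branch_signal):
    """Model of the HIT signal: comparator 421 followed by AND 422 and AND 423."""
    match = 1 if pc == bta else 0      # comparison circuit 421
    and_422 = match & valid_flag       # AND circuit 422
    return and_422 & branch_signal     # AND circuit 423 outputs HIT
```

A hit therefore requires all three conditions at once: a branch instruction was detected, the next block's “BTA” equals the PC value, and the row is valid.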

Next, pipeline processing of the PE 10 will be described with reference to FIG. 10.
FIG. 10 is a diagram illustrating an example of pipeline processing of the processor 1 according to the present embodiment.
As shown in FIG. 10, each PE 10 of the processor 1 executes pipeline processing in which five processes (cycles), “IF”, “ID”, “RR”, “EX”, and “WR”, are processed in parallel.

Here, “IF” is instruction fetch, and “ID” is instruction decode (generation of control signals). “RR” is reading of operands from registers, and “EX” is execution of the operation. “WR” is writing of the operation result to a register.
As shown in FIG. 10, the PE 10 performs pipeline processing by executing the five processes (cycles) of successive instructions, for example “instruction 1”, “instruction 2”, and so on, staggered from one another.
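To make the staggering concrete, the sketch below builds the schedule of an ideal five-stage pipeline. It assumes that each instruction enters the pipeline exactly one cycle after the previous one and that no stalls occur; both are simplifying assumptions, and the function name is hypothetical.

```python
STAGES = ["IF", "ID", "RR", "EX", "WR"]

def pipeline_schedule(n_instructions):
    """Return {cycle: [(instruction number, stage), ...]} for an ideal pipeline."""
    schedule = {}
    for i in range(n_instructions):
        for offset, stage in enumerate(STAGES):
            # Instruction i+1 occupies stage `offset` in cycle i + offset.
            schedule.setdefault(i + offset, []).append((i + 1, stage))
    return schedule
```

With three instructions, cycle 2 holds instruction 1 in “RR”, instruction 2 in “ID”, and instruction 3 in “IF”, and the whole sequence takes seven cycles.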

Next, instruction execution processing of the processor 1 according to the present embodiment will be described with reference to FIG. 11.
FIG. 11 is a diagram illustrating an example of parallel processing by the PEs 10 according to the present embodiment.
FIG. 11(a) shows an example of a program arranged as an expanded linked list of “atomic block A” to “atomic block D” (AB_A to AB_D). FIG. 11(b) shows the PEs 10 (10-1, 10-2, 10-3, 10-4, ..., 10-N) of the processor 1.

FIG. 11(c) shows an example in which the processor 1 executes the atomic blocks (AB_A to AB_D) shown in FIG. 11(a).
As shown in FIG. 11(c), at time T1 the processor 1 first assigns “atomic block A” to the PE 10-1 (PE_1), and the PE 10-1 sequentially executes the instructions of “atomic block A”.

Next, at time T2, when the PE 10-1 completes execution of “atomic block A”, the processor 1 assigns “atomic block B” to the PE 10-2 (PE_2), and the PE 10-2 sequentially executes the instructions of “atomic block B”. At this point, “atomic block A” remains stored in the instruction cache unit 11 of the PE 10-1, and the management table storage unit 41 holds the management information for the PE 10-1 with its “valid flag” set to “1”.

Next, at time T3, when the PE 10-2 completes execution of “atomic block B”, the processor 1 assigns “atomic block D” to the PE 10-3 (PE_3), and the PE 10-3 sequentially executes the instructions of “atomic block D”. At this point, “atomic block B” remains stored in the instruction cache unit 11 of the PE 10-2, and the management table storage unit 41 holds the management information for the PE 10-2 with its “valid flag” set to “1”.

Next, at time T4, when the PE 10-3 completes execution of “atomic block D”, the processor 1 causes the PE 10-2 (PE_2) to execute the instructions of “atomic block B” stored in its instruction cache unit 11. Here, the execution management unit 40 determines a cache hit through the processing shown in FIG. 4 described above and causes the PE 10-2 (PE_2) to re-execute “atomic block B”.
At time T5, the processor 1 further causes the PE 10-3 (PE_3) to execute the instructions of “atomic block D” stored in its instruction cache unit 11. Here, the execution management unit 40 determines a cache hit through the processing shown in FIG. 4 described above and causes the PE 10-3 (PE_3) to re-execute “atomic block D”. As a result, the PE 10-2 and the PE 10-3 execute in parallel.

At the subsequent times T6 and T7, the processor 1 performs the same processing as at times T4 and T5, and the PE 10-2 and the PE 10-3 execute in parallel.
In FIG. 11(c), the period PT1 from time T4 onward is a period of parallel execution.
In this way, the execution management unit 40 causes a plurality of PEs 10 (10-2, 10-3) to execute the assigned atomic blocks (“atomic block B” and “atomic block D”) in parallel.
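The assignment pattern of FIG. 11(c) can be traced with a very small dispatcher: a block seen for the first time is a miss and gets the next free PE, while a block seen again is a hit and reuses the PE that already caches it. This sketch only models which PE runs which block; it does not model timing, the overlap of PE_2 and PE_3, or LRU replacement, and the function name is hypothetical.

```python
def dispatch(trace, n_pes):
    """Assign each atomic block in `trace` to a PE, reusing the PE on a hit."""
    cached = {}  # block name -> PE number holding it (valid entries only)
    log = []
    for block in trace:
        if block in cached:
            log.append((block, cached[block], "hit"))
        else:
            if len(cached) >= n_pes:
                raise RuntimeError("no free PE; replacement is not modelled")
            pe = len(cached) + 1
            cached[block] = pe
            log.append((block, pe, "miss"))
    return log
```

For the block sequence A, B, D, B, D, the first three blocks miss and occupy PE_1 to PE_3, after which B and D hit on PE_2 and PE_3, matching the sequence from T1 to T5 described above.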

<Data dependency avoidance process>
Next, the data dependency avoidance process of the processor 1 according to the present embodiment will be described.
The register management unit 50 of the processor 1 executes processing that avoids data dependencies between the registers of the PEs 10 based on the following basic rules.
(1) When writing data to a register, a PE 10 writes the data to its own register. When a PE 10 has written to a register, it requests the register management unit 50 to update the update history table, and the register management unit 50 updates the update history table in the update history storage unit 51.

(2) When reading data from a register, a PE 10 makes an inquiry to the register management unit 50. Based on the update history table in the update history storage unit 51, the register management unit 50 instructs the PE 10 to suspend instruction execution (by outputting the STALLB signal) and, when the cause of the suspension has been resolved (for example, when the write has completed), instructs it to release the suspension. Based on the update history table, the register management unit 50 causes the PE 10 to read the data from either its own register or the register of the PE 10 that executed most recently in the past.

For example, when no RAW hazard has occurred within the requesting PE 10 itself (“RAW” is “0”) and the write to the register in the PE 10 that executed most recently in the past has completed, the register management unit 50 causes the PE 10 that requested the read to read the register data of that most recently executed PE 10. When a RAW hazard has occurred within the requesting PE 10 itself (“RAW” is “1”), the register management unit 50 causes the PE 10 that requested the read to read the data of its own register.

Next, the operation by which the register management unit 50 generates the STALLB signal when a RAW hazard occurs will be described with reference to FIG. 6.
In FIG. 6, when a read instruction for the register RegX is executed in the PE 10-3 (PE_3) corresponding to the update history information EN2, the PE 10-3 (PE_3) asks the register management unit 50 whether it should stall. Because the A flag of the update history information EN2 corresponding to the PE 10-3 (PE_3) is “0”, the register management unit 50 leaves “RAW” of the update history information EN2 at “0” and, through the multiplexer 52, drives the STALLB signal to the L state based on the W flag of the update history information EN1 and “RAW” of the update history information EN2. As a result, the PE 10-3 (PE_3) suspends (stalls) execution of the read instruction for the register RegX.

When the last write to the register RegX completes in the PE 10 corresponding to the update history information EN1, the register management unit 50 sets the W flag of the update history information EN1 to “1”. As a result, the register management unit 50 drives the STALLB signal to the H state through the multiplexer 52, and the stalled PE 10 resumes the read instruction for the register RegX.
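The cases given in the text fix the behaviour of STALLB as a function of the reader's own “RAW” value and the preceding PE's W flag. The sketch below is a behavioural summary under that reading, not the multiplexer 52 circuit itself, whose internal wiring the text does not spell out; the function name is hypothetical, and STALLB is treated as active-low because the L state is what stalls the PE.

```python
def stallb(raw_own, w_prev):
    """Return STALLB: 0 (L state) stalls the reading PE, 1 (H state) lets it run.

    raw_own: "RAW" of the reading PE's update history information
    w_prev:  W flag of the PE that most recently wrote the register
    """
    return 1 if (raw_own or w_prev) else 0
```

With “RAW” at 0 the reader stalls until the preceding PE's last write sets W to 1; with “RAW” at 1 the reader never stalls, since it reads its own register.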

Next, RAW hazard processing between PEs 10 of the processor 1 according to the present embodiment will be described with reference to FIG. 12.
FIG. 12 is a diagram illustrating an example of RAW hazard processing between PEs 10 of the processor 1 according to the present embodiment.
FIG. 12 shows a state in which two PEs 10 (PE_1 and PE_3) execute instructions in parallel and each PE 10 performs pipeline processing. In FIG. 12, waveform W1 indicates the logic state of “RAW” of PE_3, and waveform W2 indicates the logic state of the A flag of PE_3. Waveform W3 indicates the logic state of the A flag of PE_1, waveform W4 indicates the logic state of the W flag of PE_1, and waveform W5 indicates the logic state of the STALLB signal of PE_3. In each waveform, the horizontal axis indicates time and the vertical axis indicates the logic state.

For example, at time T11, when the PE 10-1 (PE_1) executes a write instruction to the register RegX, the register management unit 50 updates the A flag of the management information corresponding to PE_1 to “1” (H state).
Also at time T11, when the PE 10-3 (PE_3) executes a read instruction for the register RegX, “RAW” of the management information corresponding to PE_3 is “0” (L state), so the register management unit 50 uses the multiplexer 53 to select PE_1, the PE executed most recently in the past, as the read source for RegX. Based on the W flag of the management information corresponding to PE_1 and “RAW” of the management information corresponding to PE_3, the register management unit 50 also uses the multiplexer 52 to set the STALLB signal of PE_3 to the L state. As a result, instruction execution of PE_3 is suspended.

Next, at time T12, when the PE 10-1 (PE_1) executes the last write instruction to the register RegX, the register management unit 50 updates the W flag of the management information corresponding to PE_1 to “1” (H state). Based on the W flag of the management information corresponding to PE_1 and “RAW” of the management information corresponding to PE_3, the register management unit 50 sets the STALLB signal of PE_3 to the H state. As a result, the suspension of instruction execution in PE_3 is released. The period ST1 from time T11 to time T12 is the stall period. In this case, the PE 10-3 (PE_3) reads the value of the register RegX only after the PE 10-1 (PE_1) has finished writing to it, so the instruction executes correctly.

Next, RAW hazard processing within a single PE 10 of the processor 1 according to the present embodiment will be described with reference to FIG. 13.
FIG. 13 is a diagram illustrating an example of RAW hazard processing within a single PE 10 of the processor 1 according to the present embodiment.
FIG. 13 shows a state in which the PE 10-3 (PE_3) is executing instructions and performing pipeline processing. In FIG. 13, waveform W6 indicates the logic state of “RAW” of PE_3, and waveform W7 indicates the logic state of the A flag of PE_3. Waveform W8 indicates the logic state of the STALLB signal of PE_3. In each waveform, the horizontal axis indicates time and the vertical axis indicates the logic state.

For example, at time T21, when the PE 10-3 (PE_3) executes a write instruction to the register RegX, the register management unit 50 updates the A flag of the management information corresponding to PE_3 to “1” (H state).
Next, at time T22, when the PE 10-3 (PE_3) executes a read instruction for the register RegX, the A flag of the management information is “1” (H state), so the register management unit 50 updates “RAW” of the management information corresponding to PE_3 to “1” (H state). The register management unit 50 then uses the multiplexer 53 to select PE_3 itself as the read source for RegX. Based on the A flag and “RAW” of the management information corresponding to PE_3, the register management unit 50 also uses the multiplexer 52 to set the STALLB signal of PE_3 to the H state. As a result, instruction execution of PE_3 continues without being suspended.
In this way, when a write to the register RegX has been executed within a PE 10 itself, that PE 10 reads the data from its own register RegX.
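The flag updates in the FIG. 12 and FIG. 13 walkthroughs, together with the read-source choice made by the multiplexer 53, can be sketched as follows. The class and function names are hypothetical, each object stands for one PE's flags for a single register (RegX), and the return values "own" and "previous" are illustrative labels for the two read sources.

```python
class UpdateHistory:
    """Flags kept per PE for one register in the update history table."""
    def __init__(self):
        self.a = 0    # A flag: a write to the register has been executed
        self.w = 0    # W flag: the last write in the atomic block has completed
        self.raw = 0  # "RAW": a read followed a write inside the same PE

def on_write(entry, last_write=False):
    """Record a write; the final write of the atomic block also sets W."""
    entry.a = 1
    if last_write:
        entry.w = 1

def on_read(own):
    """Update "RAW" from the reader's own A flag and pick the read source."""
    if own.a == 1:
        own.raw = 1
    return "own" if own.raw == 1 else "previous"
```

In the FIG. 13 case the reader has already written RegX, so the read is served from its own register; in the FIG. 12 case the reader has not, so the most recently executed preceding PE is selected.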

<Store buffer processing>
Next, store buffer processing of the processor 1 according to the present embodiment will be described.
For a load instruction, the store buffer unit 60 uses the “Addr” of the load instruction and, based on the store buffer tables in the store buffer storage unit 61, detects the single nearest store buffer table, including its own, on the “FIFO” of the entry table storage unit 62. Within that store buffer table, the store buffer unit 60 outputs to the operand data network NW2 the data of the access information that is newest (has the largest “LPC” value) and whose “Addr” matches that of the load instruction (load bypass processing). The store buffer unit 60 thereby reuses data previously written (stored) to the data memory 71 and does not need to read (load) it from the data memory 71 again, so the data can be read (loaded) at high speed. Likewise, when data read from the data memory 71 has not been updated, the store buffer unit 60 reuses the data that was previously read (loaded) from the data memory 71 at some cost in time and does not need to read (load) it from the data memory 71 again, so the data can be read (loaded) at high speed.
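Load bypass as described above amounts to a two-level search: first the nearest store buffer table that contains the address, then the newest matching entry inside it. The sketch below assumes the tables are passed in nearest-first order (own table first) and that each entry is a dictionary with the fields “LPC”, “Addr”, and “Data” named in the text; the function name and this data layout are illustrative assumptions.

```python
def load_bypass(tables_nearest_first, addr):
    """Return the bypassed data for `addr`, or None if no table holds it."""
    for table in tables_nearest_first:
        matches = [entry for entry in table if entry["Addr"] == addr]
        if matches:
            # Newest access information = largest LPC value in this table.
            return max(matches, key=lambda entry: entry["LPC"])["Data"]
    return None  # no bypass possible; the load must go to the data memory
```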

The memory management unit 70 searches the store buffer tables in the “FIFO” order of the entry table storage unit 62 and, within each store buffer table, sequentially accesses the data memory 71 for each row whose “Ld” (load) or “St” (store) flag is “1”. The store buffer unit 60 searches the store buffer table, performs load bypass from the row that has the matching “Addr”, and sets “complete” (completion flag) to “1” when this access ends. When “complete” (completion flag) becomes “1” and the instruction is a load instruction, the register management unit 50 sets “UPD” and the A flag corresponding to the load destination register to “1”. If the instruction is the last write (load instruction) to that register, the register management unit 50 also sets the corresponding W flag to “1”.

When the instruction is a store instruction, the store buffer unit 60 needs to read the source register of the store, so it queries the management table of the register management unit 50 and, if the register cannot yet be read, suspends instruction execution until it can be. After the register has been read, the store buffer unit 60 stores the register value and the address value in “Data” and “Addr” of the store buffer table, and sets “end” (end flag) to “1”.

Next, when “St” in the store buffer table is “1”, the memory management unit 70 waits (stalls) until the corresponding “end” (end flag) becomes “1”. When “end” (end flag) becomes “1”, the memory management unit 70 starts the memory access to the data memory 71 and, when the store (write) has completed, has the store buffer unit 60 set the corresponding “complete” (completion flag) to “1” and moves on to the next memory access.
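The handshake between the “end” and “complete” flags for stores can be sketched as an in-order walk over one store buffer table. The dictionary layout, the function name, and the use of a Python dict for the data memory are assumptions for illustration; loads (rows with “St” equal to 0) are simply skipped in this sketch.

```python
def commit_stores(table, memory):
    """Write ready stores to `memory` in table order.

    Returns the entry the walk is stalled on (a store whose "end" flag is
    still 0), or None when every store in the table has been written.
    """
    for entry in table:
        if entry["complete"] or entry["St"] != 1:
            continue  # already written, or not a store
        if not entry["end"]:
            return entry  # register value not captured yet: stall here
        memory[entry["Addr"]] = entry["Data"]
        entry["complete"] = 1
    return None
```

Calling the function again after the stalled entry's “end” flag has been set resumes the walk from that entry, which mirrors the wait-then-proceed behaviour described above.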

In this way, when a PE 10 issues an access request to the data memory 71, the store buffer unit 60 controls access to the data memory 71 based on the store buffer table for that PE 10 stored in the store buffer storage unit 61.

As described above, the processor 1 according to the present embodiment includes a plurality of PEs 10 (processing elements) and the execution management unit 40 (execution management unit). Each of the PEs 10 has an instruction cache unit 11 (temporary storage unit) that temporarily stores the atomic block (unit instruction sequence) assigned to it, and is configured to be able to execute the instructions contained in the atomic block stored in its instruction cache unit 11. The execution management unit 40 divides a program into atomic blocks, sequentially assigns the atomic blocks to the PEs 10, and causes the PEs 10 to execute the assigned atomic blocks in parallel. Here, the program is an instruction sequence at or below the level of assembly language, and an atomic block is an instruction sequence obtained by dividing the program that contains no branch instruction partway through, starts at the first instruction of a branch destination, and ends at a branch instruction.

As a result, the processor 1 according to the present embodiment assigns atomic blocks to the PEs 10 one block at a time and causes the PEs 10 to execute the assigned atomic blocks in parallel, so that, for example, non-numerical calculation processing can be performed at high speed. The processor 1 according to the present embodiment is effective for the following kinds of application software.
(1) Applications that require high-speed processing of a single thread.
(2) Applications whose algorithm has a loop structure made up of a group of atomic blocks that fits within the number of PEs 10 provided.
(3) In addition, applications built from algorithms with complicated control flow.

For example, the processor 1 according to the present embodiment is effective for non-numerical calculation applications that satisfy the above requirements, such as control problems, graph problems, information retrieval, computer-aided design, and theorem proving.
The processor 1 according to the present embodiment can also perform non-numerical calculation processing at high speed by having a plurality of PEs 10 execute in parallel, without the need to prepare a dedicated compiler.

In the present embodiment, when a divided atomic block is not stored in the instruction cache unit 11 of any of the PEs 10, the execution management unit 40 assigns the atomic block to one of the PEs 10. When a divided atomic block is already stored in the instruction cache unit 11 of one of the PEs 10, the execution management unit 40 causes that PE 10 to execute, in parallel, the atomic block already stored in its instruction cache unit 11.
As a result, the processor 1 according to the present embodiment can easily realize assignment to the PEs 10 and parallel execution in units of atomic blocks.

The processor 1 according to the present embodiment also includes the management table storage unit 41, which stores a management table containing a plurality of pieces of management information. Each piece of management information associates head position information (“BTA”) indicating the position in the program of the first instruction of an atomic block, valid information (“valid flag”) indicating whether the atomic block is valid, and next execution information (“PTR1”, “PTR2”) indicating the PE 10 to be executed next. When a branch instruction is detected in the program, the execution management unit 40 determines, based on the management table stored in the management table storage unit 41, whether the atomic block to be executed next is already stored in the instruction cache unit 11 of one of the PEs 10 and is in a valid state.
As a result, the processor 1 according to the present embodiment can determine cache misses and cache hits more easily based on the management table.

In the present embodiment, when a branch instruction is detected in the program, the execution management unit 40 uses the next execution information (“PTR1”, “PTR2”) to obtain, from the management table stored in the management table storage unit 41, the head position information (“BTA”) and valid information (“valid flag”) corresponding to the atomic block to be executed after the branch instruction. When the position indicated by the obtained head position information matches the execution position of the program to be executed next and the obtained valid information indicates valid, the execution management unit 40 determines that the PE 10 matching the next execution information holds the atomic block in its instruction cache unit 11 and is in a valid state.
As a result, the processor 1 according to the present embodiment can appropriately determine a cache hit by a simple method.

In the present embodiment, when the position indicated by the obtained head position information does not match the execution position of the program to be executed next, or when the obtained valid information indicates invalid, the execution management unit 40 assigns the atomic block to be executed next to one of the PEs 10 and updates the management table to reflect the assigned atomic block.
As a result, in the case of a cache miss, the processor 1 according to the present embodiment can appropriately assign an atomic block to a PE 10 by a simple procedure.

Further, in the present embodiment, when the execution management unit 40 assigns the atomic block to be executed next to one of the plurality of PEs 10, it stores the head position information ("BTA") corresponding to that atomic block in the management table and updates the valid information ("valid flag") to a value indicating invalidity. Then, when the PE 10 completes execution of the atomic block, the execution management unit 40 updates the valid information ("valid flag") corresponding to the atomic block to a value indicating validity.
Thereby, the processor 1 according to the present embodiment can appropriately manage atomic blocks that have already been executed.
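
The miss path and the valid-flag lifecycle described above can be illustrated with a short Python sketch. The `ManagementTable` class and its method names are hypothetical. The least-recently-used choice of PE is an assumption for the example: the reference-sign list names an LRU arbiter 424, but its exact behavior is not restated in this section.

```python
from collections import OrderedDict


class ManagementTable:
    """Toy management table: one entry per PE, kept in least-recently-used order."""

    def __init__(self, num_pes):
        self.entries = OrderedDict(
            (pe, {"bta": None, "valid": False}) for pe in range(num_pes)
        )

    def allocate(self, next_pc):
        """Miss path: choose a PE, record the BTA, and mark the entry invalid."""
        pe = next(iter(self.entries))        # assumed policy: least recently used PE
        self.entries[pe] = {"bta": next_pc, "valid": False}
        self.entries.move_to_end(pe)         # this PE is now the most recently used
        return pe

    def complete(self, pe):
        """Called when the PE finishes the atomic block: the entry becomes valid."""
        self.entries[pe]["valid"] = True
```

Keeping the entry invalid between `allocate` and `complete` means that a lookup arriving while the block is still being executed for the first time is treated as a miss rather than a hit.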

Further, in the present embodiment, the PE 10 executes instructions by pipeline processing.
Thereby, the processor 1 according to the present embodiment can execute instructions at even higher speed through pipeline processing.

Further, in the present embodiment, each of the plurality of PEs 10 has a register file unit 12 (register) used to execute instructions. The processor 1 further includes a register management unit 50 (register management unit). When a RAW hazard, that is, a data dependency caused by a read after a write to a register, occurs between the plurality of PEs 10, the register management unit 50 suspends instruction execution of the PE 10 that reads the register until the atomic block on the PE 10 that wrote to the register in the nearest past has completed its write to the register.
Thereby, the processor 1 according to the present embodiment can appropriately execute instructions even when a RAW hazard occurs. That is, the processor 1 according to the present embodiment can avoid data dependencies between the registers of the PEs 10.

Further, the processor 1 according to the present embodiment includes an update history storage unit 51 that stores an update history table. The update history table holds, for each of the PEs 10, update history information that associates PE identification information identifying the PE 10 (such as "PE_1"), update information ("UPD") indicating that the register has been updated during execution of the atomic block, completion information (W flag) indicating that the last write to the register in the atomic block has been completed, and execution information (A flag) indicating that a write to the register has been executed. The register management unit 50 determines, based on the update history table stored in the update history storage unit 51, whether a RAW hazard has occurred between the PEs 10. When a RAW hazard has occurred between the PEs 10 (for example, when "RAW" is "0"), the register management unit 50 suspends instruction execution of the PE 10 in which the RAW hazard has occurred, based on the completion information in the update history table for the PE 10 that wrote to the register in the nearest past.
Thereby, when a RAW hazard has occurred, the processor 1 according to the present embodiment can appropriately suspend (stall) instruction execution of the PE 10 in which the RAW hazard has occurred, based on the update history information.

Further, in the present embodiment, the register management unit 50 determines whether a RAW hazard has occurred based on the execution information (A flag), and selects the PE 10 that wrote to the register in the nearest past based on the update information ("UPD").
Thereby, the processor 1 according to the present embodiment can appropriately detect the occurrence of a RAW hazard and select the PE 10 that wrote to the register in the nearest past by a simple method.
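
One way to read the stall decision described above is sketched below in Python for a single register. The `HistoryEntry` class and the `must_stall` function are hypothetical, and the interpretation is an assumption: the history is taken to be ordered from oldest to newest PE, a reader whose own A flag is set is assumed to use its own value (no inter-PE stall), and otherwise the nearest earlier entry with UPD set is treated as the producer, whose W flag decides whether the reader waits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HistoryEntry:
    """Update history for one PE and one register."""
    pe_id: str          # PE identification information, e.g. "PE_1"
    upd: bool = False   # "UPD": the atomic block updates this register
    w: bool = False     # W flag: the last write to the register has completed
    a: bool = False     # A flag: a write to the register has been executed


def must_stall(history: List[HistoryEntry], reader: str) -> bool:
    """Return True if the reading PE should wait for an earlier PE's write."""
    idx = next(i for i, e in enumerate(history) if e.pe_id == reader)
    if history[idx].a:
        # The reader has already written the register itself (assumed to
        # mean the value is local), so no inter-PE stall is needed.
        return False
    # Walk back from the reader to find the nearest past writer via UPD.
    for entry in reversed(history[:idx]):
        if entry.upd:
            return not entry.w   # stall until the producer's last write is done
    return False                 # no earlier writer: nothing to wait for
```

In this sketch the stall is released simply by calling `must_stall` again after the producer's W flag has been set.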

Further, in the present embodiment, each of the plurality of PEs 10 includes a plurality of registers, and the update history storage unit 51 stores an update history table for each of the plurality of registers.
Thereby, the processor 1 according to the present embodiment can also handle the case where each PE 10 includes a plurality of registers. In addition, because each PE 10 includes many registers, the processor 1 according to the present embodiment can perform more complex processing.

Further, the processor 1 according to the present embodiment includes a store buffer storage unit 61 and a store buffer unit 60 (store buffer control unit). The store buffer storage unit 61 stores, for each of the plurality of PEs 10, a store buffer table that associates local position information ("LPC") corresponding to each instruction included in the atomic block with access information indicating the history of accesses to the data memory 71 (external storage unit). When a PE 10 issues an access request to the data memory 71, the store buffer unit 60 controls access to the data memory 71 based on the store buffer table for that PE 10 stored in the store buffer storage unit 61.
Thereby, the processor 1 according to the present embodiment can also perform processing appropriately when accessing the data memory 71. Further, by using the store buffer storage unit 61, the processor 1 according to the present embodiment can reuse, for example, data previously read from the data memory 71 or data stored in the data memory 71 via the store buffer storage unit 61, so that the processing time for data access can be shortened.

Further, in the present embodiment, the access information associates type information indicating the type of access ("Ld", "St", and so on) with the storage location information ("Addr") accessed in the data memory 71 and the accessed data ("Data"). In response to an access request for reading data from the data memory 71, the store buffer unit 60 outputs the data to the PE 10 if data corresponding to the access request exists in the store buffer table. If no data corresponding to the access request exists in the store buffer table, the store buffer unit 60 outputs the data read from the data memory 71 to the PE 10.
Thereby, the processor 1 according to the present embodiment can appropriately read data from the data memory 71 by a simple method.

Further, in the present embodiment, in response to an access request for writing data to the data memory 71, the store buffer unit 60 stores the local position information and the access information in association with each other in the store buffer table corresponding to the PE 10, and also stores the data in the data memory 71.
Thereby, the processor 1 according to the present embodiment can appropriately write data to the data memory 71 by a simple method.
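
The read and write handling described above can be sketched for a single PE as follows. The `StoreBuffer` class is hypothetical, a Python `dict` stands in for the data memory 71, and two points are assumptions for the example: "data corresponding to the access request" is taken to mean an entry with the same address, and loads that miss are also recorded so that previously read data can be reused.

```python
class StoreBuffer:
    """Toy store buffer table for one PE, keyed by local position ("LPC")."""

    def __init__(self, memory):
        self.memory = memory   # dict standing in for the data memory 71
        self.table = {}        # lpc -> (type, addr, data), oldest entry first

    def _record(self, lpc, kind, addr, data):
        # Re-insert so that the newest access is always last in the dict.
        self.table.pop(lpc, None)
        self.table[lpc] = (kind, addr, data)

    def load(self, lpc, addr):
        """Read request: serve from the table if possible, else from memory."""
        for _kind, entry_addr, data in reversed(list(self.table.values())):
            if entry_addr == addr:
                return data                  # reuse buffered data
        data = self.memory[addr]
        self._record(lpc, "Ld", addr, data)  # assumed: loads are recorded too
        return data

    def store(self, lpc, addr, data):
        """Write request: record the access and write through to memory."""
        self._record(lpc, "St", addr, data)
        self.memory[addr] = data
```

Because every store is written through to memory as well as recorded, the table in this sketch acts purely as a shortcut for later reads and never holds the only copy of a value.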

The present invention is not limited to the above embodiment and can be modified without departing from the spirit of the invention.
For example, although the above embodiment describes an example in which the execution management unit 40 includes the management table storage unit 41, the management table storage unit 41 may instead be provided outside the execution management unit 40. Likewise, although an example in which the register management unit 50 includes the update history storage unit 51 has been described, the update history storage unit 51 may be provided outside the register management unit 50. Further, although an example in which the store buffer unit 60 includes the store buffer storage unit 61 and the entry table storage unit 62 has been described, either or both of the store buffer storage unit 61 and the entry table storage unit 62 may be provided outside the store buffer unit 60.

Further, although the above embodiment describes an example in which the update history storage unit 51 and the entry table storage unit 62 are configured as FIFOs, they may instead be configured cyclically (as circular buffers) by providing pointers.
Further, although the above embodiment describes an example in which the register management unit 50 instructs the PE 10 to suspend instruction execution and to release the suspension, the PE 10 may instead suspend instruction execution by itself when it queries the register management unit 50 on a register read instruction, and the register management unit 50 may instruct the PE 10 to release the suspension in response to the query.
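
The cyclic alternative to a FIFO mentioned above for the update history storage unit 51 and the entry table storage unit 62 can be illustrated with a small Python sketch. The `CyclicHistory` class and its single `head` pointer are assumptions for the example; the source does not specify how the pointers are organized.

```python
class CyclicHistory:
    """Fixed-size history kept in place; a pointer marks the oldest slot."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0   # index of the next slot to overwrite (the oldest entry)

    def push(self, item):
        """Overwrite the oldest slot and advance the pointer cyclically."""
        self.slots[self.head] = item
        self.head = (self.head + 1) % len(self.slots)

    def oldest_to_newest(self):
        """Return the stored items in age order, skipping unused slots."""
        n = len(self.slots)
        ordered = [self.slots[(self.head + i) % n] for i in range(n)]
        return [item for item in ordered if item is not None]
```

Compared with a shifting FIFO, only the pointer moves on each update, while the entries themselves stay where they were written.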

Further, although the above embodiment describes an example in which the processor 1 accesses the data memory 71 via the memory management unit 70, the invention is not limited to this; the processor 1 may access the data memory 71 via a cache memory, or may access the data memory 71 directly.
Further, the configuration of the execution management unit 40 is not limited to the configuration shown in FIG. 4, and the execution management unit 40 may be realized by another configuration. Likewise, the configuration of the register management unit 50 is not limited to the configuration shown in FIG. 6, and the register management unit 50 may be realized by another configuration.

Further, although the above embodiment describes an example in which the processor 1 manages the execution management unit 40 and the management table storage unit 41 centrally in one place, the invention is not limited to this. For example, the processor 1 may distribute part or all of the execution management unit 40 and the management table storage unit 41 among the PEs 10 so that they are managed in a distributed manner. Even when implemented in such a distributed form, the processor 1 provides the same effects as the above embodiment.

DESCRIPTION OF SYMBOLS: 1 ... processor; 10, 10-1, 10-2, 10-3, 10-4, 10-N ... PE; 11 ... instruction cache unit; 12 ... register file unit; 13 ... instruction execution unit; 20 ... PC update unit; 21 ... PC; 30 ... instruction fetch unit; 31 ... IR; 40 ... execution management unit; 41 ... management table storage unit; 42 ... cache miss handler unit; 43 ... predecoder unit; 50 ... register management unit; 51 ... update history storage unit; 52, 53 ... multiplexer; 422, 423 ... AND circuit; 60 ... store buffer unit; 61 ... store buffer storage unit; 62 ... entry table storage unit; 70 ... memory management unit; 71 ... data memory; 421 ... comparison circuit; 424 ... LRU arbiter; 425 ... handler unit; NW1 ... instruction network; NW2 ... operand data network

Claims

A plurality of processing elements, each having a temporary storage unit that temporarily stores an assigned unit instruction sequence, and each capable of executing instructions included in the unit instruction sequence stored in the temporary storage unit;
An execution management unit that divides a program, which is an instruction sequence at the assembly language level or lower, into unit instruction sequences, each being an instruction sequence that contains no branch instruction in the middle, starts with the first instruction of a branch destination, and ends with a branch instruction, sequentially assigns the divided unit instruction sequences to the plurality of processing elements, and causes the plurality of processing elements to execute the assigned unit instruction sequences in parallel; and
A management table storage unit that stores a management table having a plurality of pieces of management information, each associating head position information indicating the position in the program of the first instruction of a unit instruction sequence, valid information indicating whether the unit instruction sequence is valid, and next execution information indicating the processing element to be executed next,
Wherein the execution management unit:
When a divided unit instruction sequence is not stored in the temporary storage unit of any of the plurality of processing elements, assigns the unit instruction sequence to one of the plurality of processing elements;
When a divided unit instruction sequence is already stored in the temporary storage unit of one of the plurality of processing elements, causes that processing element to execute, in parallel, the unit instruction sequence already stored in the temporary storage unit; and
When a branch instruction is detected in the program, determines, based on the management table stored in the management table storage unit, whether the unit instruction sequence to be executed next is already stored in the temporary storage unit of one of the plurality of processing elements and is valid.
A processor characterized by the above.

The processor according to claim 1, wherein the execution management unit:
When the branch instruction is detected in the program, acquires, from the management table stored in the management table storage unit and based on the next execution information, the head position information and the valid information corresponding to the unit instruction sequence to be executed next after the branch instruction; and
When the position indicated by the acquired head position information matches the execution position of the program to be executed next, and the acquired valid information indicates validity, determines that the processing element matching the next execution information holds the unit instruction sequence in its temporary storage unit in a valid state.

The processor according to claim 2, wherein the execution management unit,
When the position indicated by the acquired head position information does not match the execution position of the program to be executed next, or when the acquired valid information indicates invalidity, assigns the unit instruction sequence to be executed next to one of the plurality of processing elements and updates the management table to correspond to the assigned unit instruction sequence.

The processor according to claim 3, wherein the execution management unit:
When assigning the unit instruction sequence to be executed next to one of the plurality of processing elements, stores the head position information corresponding to the unit instruction sequence in the management table and updates the valid information to a value indicating invalidity; and
When the processing element completes execution of the unit instruction sequence, updates the valid information corresponding to the unit instruction sequence to a value indicating validity.

The processor according to any one of claims 1 to 4, wherein the processing element executes the instructions by pipeline processing.

The processor according to any one of claims 1 to 5, wherein each of the plurality of processing elements has a register used to execute the instructions,
The processor further comprising a register management unit that, when a RAW hazard, which is a data dependency caused by a read after a write to the register, occurs between the plurality of processing elements, suspends instruction execution of the processing element that reads the register until the unit instruction sequence on the processing element that wrote to the register in the nearest past has completed its write to the register.

  A plurality of processing elements, each having a temporary storage unit that temporarily stores an assigned unit instruction sequence, and each capable of executing instructions included in the unit instruction sequence stored in the temporary storage unit; and
  An execution management unit that divides a program, which is an instruction sequence at the assembly language level or lower, into unit instruction sequences, each being an instruction sequence that contains no branch instruction in the middle, starts with the first instruction of a branch destination, and ends with a branch instruction, sequentially assigns the divided unit instruction sequences to the plurality of processing elements, and causes the plurality of processing elements to execute the assigned unit instruction sequences in parallel,
  Wherein each of the plurality of processing elements has a register used to execute the instructions, and
  The processor further comprises a register management unit that, when a RAW hazard, which is a data dependency caused by a read after a write to the register, occurs between the plurality of processing elements, suspends instruction execution of the processing element that reads the register until the unit instruction sequence on the processing element that wrote to the register in the nearest past has completed its write to the register.
  A processor characterized by the above.

The processor according to claim 6, further comprising an update history storage unit that stores an update history table having, for each of the processing elements, update history information that associates PE identification information identifying the processing element, update information indicating that the register has been updated during execution of the unit instruction sequence, completion information indicating that the last write to the register in the unit instruction sequence has been completed, and execution information indicating that a write to the register has been executed,
Wherein the register management unit:
Determines, based on the update history table stored in the update history storage unit, whether the RAW hazard has occurred; and
When no RAW hazard within the processing element occurs in the processing element that reads the register, and a RAW hazard between processing elements occurs between the processing element that reads the register and the processing element that wrote to the register in the nearest past, controls whether to suspend instruction execution of the processing element that reads the register, based on the completion information concerning the written register for the processing element that wrote in the nearest past.

The processor according to claim 8, wherein the register management unit
Determines whether a RAW hazard within the processing element has occurred in the processing element that reads the register, based on the execution information concerning the register to be read for the processing element that reads the register.

The processor according to claim 8 or 9, wherein each of the plurality of processing elements includes a plurality of the registers, and
The update history storage unit stores the update history table corresponding to each of the plurality of registers.

The processor according to any one of claims 1 to 10, further comprising:
A store buffer storage unit that stores, for each of the processing elements, a store buffer table that associates local position information corresponding to each instruction included in the unit instruction sequence with access information indicating a history of accesses to an external storage unit; and
A store buffer control unit that, when there is an access request from a processing element to the external storage unit, controls access to the external storage unit based on the store buffer table corresponding to that processing element stored in the store buffer storage unit.

The processor according to claim 11, wherein the access information associates type information indicating the type of access with storage location information accessed in the external storage unit and the accessed data, and
The store buffer control unit,
In response to an access request for reading data from the external storage unit, outputs the data to the processing element when data corresponding to the access request exists in the store buffer table, and
Outputs the data read from the external storage unit to the processing element when no data corresponding to the access request exists in the store buffer table.

The processor according to claim 11 or claim 12, wherein the store buffer control unit,
In response to an access request for writing data to the external storage unit, stores the local position information and the access information in association with each other in the store buffer table corresponding to the processing element, and stores the data in the external storage unit.