JP2001092663A

JP2001092663A - Data processor

Info

Publication number: JP2001092663A
Application number: JP26336699A
Authority: JP
Inventors: Kazuaki Okamoto; 一晃岡本
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1999-09-17
Filing date: 1999-09-17
Publication date: 2001-04-06

Abstract

PROBLEM TO BE SOLVED: To simplify compiler technology and circuit configuration for parallel data processing. SOLUTION: The data processor has a first processing unit 10 and a second processing unit 20. The first processing unit 10 executes memory access instructions, whose execution time changes dynamically, and the second processing unit 20 executes arithmetic instructions, whose execution time is statically determined. The two kinds of instructions are processed asynchronously and in parallel, which simplifies compiler technology and allows a simpler circuit than a superscalar design. When a dependency exists between the two kinds of instructions, data is passed between the processing units through a register 12.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a data processing device, and more particularly to a device in a stored-program digital computer that executes a plurality of instructions in parallel using a plurality of processing units arranged in parallel.

[0002]

2. Description of the Related Art To improve the processing performance of a microprocessor, it is effective to operate a plurality of processing units in parallel and execute a plurality of instructions simultaneously. Conventional methods for achieving this are the superscalar method and the VLIW (Very Long Instruction Word) method.

[0003] FIG. 5 shows a conceptual diagram of the superscalar method. In the superscalar method, a plurality of instructions that are to be executed sequentially are first analyzed, and the dependencies among them are checked. Through this analysis, instructions that can be executed simultaneously are detected dynamically, and these instructions are distributed to the processing units and executed in parallel. The figure shows a case in which Add (addition), Sub (subtraction), and Load are extracted as simultaneously executable instructions and distributed. Because instructions with different execution times are scheduled dynamically, efficient parallel execution is possible.
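
By way of illustration only, the dynamic check described above can be sketched in Python. The instruction tuple format, the register names, and the function names `depends_on` and `issue_groups` are assumptions made for this sketch and do not appear in the patent; real superscalar hardware performs the equivalent comparison in circuitry, and typically with more elaborate techniques.

```python
from typing import List, Tuple

# Hypothetical instruction format: (opcode, destination register, source registers).
Instr = Tuple[str, str, Tuple[str, ...]]

def depends_on(later: Instr, earlier: Instr) -> bool:
    """Return True if `later` cannot issue in the same cycle as `earlier`."""
    _, later_dst, later_srcs = later
    _, earlier_dst, earlier_srcs = earlier
    read_after_write = earlier_dst in later_srcs
    write_after_write = earlier_dst == later_dst
    write_after_read = later_dst in earlier_srcs
    return read_after_write or write_after_write or write_after_read

def issue_groups(program: List[Instr]) -> List[List[Instr]]:
    """Split a sequential program, in order, into groups that can issue together."""
    groups: List[List[Instr]] = []
    current: List[Instr] = []
    for instr in program:
        # A conflict with any instruction already in the group closes the group.
        if any(depends_on(instr, prev) for prev in current):
            groups.append(current)
            current = []
        current.append(instr)
    if current:
        groups.append(current)
    return groups

program = [
    ("Add", "r1", ("r2", "r3")),
    ("Sub", "r4", ("r5", "r6")),
    ("Load", "r7", ("r8",)),
    ("Add", "r9", ("r1", "r7")),  # needs the results of the first Add and the Load
]
groups = issue_groups(program)
```

In this sketch the first three instructions form one group, matching the Add, Sub, and Load example of FIG. 5, and the final Add waits for the next group because it reads r1 and r7.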

[0004] FIG. 6, on the other hand, shows a conceptual diagram of the VLIW method. In the VLIW method, instructions that can be executed in parallel are analyzed statically at compile time, and the instructions for the individual processing units are concatenated into a long instruction format. That is, if the processing units are, for example, units 1, 2, 3, 4, and 5, an instruction takes the form: instruction for unit 1 + instruction for unit 2 + instruction for unit 3 + instruction for unit 4 + instruction for unit 5. At run time the processor decomposes the long-format instruction and issues a small instruction to each predetermined processing unit, which enables parallel execution. The figure shows a case in which the long instruction is decomposed into Add, Sub, Mul (multiplication), Div (division), and Load, which are issued to the respective processing units. Because all instruction dependency analysis is performed statically by software at compile time, no complex dependency analysis circuit is needed, and it is relatively easy to increase the overall degree of parallelism.
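
By way of illustration only, the decomposition of a long instruction into per-unit small instructions can be sketched in Python. The slot order in `UNIT_FOR_SLOT`, the textual instruction syntax, and the use of `None` for an empty slot are assumptions of this sketch, not details given in the patent.

```python
from typing import Dict, List, Optional

# Hypothetical slot layout: slot i of every long instruction always feeds unit i.
UNIT_FOR_SLOT = ["Add", "Sub", "Mul", "Div", "Load"]

def decompose(long_instruction: List[Optional[str]]) -> Dict[str, str]:
    """Split one long instruction into the small instructions issued to each unit.

    None marks a slot the compiler left empty (a no-op for that unit).
    """
    if len(long_instruction) != len(UNIT_FOR_SLOT):
        raise ValueError("a long instruction needs exactly one slot per unit")
    return {
        unit: small
        for unit, small in zip(UNIT_FOR_SLOT, long_instruction)
        if small is not None
    }

word = ["add r1,r2,r3", "sub r4,r5,r6", None, None, "load r7,[r8]"]
issued = decompose(word)
```

No dependency check appears in the sketch because, under the VLIW method, the compiler is assumed to have placed only independent instructions in the same word.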

[0005]

PROBLEMS TO BE SOLVED BY THE INVENTION However, because the superscalar method analyzes all instructions dynamically, the circuit that analyzes instruction dependencies and the circuit that issues multiple instructions simultaneously become complex; as the degree of parallelism is increased, the circuit scale grows and the operating frequency falls. In addition, because the hardware configuration of the parallel control section is fixed by the degree of parallelism, it is difficult to add or remove processing units once the design is complete, so the method lacks flexibility.

[0006] The VLIW method also has a problem: instructions whose execution time changes dynamically, such as memory access instructions like load and store instructions, are difficult to schedule statically, which makes compiler technology extremely difficult. Moreover, because instructions are issued synchronously to the memory access unit and the arithmetic processing units, which inherently operate asynchronously, a stall in a memory access also stalls the other arithmetic units at the same time, and the benefit of parallel processing is lost.

[0007] SUMMARY OF THE INVENTION The present invention has been made in view of the above problems of the prior art. Its object is to provide a data processing device capable of parallel processing that makes compiler technology easier, simplifies the circuit configuration, and can accommodate the addition and removal of processing units.

[0008]

MEANS FOR SOLVING THE PROBLEMS To achieve the above object, the data processing device of the present invention is characterized in that a first processing unit, which processes instructions whose execution time changes dynamically, and a second processing unit, which processes instructions whose execution time is statically determined, operate in parallel independently of each other.

[0009] Further, the first processing unit has a first control unit that fetches and decodes the instruction whose execution time changes dynamically, and a first execution unit that executes the instruction supplied from the first control unit; and the second processing unit has a second control unit that fetches and decodes the instruction whose execution time is statically determined, and a second execution unit that executes the instruction supplied from the second control unit.

[0010] Further, the device has data storage means shared by the first processing unit and the second processing unit, and a dependency between the instruction whose execution time changes dynamically and the instruction whose execution time is statically determined is conveyed by writing data to the data storage means.

[0011] Further, the first processing unit has first data storage means and the second processing unit has second data storage means, and a dependency between the instruction whose execution time changes dynamically and the instruction whose execution time is statically determined is conveyed by transferring data between the first storage means and the second storage means.

[0012] Further, a plurality of the second processing units are provided and operate in parallel independently of one another; the first processing unit has first data storage means and each of the plurality of second processing units has second storage means; and a dependency between the instruction whose execution time changes dynamically and the instruction whose execution time is statically determined is conveyed by transferring data between the first storage means and the second storage means.

[0013] Here, it is preferable to further provide rewriting means for rewriting the contents of the first control unit or the second control unit. It is also preferable that the instruction whose execution time changes dynamically be a memory access instruction and that the instruction whose execution time is statically determined be an arithmetic instruction.

[0014] As described above, the present invention separates instructions whose execution time changes dynamically, such as memory access instructions, from instructions whose execution time is statically determined, such as arithmetic instructions, and executes each kind independently in a separate processing unit. FIG. 1 illustrates the concept of the invention: Load is shown as an example of an instruction whose execution time changes dynamically, and Add, Sub, Mul, and Div as examples of instructions whose execution time is statically determined; the two groups are separated from each other and processed in parallel. As a result, instructions that do not depend on an instruction whose execution time changes dynamically can reliably be executed in parallel, which makes it easier to build a compiler that schedules efficiently. In addition, because instructions whose execution time is statically determined are scheduled by software, no instruction dependency analysis is needed at execution time, so the circuit configuration is simplified and a drop in operating frequency is avoided. Furthermore, because dependency analysis for instructions whose execution time is statically determined is performed by software, processing units can easily be added or removed when the design is revised.
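
By way of illustration only, the separation of a program into a memory access stream and an arithmetic stream can be sketched in Python. The opcode sets follow the examples named in the text (Load and Store on one side; Add, Sub, Mul, and Div on the other), while the textual instruction syntax and the function name `split_streams` are assumptions of this sketch.

```python
from typing import List, Tuple

# Classification taken from the examples in the text; the sets themselves
# and the instruction syntax below are assumptions of this sketch.
DYNAMIC_OPS = {"Load", "Store"}             # execution time changes dynamically
STATIC_OPS = {"Add", "Sub", "Mul", "Div"}   # execution time is statically determined

def split_streams(program: List[str]) -> Tuple[List[str], List[str]]:
    """Separate one program into a memory access stream and an arithmetic stream."""
    memory_stream: List[str] = []
    arithmetic_stream: List[str] = []
    for instruction in program:
        opcode = instruction.split()[0]
        if opcode in DYNAMIC_OPS:
            memory_stream.append(instruction)
        elif opcode in STATIC_OPS:
            arithmetic_stream.append(instruction)
        else:
            raise ValueError(f"unclassified opcode: {opcode}")
    return memory_stream, arithmetic_stream

program = [
    "Load r1, [a]",
    "Add r2, r3, r4",
    "Load r5, [b]",
    "Mul r6, r2, r2",
    "Store [c], r6",
]
memory_stream, arithmetic_stream = split_streams(program)
```

The sketch only partitions the instructions and preserves their order within each stream. A dependency that crosses the two streams, such as the Store of r6 here, is not resolved by the split itself and would have to be handled through the data storage means.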

[0015] The first processing unit, which processes instructions whose execution time changes dynamically, and the second processing unit, which processes instructions whose execution time is statically determined, each have a control unit (program control unit). Each fetches and decodes its own instructions, supplies them to its own execution unit, and processes them independently and in parallel. When, for example, an instruction whose execution time is statically determined uses the execution result of an instruction whose execution time changes dynamically, shared or separate data storage means are used. That is, the execution result of the first processing unit is stored in the data storage means and the second processing unit reads that result, so that processing consistent with the dependency between the two is performed and synchronization is guaranteed. Because the first and second processing units each have a control unit, making the contents of these control units rewritable by one another also allows one processing unit to manage or control execution in another processing unit.

[0016]

EMBODIMENTS OF THE INVENTION Embodiments of the present invention will be described below with reference to the drawings.

[0017] FIG. 2 shows the configuration of this embodiment, which comprises a first processing unit 10, a second processing unit 20, and an external memory 14.

The first processing unit 10 has a program control unit 10a containing a program counter (PC), a load/store (L/S) unit 10b, and a register 12. In addition to the PC, the program control unit 10a has an instruction fetch section and an instruction decode section; it fetches and decodes memory access instructions, such as load and store instructions, that were separated out at compile time, and supplies them to the load/store unit 10b. The load/store unit 10b has an address calculation section that calculates an address according to the instruction supplied by the instruction decode section, a memory access section that accesses the external memory 14 at the calculated address, and a register write section that writes data loaded from the external memory 14 into the register 12; with these it executes the load or store instruction supplied by the program control unit 10a.

The second processing unit 20, for its part, has a program control unit 20a containing a PC, arithmetic units 20b and 20c, and the register 12. Like the program control unit 10a, the program control unit 20a has an instruction fetch section and an instruction decode section; it fetches and decodes arithmetic instructions such as Add, Sub, and Mul and supplies each one to arithmetic unit 20b or 20c according to the instruction. The arithmetic units 20b and 20c each perform a predetermined operation (for example, Add in arithmetic unit 20b and Sub in arithmetic unit 20c) and write the result to the register 12. The arithmetic units 20b and 20c can operate in parallel, and the program control unit 20a issues arithmetic instructions to them in parallel.

The register 12 is common to the first processing unit 10 and the second processing unit 20. It supplies the result of the first processing unit 10, that is, data loaded from the external memory 14, to the second processing unit 20, and supplies arithmetic results obtained by the second processing unit 20 to the first processing unit 10.
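
By way of illustration only, the arrangement of FIG. 2 can be modeled as a cycle-stepped Python simulation in which each unit advances its own program counter and both share one register file. Several points are assumptions of this sketch rather than statements of the patent: the arithmetic unit waits until its source registers have been written, a value written in a cycle is readable in that same cycle, every load latency is at least one cycle, and only one arithmetic unit (Add or Sub) is modeled.

```python
from typing import Dict, List, Tuple

Load = Tuple[str, int]                 # (destination register, memory address)
Arith = Tuple[str, str, str, str]      # (opcode, destination, source a, source b)

def simulate(mem_prog: List[Load], alu_prog: List[Arith],
             memory: Dict[int, int], latencies: List[int]) -> Tuple[Dict[str, int], int]:
    """Run both instruction streams until they finish; return (registers, cycles)."""
    regs: Dict[str, int] = {}          # stands in for the shared register 12
    mem_pc = alu_pc = 0                # one program counter per processing unit
    remaining = 0                      # cycles left on the load in flight
    cycle = 0
    while mem_pc < len(mem_prog) or alu_pc < len(alu_prog):
        cycle += 1
        # First processing unit: one load at a time, with a variable latency.
        if mem_pc < len(mem_prog):
            if remaining == 0:
                remaining = latencies[mem_pc]
            remaining -= 1
            if remaining == 0:
                dst, addr = mem_prog[mem_pc]
                regs[dst] = memory[addr]
                mem_pc += 1
        # Second processing unit: issue only when both operands are present.
        if alu_pc < len(alu_prog):
            op, dst, a, b = alu_prog[alu_pc]
            if a in regs and b in regs:
                regs[dst] = regs[a] + regs[b] if op == "Add" else regs[a] - regs[b]
                alu_pc += 1
    return regs, cycle

regs, cycles = simulate(
    mem_prog=[("r1", 0), ("r2", 1)],
    alu_prog=[("Add", "r3", "r1", "r1"), ("Sub", "r4", "r3", "r2")],
    memory={0: 10, 1: 3},
    latencies=[3, 2],
)
```

With these inputs the first load completes in cycle 3 and the dependent Add issues in that cycle; the second load completes in cycle 5 and the Sub issues then, so the run takes five cycles and leaves 17 in r4.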

[0018] As described above, in this embodiment the first processing unit 10 and the second processing unit 20 each have a program control unit and an execution unit. Memory access instructions and arithmetic instructions are therefore separated when the program is compiled: memory access instructions such as load and store instructions, whose execution time changes dynamically, are supplied to and executed by the first processing unit 10, and arithmetic instructions such as Add and Sub instructions, whose execution time is statically determined, are supplied to and executed by the second processing unit 20, so that the two kinds can be executed independently and asynchronously. Arithmetic instructions can thus be executed in parallel without being affected by the dynamically varying memory access time.
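
By way of illustration only, the effect of decoupling can be estimated with a simple cycle count in Python. The model is an assumption of this sketch and is not taken from the patent: each program step contains one single-cycle arithmetic instruction and at most one load, a latency of 0 means the step has no load, and no arithmetic instruction depends on a load.

```python
from typing import List

def lockstep_cycles(load_latency_per_step: List[int]) -> int:
    """Synchronous issue: every step lasts as long as its slowest instruction,
    so a slow load holds up the arithmetic instruction issued with it."""
    return sum(max(latency, 1) for latency in load_latency_per_step)

def decoupled_cycles(load_latency_per_step: List[int]) -> int:
    """Asynchronous issue: the memory stream and the arithmetic stream each run
    at their own pace, and the program ends when the slower stream ends."""
    memory_time = sum(load_latency_per_step)
    arithmetic_time = len(load_latency_per_step)  # one cycle per arithmetic instruction
    return max(memory_time, arithmetic_time)

# Ten steps, two of which carry a load that takes five cycles.
latencies = [5, 0, 0, 0, 0, 5, 0, 0, 0, 0]
synchronous = lockstep_cycles(latencies)
asynchronous = decoupled_cycles(latencies)
```

Under these assumptions the synchronous schedule takes 18 cycles and the decoupled one 10. When dependencies exist between the two streams the gain is smaller, because the arithmetic stream must then wait for the data it needs.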

[0019] When there is a dependency between a memory access instruction and an arithmetic instruction, for example when one operation is performed and its result is then used for a further operation, the data is written to the register 12 shared by the two processing units and thereby passed between the processing units, which keeps the two instructions synchronized.

[0020] FIG. 3 shows the configuration of another embodiment. It differs from FIG. 2 in that the first processing unit 10 and the second processing unit 20 each have their own register. That is, the first processing unit 10 has a program control unit 10a, an L/S unit 10b, and a register 10c, and the second processing unit 20 has a program control unit 20a, arithmetic units 20b and 20c, and a register 20d.

[0021] The processing in the first processing unit 10 and the second processing unit 20 is the same as in FIG. 2. The first processing unit 10 fetches and decodes memory access instructions and accesses the external memory 14, while the second processing unit 20 fetches and decodes arithmetic instructions and performs the various operations in parallel, so that memory access instructions and arithmetic instructions are executed asynchronously in parallel.

[0022] When there is a dependency between a memory access instruction and an arithmetic instruction, data is passed using the registers 10c and 20d provided in the respective processing units. For example, data loaded from the external memory 14 is written to the register 10c and then written to the register 20d by data transfer means (not shown). The arithmetic unit 20b (or 20c) of the second processing unit 20 reads this data from the register 20d and performs its operation, which keeps the two instructions synchronized. The reverse is of course also possible: a result computed by the arithmetic unit 20b (or 20c) is written to the register 20d and then written to the register 10c by the data transfer means. The L/S unit 10b of the first processing unit 10 stores the data written in the register 10c to the external memory 14 in accordance with a store instruction, so that the dependency between the two instructions is reliably honored.
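
By way of illustration only, the exchange between the private registers 10c and 20d can be sketched in Python with two threads. The patent does not describe the data transfer means (it is not shown in the figures), so modeling it as a pair of blocking queues is purely an assumption of this sketch, as are the addresses, register names, and the doubling operation.

```python
import queue
import threading
from typing import Dict

def first_unit(memory: Dict[int, int], to_second: queue.Queue, to_first: queue.Queue) -> None:
    """Load/store side: owns register 10c and is the only unit touching memory."""
    register_10c: Dict[str, int] = {}
    register_10c["r1"] = memory[0x10]              # Load r1, [0x10]
    to_second.put(("r1", register_10c["r1"]))      # transfer 10c -> 20d
    name, value = to_first.get()                   # wait for the transfer 20d -> 10c
    register_10c[name] = value
    memory[0x20] = register_10c["r2"]              # Store [0x20], r2

def second_unit(to_second: queue.Queue, to_first: queue.Queue) -> None:
    """Arithmetic side: owns register 20d and never accesses memory directly."""
    register_20d: Dict[str, int] = {}
    name, value = to_second.get()                  # blocks until the loaded data arrives
    register_20d[name] = value
    register_20d["r2"] = register_20d["r1"] * 2    # arithmetic on the loaded value
    to_first.put(("r2", register_20d["r2"]))       # hand the result back for the store

def run() -> Dict[int, int]:
    memory = {0x10: 21, 0x20: 0}
    to_second: queue.Queue = queue.Queue()
    to_first: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=first_unit, args=(memory, to_second, to_first)),
        threading.Thread(target=second_unit, args=(to_second, to_first)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return memory

final_memory = run()
```

Because each `get` blocks until the matching `put` has happened, the dependent operation cannot run ahead of the data it needs, which is the synchronization property the text attributes to the register transfer.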

[0023] FIG. 4 shows the configuration of yet another embodiment. It differs from FIG. 3 in that a plurality of second processing units, namely units 22, 24, and 26, are provided; they are connected to one another by an interconnection network 30 and are also connected to the first processing unit 10. As in FIG. 3, the second processing units 22, 24, and 26 have program control units 22a, 24a, and 26a, respectively, as well as arithmetic units 22b, 24b, and 26b and registers 22c, 24c, and 26c. Each processing unit operates as in FIG. 3: the first processing unit 10 holds and decodes memory access instructions and accesses the external memory 14, while the second processing units 22, 24, and 26 each hold and decode arithmetic instructions and perform the various operations in parallel, so that memory access instructions and arithmetic instructions are executed asynchronously in parallel.

[0024] When there is a dependency between a memory access instruction and an arithmetic instruction, data is passed using the registers 10c, 22c, 24c, and 26c provided in the respective processing units.

[0025] In this embodiment, the second processing units that execute arithmetic instructions each operate independently and asynchronously, so more complex instructions can be handled flexibly. When there is a dependency between a memory access instruction and an arithmetic instruction, it can likewise be handled easily by passing data through the registers 22c, 24c, and 26c.
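
By way of illustration only, register-to-register transfers over the interconnection network of FIG. 4 can be sketched in Python. The classes `Unit` and `Interconnect`, the method `transfer`, and the particular operations assigned to units 22, 24, and 26 are assumptions of this sketch; the patent does not specify how the network routes data. The sketch runs sequentially and shows only the data flow, not the asynchronous timing.

```python
from typing import Dict

class Unit:
    """A processing unit reduced to its private register file."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.registers: Dict[str, int] = {}

class Interconnect:
    """Hypothetical stand-in for interconnection network 30."""
    def __init__(self, *units: Unit) -> None:
        self._units = {unit.name: unit for unit in units}

    def transfer(self, src: str, src_reg: str, dst: str, dst_reg: str) -> None:
        """Copy one register value from unit `src` to unit `dst`."""
        value = self._units[src].registers[src_reg]
        self._units[dst].registers[dst_reg] = value

u10, u22, u24, u26 = Unit("10"), Unit("22"), Unit("24"), Unit("26")
net = Interconnect(u10, u22, u24, u26)

u10.registers["r1"] = 6                        # value loaded by the first processing unit
net.transfer("10", "r1", "22", "r1")
net.transfer("10", "r1", "24", "r1")
u22.registers["r2"] = u22.registers["r1"] + u22.registers["r1"]   # Add in unit 22
u24.registers["r2"] = u24.registers["r1"] * u24.registers["r1"]   # Mul in unit 24
net.transfer("22", "r2", "26", "r1")
net.transfer("24", "r2", "26", "r2")
u26.registers["r3"] = u26.registers["r2"] - u26.registers["r1"]   # Sub in unit 26
net.transfer("26", "r3", "10", "r2")           # result returned for a later store
```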

[0026] In this embodiment it is also possible, for example, for the program control unit 10a in the first processing unit 10 to rewrite, via the interconnection network 30, the contents of the PC in each of the program control units 22a, 24a, and 26a of the second processing units 22, 24, and 26. For example, suppose there is a dependency in which the second processing unit 24 performs an operation on data loaded by a load instruction executed in the first processing unit 10. If the load instruction takes longer than a certain time, the program control unit 10a accesses the program control unit 24a and rewrites the contents of its PC to change the order of the operations, which improves processing efficiency.
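
By way of illustration only, the PC rewrite can be sketched in Python. The class `ProgramControl`, the function `redirect_if_slow`, the threshold value, and the program layout (dependent instructions first, independent instructions starting at index 2) are assumptions of this sketch; the patent states only that the PC contents are rewritten so that the order of operations changes.

```python
from typing import List

class ProgramControl:
    """Stand-in for program control unit 24a: a program plus its program counter."""
    def __init__(self, program: List[str]) -> None:
        self.program = program
        self.pc = 0

    def next_instruction(self) -> str:
        instruction = self.program[self.pc]
        self.pc += 1
        return instruction

def redirect_if_slow(elapsed_cycles: int, threshold: int,
                     target: ProgramControl, independent_entry: int) -> bool:
    """Rewrite the target PC when a load has been outstanding for too long.

    Returns True if the PC was rewritten.
    """
    if elapsed_cycles > threshold:
        target.pc = independent_entry
        return True
    return False

# Hypothetical layout: the first two instructions need r1 from the pending load,
# and the last two do not.
PROGRAM_24 = ["Add r2, r1, r1", "Mul r3, r2, r2", "Sub r6, r4, r5", "Div r7, r6, r4"]
INDEPENDENT_ENTRY = 2

control_24a = ProgramControl(PROGRAM_24)
redirected = redirect_if_slow(elapsed_cycles=50, threshold=10,
                              target=control_24a, independent_entry=INDEPENDENT_ENTRY)
first_issued = control_24a.next_instruction()
```

A complete scheme would also need to return to the skipped instructions once the load completes; that step is omitted here because the text does not describe it.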

[0027] Embodiments of the present invention have been described above, but various modifications are possible within the scope of the technical idea of the invention. For example, the number of first processing units is not limited to one; two or more may be provided as needed, in which case each first processing unit is provided with a program control unit and an execution unit (L/S unit). Likewise, any number of arithmetic units may be provided in the second processing unit.

[0028] Furthermore, although this embodiment uses memory access instructions as the instructions whose execution time changes dynamically and arithmetic instructions as the instructions whose execution time is statically determined, other instructions can be separated in the same manner. The essence of the invention is to separate, at compile time, instructions whose execution time can vary from instructions whose execution time is fixed and to execute them asynchronously; instructions need only be separated appropriately according to their nature and executed asynchronously.

[0029]

EFFECTS OF THE INVENTION As described above, according to the present invention, compiler technology for parallel data processing is made easier, the circuit configuration is simplified, and the addition and removal of processing units can be accommodated.

[Brief description of the drawings]

FIG. 1 is a conceptual explanatory diagram of the present invention.

FIG. 2 is a configuration diagram of an embodiment.

FIG. 3 is a configuration diagram of another embodiment.

FIG. 4 is a configuration diagram of still another embodiment.

FIG. 5 is a conceptual diagram of the superscalar method.

FIG. 6 is a conceptual diagram of the VLIW method.

[Explanation of symbols]

10: first processing unit; 14: external memory; 20 to 26: second processing units.

Claims

[Claims]

1. A data processing device characterized in that a first processing unit for processing an instruction whose execution time changes dynamically and a second processing unit for processing an instruction whose execution time is statically determined operate in parallel independently of each other.

2. The data processing device according to claim 1, wherein the first processing unit has a first control unit that fetches and decodes the instruction whose execution time changes dynamically and a first execution unit that executes the instruction supplied from the first control unit, and the second processing unit has a second control unit that fetches and decodes the instruction whose execution time is statically determined and a second execution unit that executes the instruction supplied from the second control unit.

3. The data processing device according to claim 2, further comprising data storage means shared by the first processing unit and the second processing unit, wherein a dependency between the instruction whose execution time changes dynamically and the instruction whose execution time is statically determined is conveyed by writing data to the data storage means.

4. The data processing device according to claim 2, wherein the first processing unit has first data storage means, the second processing unit has second data storage means, and a dependency between the instruction whose execution time changes dynamically and the instruction whose execution time is statically determined is conveyed by transferring data between the first storage means and the second storage means.

5. The data processing device according to claim 2, wherein a plurality of the second processing units are provided and operate in parallel independently of one another, the first processing unit has first data storage means, each of the plurality of second processing units has second storage means, and a dependency between the instruction whose execution time changes dynamically and the instruction whose execution time is statically determined is conveyed by transferring data between the first storage means and the second storage means.

6. The data processing device according to claim 5, further comprising rewriting means for rewriting the contents of the first control unit or the second control unit.

7. The data processing device according to claim 1, wherein the instruction whose execution time changes dynamically is a memory access instruction, and the instruction whose execution time is statically determined is an arithmetic instruction.