JP2008158699A

JP2008158699A - Processor

Info

Publication number: JP2008158699A
Application number: JP2006345059A
Authority: JP
Inventors: Shunichi Ishiwatari; 俊一石渡
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2006-12-21
Filing date: 2006-12-21
Publication date: 2008-07-10

Abstract

PROBLEM TO BE SOLVED: To provide a processor that prevents degradation of overall processor performance by reducing the overhead of the processor that controls a vector arithmetic processor.

SOLUTION: This processor 1 includes: a first processor 2 that outputs a vector arithmetic parameter for a vector operation; and a second processor 3 that executes the vector operation based on the vector arithmetic parameter. The second processor 3 has: a first register 26 that stores the vector arithmetic parameter; a scalar processor 31 that writes the vector arithmetic parameter into the first register 26 according to an instruction designated by the first processor 2; and one or more vector processors 32, 33 that execute the instruction designated by the first processor 2 based on the vector arithmetic parameter stored in the first register 26.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a processor, and more particularly to a processor capable of executing vector operations.

Multiprocessors that include a plurality of processors have long been developed and put into practical use. For example, a multiprocessor has been proposed in which a control processor performs overall processing and exceptional processing, and a vector coprocessor performs iterative processing (see, for example, Patent Document 1).

In the proposed system, the control processor performs scalar operations and the vector coprocessor performs vector operations, and the control processor initializes, starts, and stops the vector coprocessor. This synchronizes the processing of the two processors.

However, when such a multiprocessor is applied to data processing in which the vector length of the vector operations is short (that is, the number of data items is small), the overhead the control processor incurs for initialization, start, and stop becomes high, and it is difficult to improve the performance of the processor as a whole. This is because communication between the control processor and the vector coprocessor becomes the rate-limiting step and caps the performance of the entire processor.
[Patent Document 1] Japanese Patent Laid-Open No. 10-149341
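The short-vector problem described above can be illustrated with a rough cost model. The following is only a sketch: the function name and the cycle counts are hypothetical and do not come from the publication, and the model simply assumes a fixed control cost per vector operation plus a per-element compute cost.

```python
def control_overhead_fraction(vector_length, cycles_per_element, setup_cycles):
    """Return the share of total cycles spent on control-processor setup.

    All three inputs are illustrative assumptions, not figures from the patent.
    """
    compute_cycles = vector_length * cycles_per_element
    return setup_cycles / (setup_cycles + compute_cycles)


# With an assumed fixed setup cost of 48 cycles, a short vector is dominated
# by control overhead while a long vector amortizes it.
short_vector = control_overhead_fraction(16, 1, 48)
long_vector = control_overhead_fraction(1024, 1, 48)
```

Under these assumed numbers the setup cost accounts for three quarters of the time for a 16-element vector but less than 5% for a 1024-element vector, which is the effect the paragraph above attributes to control-processor communication.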

Therefore, an object of the present invention is to provide a processor that reduces the overhead of the processor that controls the vector operation processor, and thereby prevents the performance of the processor as a whole from degrading.

According to one aspect of the present invention, a processor includes: a first processor that outputs a vector operation parameter for a vector operation; and a second processor that executes the vector operation based on the vector operation parameter. The second processor has: a first register that stores the vector operation parameter; a scalar processor that writes the vector operation parameter to the first register in accordance with an instruction specified by the first processor; and one or more vector processors that execute an instruction specified by the first processor based on the vector operation parameter stored in the first register.

According to the present invention, it is possible to realize a processor that reduces the overhead of the processor that controls the vector operation processor and thereby prevents the performance of the processor as a whole from degrading.

Embodiments of the present invention will be described below with reference to the drawings.
(Configuration of the entire processor 1)
First, the configuration of the processor according to the present embodiment will be described. FIG. 1 is a block diagram showing the configuration of the processor according to the present embodiment.
The present embodiment is described below using, as an example, a processor that serves as a decoder for image data and audio data in a digital television apparatus.

The processor 1 shown in FIG. 1 is a multiprocessor including two sub-processors 2 and 3. The first processor 2 is a processor for control and audio data processing, and the second processor 3 is a processor for video data processing. The control processor 2 controls the processor 3 and decodes audio data. The vector operation processor 3 decodes image data that has been divided into blocks (for example, macroblocks), repeating the same processing for each block.

The processor 1 further includes a direct memory access (hereinafter, DMA) controller (hereinafter, DMAC) 4 and a bus interface (hereinafter, bus I/F) 5. The bus I/F 5 is an interface for connecting the processors 2 and 3 to the global bus 7. A main memory 6, which is a storage unit, is connected to the global bus 7. The processors 2 and 3 are therefore connected to the main memory 6 via the global bus 7. The global bus 7 is used not only for transferring data between the processor 1 and the main memory 6 but also for communicating with the circuits, processors, chips, and the like that implement other functions of the digital television apparatus.

The processor 2 and the processor 3 are connected to each other via the control bus 8, and the DMAC 4 is also connected to the control bus 8.
Furthermore, the processor 3 is connected to the bus I/F 5 via the local bus 9.

The processor 2 includes an instruction cache 11, a data cache 12, and a central processing unit (hereinafter, CPU) 13. As described above, it controls the processor 3 and processes audio data.
The processor 3 includes an instruction memory 21, a data memory 22, a vector register 23, a DMAC 24, a control bus register 25, a control register 26, a scalar processor 31, and a plurality of (here, two) vector processors 32 and 33, and decodes image data.
Although FIG. 1 shows the processor 3 with two vector processors 32 and 33, three or more may be provided.

(Configuration of processor 2, processor 3, and DMAC 4)
Next, the configurations of the processor 2 and the processor 3, each of which is a sub-processor, will be described.
First, the processor 2 will be described.
The instruction cache 11 is a cache for temporarily storing one or more instructions to be executed by the CPU 13 (a program consisting of a plurality of instructions is also simply referred to as an instruction hereinafter). The data cache 12 is a cache for temporarily storing data to be processed by the CPU 13 and data it has processed.
The CPU 13 reads the instructions stored in the instruction cache 11 and executes them on the data stored in the data cache 12.
The instructions stored in the instruction cache 11 of the processor 2 are read from the main memory 6. The data written to the data cache 12 is likewise read from the main memory 6. Data that has been processed in the processor 2 and temporarily stored in the data cache 12 is transferred to the main memory 6 and stored there.

Accordingly, the processor 2 executes the instructions stored in the main memory 6, uses the DMAC 4 to transfer to the processor 3 the instructions to be executed in the processor 3 and the target data to be decoded, both of which are stored in the main memory 6, and causes the processor 3 to execute predetermined image processing (here, decoding) by vector operations.
The processor 2 also outputs vector operation parameters to the processor 3, as described later.

Next, the processor 3 will be described.
The instruction memory 21 is an instruction holding unit having a storage area for storing the instructions that the scalar processor 31, the vector processors 32 and 33, and the DMAC 24 are to execute or are able to execute.
The data memory 22 is a data holding unit having storage areas for the target data to be decoded and for the decoded result data. That is, the data memory 22 has two storage areas: a target data storage area and a result data storage area.

The instructions written to the instruction memory 21 are also read from the main memory 6, and the data written to the data memory 22 is also read from the main memory 6. The result data obtained by processing in the processor 3 is output to the data memory 22, then transferred to and stored in the main memory 6.

The vector register 23 is provided between the vector processors 32 and 33 and the data memory 22. It is a register that stores, contiguously, a unit of data to be processed by the vector processors 32 and 33, for example the macroblock data of 16 × 16 pixels. It holds both the target data to be processed by the vector processors 32 and 33 and the result data obtained by that processing. That is, the vector register 23 has two storage areas: a target data storage area and a result data storage area.

The DMAC 24 constitutes a transfer unit that transfers target data and result data between the vector register 23 and the data memory 22. The operation of the DMAC 24 is controlled by an instruction contained in a scalar LIW (Long Instruction Word) instruction read from the instruction memory 21. Accordingly, while one vector operation is being executed, the target data to be processed next can be transferred in advance from the data memory 22 to the vector register 23 and set there. The instructions that control the DMAC 24 are described later.
The control bus register 25 is a register that the processor 2 writes via the control bus 8 in order to specify, for example, the address in the instruction memory 21 of the instruction to be executed, the address in the data memory 22 of the target data to be processed, and the address at which the result data obtained by processing is to be stored.
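The prefetching that the DMAC 24 makes possible (loading the next block of target data while the current block is being processed) resembles a double-buffering loop. The sketch below is a sequential Python model under stated assumptions: the two-slot `buffers` list, the `process_blocks` name, and the `kernel` callback are all hypothetical, and the publication does not say that the vector register is organized as two alternating input buffers.

```python
def process_blocks(data_memory, kernel):
    """Process blocks in order while staging the next block ahead of time.

    data_memory: list of blocks (each a list of values), standing in for
                 the target data area of data memory 22.
    kernel:      function applied to one block, standing in for a vector
                 operation.
    """
    if not data_memory:
        return []
    results = []
    buffers = [None, None]              # assumed: two staging slots
    buffers[0] = list(data_memory[0])   # initial transfer of the first block
    for i in range(len(data_memory)):
        current, following = i % 2, 1 - (i % 2)
        # In hardware this transfer would overlap with the kernel below;
        # this model simply performs it first.
        if i + 1 < len(data_memory):
            buffers[following] = list(data_memory[i + 1])
        results.append(kernel(buffers[current]))
    return results
```

For example, `process_blocks([[1, 2], [3, 4], [5, 6]], sum)` yields one result per block in the original order. The point of the arrangement is that the vector units would not wait for a transfer between blocks.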

For example, the instruction memory 21 may store a plurality of instructions, each serving as a subroutine program, and the data memory 22 may store large data. In such a case, the control bus register 25 holds the initial value of the program counter, which indicates which instruction (subroutine) in the instruction memory 21 is to be started, and the address in the data memory 22 of the large data.

Specifically, when large result data obtained by processing is to be supplied to the processor 2 via the local bus 9, the scalar processor 31 and the vector processors 32 and 33 write into the control bus register 25 the address in the data memory 22 at which that result data is written.
Small data may also be exchanged between the processor 2 and the processor 3. In such a case, the control bus register 25 also holds the small data structures passed between the processor 2 and the processor 3.

The control register 26 is a register for storing the vector operation parameters, received from the processor 2, that the vector processors 32 and 33 need when performing various vector operations. Each of the registers that make up the control register 26 stores scalar data such as the repetition count of a vector operation, the auto-increment width used during a vector operation, constant values used by the vector arithmetic units, and feature amount data for each block. These data are not set directly by the processor 2 via the control bus 8. Instead, they are set directly in the control register 26 by the scalar processor 31.
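As a rough illustration of the control register 26 described above, the following sketch models it as a small bank of numbered slots that only the scalar processor may write. The class name, the slot numbers, the slot count, and the string used to identify the writer are all hypothetical; the publication only states which kinds of parameters are stored and that the scalar processor 31, not the processor 2, sets them.

```python
# Hypothetical slot numbers for three of the parameter kinds named in the text.
REPEAT_COUNT, AUTO_INCREMENT, CONSTANT = 0, 1, 2


class ControlRegister:
    """Toy model of control register 26: a bank of scalar parameter slots."""

    def __init__(self, num_slots=8):  # slot count is an assumption
        self._slots = [0] * num_slots

    def write(self, writer, slot, value):
        # Per the description, the scalar processor sets these values;
        # processor 2 does not write them directly over the control bus.
        if writer != "scalar_processor":
            raise PermissionError(f"{writer} may not write the control register")
        self._slots[slot] = value

    def read(self, slot):
        # Vector processors read parameters from here when executing.
        return self._slots[slot]
```

A caller would write, for instance, `ControlRegister().write("scalar_processor", REPEAT_COUNT, 16)`, whereas a write attempted on behalf of the processor 2 is rejected in this model.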

The scalar processor 31 of the processor 3 has internal general-purpose registers and includes circuits that execute arithmetic and logic instructions, branch instructions, and the like. As described later, the scalar processor 31 writes and sets the parameters relating to the instructions executed in the vector processors 32 and 33, in accordance with instructions specified by the processor 2. The scalar processor 31 is connected to the control bus register 25 and the control register 26 by data transfer buses 41 and 42 so that it can read and write data in those registers as necessary.

For example, when the scalar processor 31 performs processing based on the feature amount of each block, using feature amount data that indicates image complexity and was obtained by the vector processors 32 and 33, the data relating to that feature amount data is written to the control bus register 25 or the control register 26.

The scalar processor 31 is preferably a processor that does not execute operating system (hereinafter, OS) processing. The scalar processor 31 could be configured to run an OS, but doing so has the drawback of enlarging the area that the scalar processor 31 occupies on the chip, which is a semiconductor device. As described later, if the scalar processor 31 only sets parameters, or only sets parameters and performs certain scalar operations (for example, computing image feature amounts), it is preferably a processor that cannot execute OS processing.
Furthermore, the scalar processor 31 may be a dedicated processor for setting the parameters relating to the instructions executed in the vector processors 32 and 33.

The vector processors 32 and 33 each execute the instruction designated by the processor 2 based on the vector operation parameters stored in the control register 26. That is, each of the vector processors 32 and 33 reads data from the control bus register 25, the control register 26, and the vector register 23 as necessary, repeatedly executes the operation specified by the designated instruction, and writes the result data to the control bus register 25 or the vector register 23. For this reason, the vector processors 32 and 33 are connected to the control bus register 25 and the control register 26 by the data transfer buses 41 and 42. The vector processors 32 and 33 may execute different operations or the same operation.
In particular, the data transfer bus 42, which connects the scalar processor 31 and the control register 26, consists of a plurality of signal lines or a plurality of buses so that parameter data corresponding to a plurality of instructions can be set in the control register 26 at the same time. Here, the signal lines of three buses are provided in parallel so that parameter data corresponding to three instructions can be written to the control register 26 in parallel. The data transfer bus 42 therefore has a higher parallel data transfer capability than the control bus 8, and the processor 3 can run a plurality of vector processors (here, the two vector processors 32 and 33) simultaneously for parallel processing.

As described later, the instructions for the scalar processor 31 include instructions that specify what data the scalar processor 31 sets in the control register 26. Because the processor 2 therefore does not need to set the parameter data in the control register 26, the overhead the processor 2 incurs in controlling the processor 3 is reduced. In other words, communication between the processors 2 and 3 no longer limits the processing speed of the processor 1 as a whole.

In this embodiment, a plurality of (here, two) vector processors 32 and 33 are provided for one scalar processor 31. Unlike the conventional arrangement, the processor 2 therefore does not have to communicate with each of the vector processors, which keeps that communication from limiting the processing speed of the processor 1.

The DMAC 4 constitutes a transfer unit that performs DMA data transfers between the main memory 6 and the instruction memory 21 and between the main memory 6 and the data memory 22. It performs these transfers under the control of the CPU 13 of the processor 2. Specifically, the CPU 13 controls the DMAC 4 to read the instructions to be executed and the target data from the main memory 6 and transfer them to the instruction memory 21 and the data memory 22, respectively, and to transfer the result data from the data memory 22 to the main memory 6. Because vector operations often execute the same processing repeatedly, the instruction memory 21 is able to store a plurality of instructions.

The CPU 13 then writes the address of the instruction stored in the instruction memory 21 to a predetermined area of the control bus register 25 of the processor 3 via the control bus 8. Specifically, as described later, when decoding is to be performed, the processor 2 notifies the processor 3 of the instruction to be executed by designating the start address of that instruction in the instruction memory 21.
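The start-address mechanism above can be sketched as follows: the host writes only a program counter value, and the second processor fetches instructions from its own instruction memory starting there. This is a minimal model under assumptions. The `"END"` terminator, the `start_pc` key, and the routine names are hypothetical, since the publication does not describe how the end of a subroutine is marked.

```python
def launch(instruction_memory, control_bus_register):
    """Run the subroutine whose start address was written by the host.

    instruction_memory:   list of instruction labels, with a hypothetical
                          "END" marker closing each subroutine.
    control_bus_register: dict holding the initial program counter under
                          the assumed key "start_pc".
    """
    pc = control_bus_register["start_pc"]
    executed = []
    while instruction_memory[pc] != "END":
        executed.append(instruction_memory[pc])
        pc += 1
    return executed


# Two subroutines resident in instruction memory at the same time.
imem = ["idct_0", "idct_1", "END", "mc_0", "mc_1", "END"]
selected = launch(imem, {"start_pc": 3})
```

Because several subroutines stay resident in the instruction memory, selecting one costs the host a single register write in this model rather than a transfer of the code itself.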

Depending on the processing contents or the data to be processed, the instruction memory 21 and the data memory 22 in the processor 3 may each be a cache. Likewise, when the instruction memory 21 and the data memory 22 can be given large capacities, cache misses become less frequent, so each of them may be a cache.

As described above, the processor 1 is a so-called media processor that processes audio data and image data, and the processor 2 controls the processor 3 to make it decode the image data.
In the present embodiment, as described above, the processor 3 performs image decoding in which the image data is divided into blocks and the same processing is repeated for each block. To suit such processing, the processor 3 uses an instruction memory 21 and a data memory 22, each with a relatively small capacity, instead of caches. This is because, for such processing, the CPU 13 can predict the instructions to be executed and the data to be decoded, and because using a cache would cause processing to stall whenever a cache miss occurs.

(Instructions)
Next, the instructions executed in the processor 3 will be described.
The instructions executed in the processor 3 fall into two broad types: scalar LIW instructions and vector LIW instructions. In particular, a scalar LIW instruction includes a field for setting the various parameters of the DMA performed by the DMAC 24 in the processor 3, and fields for setting vector operation parameters in the control register 26.

The instruction memory 21 stores an instruction group in which scalar LIW instructions, which are instructions for the scalar processor 31, and vector LIW instructions, which are instructions for the vector processors 32 and 33, are mixed. Each scalar LIW instruction and vector LIW instruction therefore has a designation (or identification) field that makes it possible to tell whether the instruction is for the scalar processor or for vector operation. A vector LIW instruction also contains data that identifies which of the vector processors 32 and 33 the instruction is for.

FIG. 2 is a diagram for explaining the configuration of the instruction memory 21 in which a plurality of instructions are stored. As shown in FIG. 2, the instruction memory 21 stores a plurality of LIW instructions, with instructions for the scalar processor 31 and instructions for the vector processors 32 and 33 mixed together. When the CPU 13 designates an instruction to be executed from among the LIW instructions, it notifies the vector processors 32 and 33 of that instruction by designating its start address in the instruction memory 21.

FIG. 3 is a diagram showing a configuration example of each LIW instruction. Here, one LIW instruction 50 is 128 bits long. Whether it is a scalar LIW instruction or a vector LIW instruction, one LIW instruction 50 includes a field 51 indicating whether the instruction is for the scalar processor 31 or for one of the vector processors 32 and 33. That is, it has an identification field that identifies the processor that is to execute it. In the present embodiment, one LIW instruction 50 further has four fields 52, 53, 54, and 55.
First, the case where the LIW instruction 50 is a scalar LIW instruction will be described. A scalar LIW instruction may be a DMA instruction for the DMAC 24 or a vector operation parameter setting instruction for setting vector operation parameters in the control register 26.
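To make the 128-bit layout concrete, the sketch below packs an identification value and four field values into one integer and unpacks them again. The publication gives only the total length (128 bits), the identification field 51, and the four fields 52 to 55. The individual widths used here (8 bits for the identifier, 30 bits per field) and the bit ordering are assumptions chosen so that the total comes to 128.

```python
ID_BITS = 8       # assumed width of identification field 51
FIELD_BITS = 30   # assumed width of each of fields 52 to 55
WORD_BITS = ID_BITS + 4 * FIELD_BITS  # 128, matching the stated LIW length


def encode_liw(target_id, fields):
    """Pack a target id and four field values into one 128-bit integer."""
    if len(fields) != 4:
        raise ValueError("an LIW instruction carries exactly four fields")
    if not 0 <= target_id < (1 << ID_BITS):
        raise ValueError("target id out of range")
    word = target_id
    for value in fields:
        if not 0 <= value < (1 << FIELD_BITS):
            raise ValueError("field value out of range")
        word = (word << FIELD_BITS) | value
    return word


def decode_liw(word):
    """Split a word into (target_id, [field52, field53, field54, field55])."""
    mask = (1 << FIELD_BITS) - 1
    fields = []
    for _ in range(4):
        fields.append(word & mask)   # lowest bits hold field 55
        word >>= FIELD_BITS
    fields.reverse()
    return word, fields
```

In this model the identifier occupies the most significant bits, so a dispatcher could inspect it first and route the remaining fields to the scalar processor or to the selected vector processor.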

When the scalar LIW instruction is a DMA instruction for the DMAC 24, the field 52 contains the activation instruction for the DMAC 24 and the DMA parameters (for example, the address in the data memory 22 of the target data), and the other fields 53, 54, and 55 are written to contain NOPs, which indicate that nothing is to be executed. In other words, the DMAC 24 is configured to perform DMA between the vector register 23 and the data memory 22 based on the parameters and other contents of the field 52 of a scalar LIW instruction.
The activation instruction and DMA parameters for the DMAC 24 are placed in the scalar LIW instruction because, comparing the two instruction types, the scalar operation instructions supplied to the scalar processor 31 can be shorter than the instructions executed by the vector processors 32 and 33.

When the scalar LIW instruction is an instruction for setting the various parameters for a vector operation in the control register 26, the fields 53, 54, and 55 contain the parameters to be set. Specifically, each of the fields 53, 54, and 55 contains a register number within the control register 26 and a parameter value. Each register in the control register 26 corresponds to one of the parameters of the vector operation, so a parameter is set by designating a register number and writing the parameter value to the register with that number. The parameters are, for example, the number of repetitions of the vector operation, the width of the automatic increment during the vector operation, and constant data used by the vector arithmetic unit. The scalar LIW instruction thus includes an instruction for setting, in the control register 26, the various parameters for the vector operations executed in each of the vector processors 32 and 33.
As described above, the scalar LIW instruction may be an instruction that directs the DMAC 24 to perform DMA processing or an instruction that directs parameter setting for a vector operation. The vector LIW instruction is an instruction executed by the vector processor designated in the identification field. That is, the instructions stored in the instruction memory 21 include instructions that direct DMA processing by the DMAC 24, instructions that direct parameter setting for vector operations, and instructions executed by the vector processors; the instruction directing DMA processing and the instruction directing parameter setting are contained in a single LIW instruction.
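The parameter-setting form of the scalar LIW instruction can be illustrated with a minimal Python sketch. It assumes that each of the fields 53 to 55 carries one (register number, value) pair and that the control register 26 behaves like a small array of numbered registers. The register numbers, the register count, and the names `ParamField` and `set_vector_params` are hypothetical; the source does not specify an encoding.

```python
from dataclasses import dataclass

# Hypothetical register numbers within control register 26 (not given in the source).
REG_REPEAT_COUNT = 0    # number of repetitions of the vector operation
REG_AUTO_INCREMENT = 1  # automatic increment width
REG_CONSTANT = 2        # constant data for the vector arithmetic unit


@dataclass(frozen=True)
class ParamField:
    """Models one of fields 53 to 55: a register number and a parameter value."""
    reg_no: int
    value: int


def set_vector_params(control_register, fields):
    """Model of scalar processor 31 writing each field's value to the numbered register."""
    for field in fields:
        control_register[field.reg_no] = field.value


control_register_26 = [0] * 8  # register count is an assumption
scalar_liw_fields = [
    ParamField(REG_REPEAT_COUNT, 16),
    ParamField(REG_AUTO_INCREMENT, 4),
    ParamField(REG_CONSTANT, 128),
]
set_vector_params(control_register_26, scalar_liw_fields)
print(control_register_26[:3])
```

In this model one scalar LIW instruction sets up to three parameters, which matches the description of three fields each holding a register number and a value.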

A vector operation is executed in the vector processor designated by the identification field, using the parameters stored in the corresponding registers. For example, when a vector operation is executed repeatedly in the vector processor 32, one or more vector LIW instructions in the instruction memory 21 are executed repeatedly for the number of times given by the data in the predetermined register of the control register 26 that stores the repetition count for the vector processor 32. As described above, parameters such as the repetition count are stored in the control register 26 by the scalar processor 31.
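The way a repetition count held in the control register 26 drives repeated execution can be sketched as follows. This is only an illustrative model: the register keys, the use of the automatic increment as an element stride, and the function `run_vector_liw` are assumptions, and a Python callable stands in for the vector LIW instruction body.

```python
def run_vector_liw(control_register, repeat_reg, increment_reg, liw_body, vector_data):
    """Apply liw_body to successive elements as many times as the repeat register says."""
    repeat_count = control_register[repeat_reg]
    stride = control_register[increment_reg]
    results = []
    index = 0
    for _ in range(repeat_count):
        results.append(liw_body(vector_data[index]))
        index += stride  # automatic increment between repetitions (assumed semantics)
    return results


# Values that the scalar processor would have written beforehand (hypothetical keys).
control_register_26 = {"vp32_repeat": 3, "vp32_increment": 2}
data = [10, 11, 12, 13, 14, 15]
out = run_vector_liw(control_register_26, "vp32_repeat", "vp32_increment",
                     lambda x: x * 2, data)
print(out)
```

The loop count comes only from the register contents, so the same instruction body can be reused for a different number of repetitions by rewriting the register.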

Therefore, because a plurality of instructions can be included in a vector LIW instruction, while one of the plurality of (here, two) vector processors is automatically repeating a certain operation, another vector processor can be started by a vector LIW instruction and automatically repeat a different operation. In addition, while the vector processors are automatically repeating their operations, the scalar processor 31 can execute a scalar LIW instruction in parallel. Furthermore, the DMAC 24 can execute DMA in parallel with the processing by the scalar processor 31 and the vector processors 32 and 33.

(Process flow)
Next, the operation by which the processor 2 and the processor 3 synchronize with each other when decoding image data will be described. FIG. 4 is a diagram for explaining an example of the flow of processing between the processors 2 and 3. FIG. 4 shows an example in which, when the data of a plurality of macroblocks is decoded, a single instruction start from the processor 2 causes the same instruction to be executed on the data of the plurality of macroblocks.
As described above, the processor 2 first uses the DMAC 4 to load, that is, transfer and store, the instruction or group of instructions (that is, the program) to be executed into the instruction memory 21 in advance (step S1). This is because, when the same instruction is applied repeatedly to a plurality of blocks, as in image data decoding, once the instruction has been stored in the instruction memory 21 it is only necessary to execute the stored instruction repeatedly thereafter.
Note that the processor 2 may instead transfer the instructions to the instruction memory 21 by the DMAC 4 immediately before the decoding process.

Next, the processor 2 transfers the target data to be decoded to the data memory 22 of the processor 3 (step S2). Here, for example, the data of a plurality of macroblocks is transferred together, as the target data, to the target data storage area of the data memory 22. The structure, that is, the size, of the supplied data may be small or large.
When transferring small target data to the processor 3, the processor 2 writes the target data directly to the control bus register 25 in the processor 3 via the control bus 8.

In contrast, when transferring target data with a large data structure to the processor 3, the processor 2 first, as a first stage, uses the DMAC 4 to transfer the data by DMA via the local bus 9 to the target data storage area of the data memory 22 in the processor 3. Rather than transferring the target data one macroblock at a time, the processor 2 may transfer a plurality of data items together with a single start of the DMAC 4.

Then, as a second stage, the processor 2 writes the address data of the transferred target data, that is, its address in the data memory 22, to the control bus register 25 of the processor 3 via the control bus 8. As a result, the processor 3 can obtain the address data of the target data to be decoded.

As a third stage, the DMAC 24 in the processor 3 is started by an instruction for the DMAC 24 stored in the instruction memory 21. Using the address data stored in the control bus register 25, the DMAC 24 transfers the data of one macroblock in the data memory 22 to the vector register 23 by DMA. Note that the data of a plurality of macroblocks may be transferred together with a single start of the DMAC 24.
In this way, the target data to be decoded is stored in the vector register 23 of the processor 3.
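The three stages for large target data can be summarized in a small Python model. It is a sketch under stated assumptions: the data memory 22 is modeled as a dictionary keyed by address, one address holds one macroblock, and the class and function names (`Processor3Model`, `stage1_dmac4_transfer`, and so on) and the key `"target_addr"` are hypothetical.

```python
class Processor3Model:
    """Toy model of the storage inside processor 3."""

    def __init__(self):
        self.data_memory = {}           # data memory 22: address -> macroblock data
        self.control_bus_register = {}  # control bus register 25
        self.vector_register = None     # vector register 23


def stage1_dmac4_transfer(p3, base_addr, macroblocks):
    """Stage 1: DMAC 4 copies target data over the local bus into data memory 22."""
    for offset, macroblock in enumerate(macroblocks):
        p3.data_memory[base_addr + offset] = macroblock


def stage2_write_address(p3, base_addr):
    """Stage 2: processor 2 writes the target data address via the control bus."""
    p3.control_bus_register["target_addr"] = base_addr


def stage3_dmac24_load(p3, index=0):
    """Stage 3: DMAC 24 moves one macroblock from data memory 22 to vector register 23."""
    addr = p3.control_bus_register["target_addr"] + index
    p3.vector_register = p3.data_memory[addr]


p3 = Processor3Model()
stage1_dmac4_transfer(p3, base_addr=0x100, macroblocks=[[1, 2, 3, 4], [5, 6, 7, 8]])
stage2_write_address(p3, base_addr=0x100)
stage3_dmac24_load(p3, index=1)
print(p3.vector_register)
```

The point of the split is that the bulk data travels over the local bus while only a short address value crosses the control bus.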

In addition, so that the result data can be transferred to the main memory 6 by DMA, the processor 2 notifies the processor 3 of the address in the data memory 22 at which the result data is to be stored (step S3). As described above, this notification is made by the processor 2 writing the address to the control bus register 25.

Next, the processor 2 notifies the processor 3 of the instruction to be executed in the instruction memory 21. Specifically, the processor 2 writes, via the control bus 8, the start address in the instruction memory 21 of the instruction (or subroutine) to be executed into a predetermined area of the control bus register 25 of the processor 3. In this way, the processor 2 can designate to the processor 3 the instruction to be executed in the instruction memory 21.

Then, the processor 2 writes data indicating a start instruction, for example flag data, via the control bus 8 to a predetermined area of the control bus register 25 (an area different from the one storing the start address of the instruction). In this way, the processor 2 can instruct, for example, the vector processors 32 and 33 of the processor 3 to start the instruction (step S4).
As a result, in the processor 3, the instruction is started from the designated address and executed in the designated vector processor (step S11).
At this time, the instruction is started only once and is then repeated for the data of a plurality of macroblocks. For example, when the same instruction is to be executed for each macroblock, a predetermined parameter (for example, the repetition count) is set so that the instruction is repeated for the plurality of macroblocks; the designated vector processor can then execute the same instruction on the plurality of macroblocks after being started just once at the beginning.
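The designation and start of an instruction through two separate areas of the control bus register 25 can be modeled as below. This is a hedged sketch: the attribute names, the clearing of the flag once the start is accepted, and the functions `host_launch` and `sub_poll_and_start` are assumptions made for illustration, not details given in the source.

```python
class ControlBusRegister:
    """Toy model of control bus register 25 with two separate areas."""

    def __init__(self):
        self.start_address = None  # area holding the start address of the instruction
        self.start_flag = False    # separate area holding the start indication


def host_launch(cbr, subroutine_address):
    """Processor 2 side: designate the instruction, then request its start (step S4)."""
    cbr.start_address = subroutine_address
    cbr.start_flag = True


def sub_poll_and_start(cbr, instruction_memory):
    """Processor 3 side: if a start was requested, fetch from the designated address."""
    if not cbr.start_flag:
        return None
    cbr.start_flag = False  # clearing the flag here is an assumption
    return instruction_memory[cbr.start_address]


instruction_memory_21 = {0x40: "decode_subroutine"}  # hypothetical contents
cbr = ControlBusRegister()
before = sub_poll_and_start(cbr, instruction_memory_21)
host_launch(cbr, 0x40)
after = sub_poll_and_start(cbr, instruction_memory_21)
print(before, after)
```

In this model the control processor performs only two register writes per launch, which is the source of the low control overhead described in the text.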

After step S4, the processor 2 determines whether the processor 3 has finished executing the instruction for one macroblock (step S5).
Meanwhile, the processor 3 executes the instruction for one macroblock and, when execution is finished, outputs an interrupt instruction to the processor 2. The processor 2 receives the interrupt instruction and thereby learns that execution of the instruction for one macroblock has finished. It is therefore determined in step S5 that execution has finished, and the result data is output to the main memory 6 (step S6).

The decoded data, that is, the result data, is first stored in the vector register 23 and is then transferred by the DMAC 24 to the designated address in the result data storage area of the data memory 22 and stored there. The result data is thus output from the data memory 22 to the main memory 6 or the processor 2.
Specifically, the size or structure of the result data obtained by decoding may be small or large. For data with a small data structure, the processor 3 writes the result data to the control bus register 25 in the processor 3 via the control bus 8. The processor 2 can then obtain the result data by reading it directly from the control bus register 25.

When result data with a large data structure is to be transferred from the data memory 22 to the main memory 6, the address in the data memory 22 at which the result data should be stored has already been designated in step S3. The processor 3 can therefore start the DMAC 24 by the instruction in the instruction memory 21 that causes the DMAC 24 to execute a DMA transfer, and transfer the result data from the vector register 23 to the designated address in the result data storage area of the data memory 22. Note that the processor 3 may transfer a plurality of data items together from the vector register 23 to the data memory 22 with a single start of the DMAC 24.

Then, the processor 2 uses the DMAC 4 to transfer the result data from the data memory 22 to the main memory 6 via the local bus 9.
Here too, the processor 2 may transfer a plurality of result data items together from the data memory 22 to the main memory 6 with a single start of the DMAC 4.

When the result data has been output to the main memory 6, it is determined whether there is further target data to be decoded, that is, whether execution of the instruction has finished for all the macroblocks that constitute the target data to be decoded (step S7).

If processing has not finished for all the macroblocks, the process returns to step S5, where the processor 2 waits to receive an interrupt instruction from the processor 3. When the interrupt instruction is received, it is determined in step S5 that execution of the instruction for the data of the next macroblock has finished, and the processor 2 repeats steps S5 to S7 until execution of the instruction has finished for all the macroblocks.

Accordingly, the processor 3 executes the instruction on the data of the plurality of macroblocks in sequence and outputs an interrupt instruction each time execution for a macroblock finishes. When execution of the instruction has finished for the data of all the designated macroblocks, the processing of the processor 3 stops.

More specifically, in the processor 3, the DMAC 24 is started by an instruction for the DMAC 24 stored in the instruction memory 21, and the target data in the data memory 22 is transferred to the vector register 23. The processor 3 then executes the instruction on the data of each macroblock, which is the next target data. The instruction is not started by the processor 2; as described above, it is executed automatically in the processor 3 the number of times set by parameters such as the repetition count. Each time execution of the instruction finishes for a macroblock, the result data is output, the next target data to be processed is input, and the instruction is executed again.
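The flow of FIG. 4 (one start, then one completion notice per macroblock) can be sketched in Python as follows. This is an illustrative model only: a generator stands in for the processor 3, each yielded value stands in for an interrupt together with its result data, and the names `processor3_run` and `processor2_fig4` are hypothetical.

```python
def processor3_run(macroblocks, repeat_count, decode):
    """Processor 3 side: decode repeat_count macroblocks after a single start.

    Each yield models the interrupt issued when one macroblock is finished.
    """
    for i in range(repeat_count):
        yield decode(macroblocks[i])


def processor2_fig4(macroblocks, decode):
    """Processor 2 side: start once (S4), then loop over S5 to S7."""
    launches = 0
    main_memory = []
    interrupts = processor3_run(macroblocks, len(macroblocks), decode)
    launches += 1                   # step S4: the only start issued by processor 2
    for result in interrupts:       # step S5: wait for the next interrupt
        main_memory.append(result)  # step S6: output result data to main memory 6
        # step S7: the loop ends once every macroblock has been handled
    return launches, main_memory


launches, main_memory = processor2_fig4([1, 2, 3], decode=lambda mb: mb * 2)
print(launches, main_memory)
```

However many macroblocks are decoded, the launch counter in this model stays at one, which mirrors the single start described for FIG. 4.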

Therefore, even when an instruction is executed n times to apply the same decoding process to n macroblocks, for example, the processor 2 only has to start the instruction on the processor 3 the first time; thereafter the same instruction is executed automatically on the n macroblocks within the processor 3. The processor 2 can therefore execute other processing in the meantime.
FIG. 5 is a diagram for explaining another example of the flow of processing between the processors 2 and 3. Processes identical to those in FIG. 4 are given the same reference signs, and their description is omitted. FIG. 5 shows an example in which, when the data of a plurality of macroblocks is decoded, the instruction is executed on the data of one macroblock each time the processor 2 starts the instruction.
In the case of FIG. 5 as well, the processor 2 first loads the instruction or group of instructions to be executed into the instruction memory 21 by the DMAC 4 (step S1).
Next, the processor 2 transfers the macroblock data, which is the target data to be decoded, to the data memory 22 of the processor 3 by the DMAC 4 (step S2). Here, for example, the data of one macroblock is transferred, as the target data, to the target data storage area of the data memory 22.

The structure, that is, the size, of the supplied target data may again be small or large, but the target data is transferred from the processor 2 to the processor 3 by the same procedure as described for FIG. 4.
As a result, the target data to be decoded is stored in the vector register 23 of the processor 3.
The processor 2 also notifies the processor 3 of the address in the data memory 22 at which the result data is to be stored (step S3).

Then, the processor 2 instructs the processor 3 to start the instruction (step S4).
As a result, in the processor 3, the instruction is started from the designated address and executed; when the HALT instruction (stop instruction) written in the instruction sequence is executed, the vector processor stops temporarily (step S21).

When the HALT instruction is executed, the processor 3 notifies the processor 2 that it has been executed, so the processor 2 can detect the end of execution of the instruction.

After step S4, the processor 2 monitors whether execution of the instruction has finished (step S5) and, on detecting that it has, outputs the result data (step S6). The result data is output in the same way as in the case of FIG. 4.

When the result data has been output to the main memory 6, it is determined whether there is further target data to be decoded, that is, whether execution of the instruction has finished for all the macroblocks to be decoded (step S22).

If execution of the instruction has not finished for all the macroblocks, the processing of the processor 2 returns to step S2, and the target data of the next macroblock is transferred from the main memory 6 to the data memory 22 (step S2).
Thereafter, the processing from step S2 to step S22 described above is repeated until it has been executed for all the designated blocks.

The processor 3 executes the HALT instruction at the end of executing the instruction on the data of one macroblock, which is the target data, and thereafter remains stopped with respect to that instruction. When the processor 2 starts the instruction in step S4, the processor 3 executes it again.
In this way, the same instruction is executed successively on the data of a plurality of macroblocks.
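For comparison with FIG. 4, the per-macroblock flow of FIG. 5 can be modeled as below. This is a sketch under stated assumptions: a returned status string stands in for the HALT notification, and the names `processor3_run_until_halt` and `processor2_fig5` are hypothetical.

```python
def processor3_run_until_halt(macroblock, decode):
    """Processor 3 side: decode one macroblock, then execute HALT (step S21)."""
    result = decode(macroblock)
    return result, "HALT"  # the status string models the HALT notification


def processor2_fig5(macroblocks, decode):
    """Processor 2 side: repeat steps S2 to S22 once per macroblock."""
    launches = 0
    main_memory = []
    for macroblock in macroblocks:  # step S2: supply the next macroblock
        launches += 1               # step S4: start the instruction
        result, status = processor3_run_until_halt(macroblock, decode)
        if status == "HALT":        # step S5: end of execution detected
            main_memory.append(result)  # step S6: output result data
    return launches, main_memory    # step S22 is the loop exit condition


launches, main_memory = processor2_fig5([1, 2, 3], decode=lambda mb: mb * 2)
print(launches, main_memory)
```

Here the number of starts equals the number of macroblocks, yet each start still consists only of designating and starting the instruction.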

In the processing shown in FIG. 5 as well, the only processing the processor 2 has to perform for the vector operation is to designate and start the instruction, so the overhead of controlling the processor 3 is small.

As described above, the processor 1 according to the embodiment reduces the overhead of the control processor 2, which controls the vector operation processor 3, and thereby prevents the performance of the processor as a whole from deteriorating.

Further, in the processor 1, vector operation instructions are supplied directly from the instruction memory 21 to the vector processors 32 and 33, so the performance of the processor 1 is unlikely to deteriorate even for applications in which vector operations are started frequently.

Furthermore, the number of vector processors in the processor 3 can be adjusted flexibly according to the ratio of scalar operations to vector operations in the target application, so a configuration that achieves high performance at low cost can be provided for a wide variety of applications.

(Modification)
Next, modified examples will be described.
FIG. 6 is a block diagram showing a configuration example of a processor according to a first modification, in which a plurality of processors that perform vector operations are provided for one control processor. Components identical to those in FIG. 1 are given the same reference signs, and their description is omitted. As shown in FIG. 6, the processor 1A, which is a media processor, includes one processor and a plurality of processors that each perform vector operations. In the processor 1A, the control processor 2A has the same configuration as the processor 2 of FIG. 1. The m processors 3A1 to 3Am that perform vector operations (hereinafter referred to as the processor 3A, collectively or individually) each have the same configuration as the processor 3 of FIG. 1. In each of the processors 3A1 to 3Am, the control bus register 25 is connected to the control bus 8, and the instruction memory 21 and the data memory 22 are connected to the local bus 9. Note that m is an integer. Like the DMAC 4 described above, the DMAC 4A transfers instructions and data between the instruction memory 21 of each processor 3A and the main memory 6, and between the data memory 22 of each processor 3A and the main memory 6, under the control of the processor 2A.

With this configuration, even when high vector operation performance is required and a plurality of processors 3A are provided to increase the degree of parallelism, the only processing the processor 2A has to perform for the vector operations is to designate and start instructions, so the overhead remains small.

The processor 1A configured as in FIG. 6 also requires less overhead than the configuration disclosed in Japanese Patent Laid-Open No. 10-149341 mentioned above.
That is, in the configuration disclosed in Japanese Patent Laid-Open No. 10-149341, when high vector operation performance is required, higher performance can be pursued by providing a plurality of vector coprocessors to increase the degree of parallelism. However, the control overhead that the control processor incurs in starting, stopping, and otherwise managing the plurality of vector coprocessors also increases, so performance is difficult to improve. Higher performance could also be sought by arranging a plurality of processing cores, each consisting of a control processor and a vector coprocessor, side by side, but depending on the device to which such a multiprocessor is applied, cost performance becomes low.

In contrast, with the processor according to the first modification, a plurality of sub-processors can be connected to one general-purpose control processor 2A, so for applications in which the ratio of vector operation processing to control processing is high, high performance can be achieved more efficiently than before. In addition, because sub-processors can be added in units of the processor 3A, the processor 1A can provide a configuration that efficiently achieves high performance for a variety of applications by adjusting the number of processors 3A to suit the application. Furthermore, each processor 3A contains a scalar processor that performs the processing needed to start the vector processors quickly, so the start-up processing for vector operations is unlikely to become a bottleneck even as the number of processors 3A increases.

FIG. 7 is a block diagram showing a configuration example of a processor according to a second modification, in which a SIMD (Single Instruction/Multiple Data) coprocessor is provided in the control processor and a SIMD coprocessor is also provided in the processor that performs vector operations. Components identical to those in FIG. 1 are given the same reference signs, and their description is omitted.

The processor 2B has a CPU 13, which is an extensible general-purpose processor, and a SIMD coprocessor 14 is connected to the CPU 13 as a processor that executes extended functions. The processor 3B, serving as a sub-processor, has a SIMD coprocessor 34 in addition to the same configuration as in FIG. 1. The SIMD coprocessor 34 contains general-purpose registers for the coprocessor. In this case, the scalar LIW instruction includes a field that designates an operation in the SIMD coprocessor 34, in addition to a field that designates the scalar operation for setting vector operation parameters and a field that sets parameters for the DMAC 24.

With this configuration, in the processor 3B that performs vector operations, processing such as the detection processing in the preprocessing for deblocking of image data can be assigned to the SIMD coprocessor 14 or 34, which is a scalar processor, so the processing capability of the processor 1B can be improved. That is, in a complex application in which vector operations and scalar operations are mixed, for example image processing in which a scalar quantity is calculated from the result of vector-based image processing and further processing is then performed based on that scalar quantity, the scalar operations can be assigned to the SIMD coprocessor 14 or 34.

That is, in the configuration disclosed in Japanese Patent Laid-Open No. 10-149341 mentioned above, in such a complex application in which vector operations and scalar operations are mixed, the amount of vector operations does not grow in proportion to the increase in the amount of scalar operations, so the performance of the processor as a whole is difficult to improve.

In contrast, with the processor 1B according to the second modification, processing with a short vector length can be handled by the SIMD coprocessor 14 or 34, so high performance can be achieved efficiently in the processor 3B when processing with long and short vector lengths is mixed.

The apparatus according to the embodiment described above is a digital television apparatus, but the processors 1, 1A, and 1B described above can also be applied to equipment other than digital television apparatuses, for example other digital home appliances such as DVD recorders, various image processing apparatuses such as in-vehicle image recognition processing apparatuses and robot image recognition processing apparatuses, personal computers, and mobile phones.
The present invention is not limited to the embodiment described above, and various changes and modifications can be made without departing from the gist of the present invention.

FIG. 1 is a block diagram showing the configuration of a processor according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the configuration of an instruction memory in which a plurality of instructions are stored, according to the embodiment of the present invention.
FIG. 3 is a diagram showing a configuration example of each LIW instruction according to the embodiment of the present invention.
FIG. 4 is a diagram for explaining an example of the flow of processing between two processors according to the embodiment of the present invention.
FIG. 5 is a diagram for explaining another example of the flow of processing between two processors according to the embodiment of the present invention.
FIG. 6 is a block diagram showing a configuration example of a processor according to a first modification of the embodiment of the present invention.
FIG. 7 is a block diagram showing a configuration example of a processor according to a second modification of the embodiment of the present invention.

Explanation of symbols

1, 1A, 1B: processor; 2, 2A, 2B, 3, 3A, 3B: sub-processor; 7: global bus; 8: control bus; 9: local bus; 21: instruction memory; 41, 42: data transfer bus; 50: LIW instruction; 51 to 55: fields

Claims

A processor comprising:
a first processor that outputs vector operation parameters for a vector operation; and
a second processor that executes the vector operation based on the vector operation parameters,
wherein the second processor includes:
a first register that stores the vector operation parameters;
a scalar processor that writes the vector operation parameters to the first register in accordance with an instruction designated by the first processor; and
one or more vector processors that execute an instruction designated by the first processor based on the vector operation parameters stored in the first register.
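
By way of a non-limiting illustration, the arrangement recited above can be sketched as a small software model. The class names, the parameter names (start, length, operand), and the two example instructions ("add", "mul") are hypothetical and are not taken from the specification; the sketch only shows the division of roles, in which the scalar processor writes the parameters into the first register and the vector processors read them from there.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParameterRegister:
    """Stands in for the first register holding vector operation parameters."""
    values: Dict[str, int] = field(default_factory=dict)


class ScalarProcessor:
    """Writes parameters into the register when the first processor designates it."""

    def write(self, register: ParameterRegister, params: Dict[str, int]) -> None:
        register.values.update(params)


class VectorProcessor:
    """Runs a designated element-wise instruction using the stored parameters."""

    def execute(self, instruction: str, register: ParameterRegister,
                data: List[int]) -> List[int]:
        # Parameter names are assumptions made for this sketch only.
        start = register.values["start"]
        length = register.values["length"]
        operand = register.values["operand"]
        window = data[start:start + length]
        if instruction == "add":
            return [x + operand for x in window]
        if instruction == "mul":
            return [x * operand for x in window]
        raise ValueError(f"unknown instruction: {instruction}")


class SecondProcessor:
    """Groups the register, one scalar processor, and the vector processors."""

    def __init__(self, num_vector_units: int = 2) -> None:
        self.register = ParameterRegister()
        self.scalar = ScalarProcessor()
        self.vector_units = [VectorProcessor() for _ in range(num_vector_units)]

    def run(self, params: Dict[str, int], instructions: List[str],
            data: List[int]) -> List[List[int]]:
        # The register write is done locally by the scalar processor.
        self.scalar.write(self.register, params)
        return [unit.execute(instr, self.register, data)
                for unit, instr in zip(self.vector_units, instructions)]


class FirstProcessor:
    """Outputs the parameters and designates the instructions to execute."""

    def dispatch(self, second: SecondProcessor, data: List[int]) -> List[List[int]]:
        params = {"start": 1, "length": 3, "operand": 10}
        return second.run(params, ["add", "mul"], data)


results = FirstProcessor().dispatch(SecondProcessor(), [1, 2, 3, 4, 5])
print(results)  # [[12, 13, 14], [20, 30, 40]]
```

In this model the first processor only hands over the parameters and names the instructions; the per-register writes are carried out inside the second processor, which corresponds to the stated aim of reducing the overhead on the controlling processor.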

The processor of claim 1, wherein a plurality of signal lines are provided that connect the scalar processor and the first register, and a plurality of the vector operation parameters are written in parallel over the signal lines.

The processor described above, further comprising:
an instruction holding unit that holds a plurality of instructions executable by the one or more vector processors; and
a first transfer unit that transfers the plurality of instructions from a storage unit to the instruction holding unit in order to write them into the instruction holding unit,
wherein each of the one or more vector processors executes the instruction designated by the first processor, from among the plurality of instructions, based on the address of the designated instruction.
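
As a non-limiting sketch of the address-based designation recited above, the following model copies a set of instructions from a storage unit into an instruction holding unit and then selects one by its address. The names STORAGE, InstructionHoldingUnit, transfer, and vector_execute, the use of consecutive integer addresses, and the three example instructions are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# An "instruction" is modelled as a function over a vector (assumption).
Instruction = Callable[[List[int]], List[int]]

# Hypothetical storage unit holding the instructions before transfer.
STORAGE: List[Instruction] = [
    lambda v: [x + 1 for x in v],   # address 0: increment
    lambda v: [x * 2 for x in v],   # address 1: double
    lambda v: [-x for x in v],      # address 2: negate
]


class InstructionHoldingUnit:
    """Holds instructions keyed by address for the vector processors."""

    def __init__(self) -> None:
        self.slots: Dict[int, Instruction] = {}


def transfer(storage: List[Instruction], holding: InstructionHoldingUnit,
             base_address: int = 0) -> None:
    """Plays the role of the first transfer unit: storage -> holding unit."""
    for offset, instruction in enumerate(storage):
        holding.slots[base_address + offset] = instruction


def vector_execute(holding: InstructionHoldingUnit, address: int,
                   data: List[int]) -> List[int]:
    """A vector processor runs the instruction found at the designated address."""
    return holding.slots[address](data)


holding = InstructionHoldingUnit()
transfer(STORAGE, holding)
out = vector_execute(holding, 1, [1, 2, 3])
print(out)  # [2, 4, 6]
```

Because the instructions are already resident in the holding unit, the first processor in this model only needs to pass an address rather than the instruction itself.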

A processor comprising:
a first processor that outputs vector operation parameters for a vector operation; and
a plurality of second processors, each of which executes the vector operation based on the vector operation parameters,
wherein each of the plurality of second processors includes:
a first register that stores the vector operation parameters;
a scalar processor that writes the vector operation parameters to the first register in accordance with an instruction designated by the first processor; and
one or more vector processors that execute an instruction designated by the first processor based on the vector operation parameters stored in the first register.

A processor comprising:
a first processor that outputs vector operation parameters for a vector operation; and
a second processor that executes the vector operation based on the vector operation parameters,
wherein the second processor includes:
a first register that stores the vector operation parameters;
a first scalar processor that writes the vector operation parameters to the first register in accordance with an instruction designated by the first processor;
a second scalar processor that performs a predetermined scalar operation in accordance with an instruction designated by the first processor; and
one or more vector processors that execute an instruction designated by the first processor based on the vector operation parameters stored in the first register.