JP2002245448A

JP2002245448A - Arithmetic unit

Info

Publication number: JP2002245448A
Application number: JP2001321186A
Authority: JP
Inventors: Kosuke Yoshioka; Makoto Hirai; Tokuzo Kiyohara; Kozo Kimura
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1997-04-07
Filing date: 2001-10-18
Publication date: 2002-08-30

Abstract

PROBLEM TO BE SOLVED: To provide an arithmetic unit that performs the series of processes of inputting, decoding, and outputting compressed image and compressed audio data, and that achieves high throughput without operating at a high clock frequency. SOLUTION: The arithmetic unit has first and second control storage units 506 and 507, each storing a microprogram; a first program counter 504 that designates a first read address in the first control storage unit; a second program counter 505 that designates a second read address; a selector 508 that selects either the first or the second read address and outputs the selected address to the second control storage unit; and an execution unit 501 that performs operations under microprogram control by the first and second control storage units.
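
The addressing scheme in the abstract can be illustrated with a minimal sketch. It assumes each control store is a simple list of microinstructions; the class name `MicroSequencer`, the method `fetch`, and the flag `select_first` are hypothetical names chosen for illustration and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class MicroSequencer:
    """Toy model of two control stores addressed through a selector."""
    store1: list               # first control store (506), list of microinstructions
    store2: list               # second control store (507)
    pc1: int = 0               # first program counter (504)
    pc2: int = 0               # second program counter (505)
    select_first: bool = True  # selector (508) setting (hypothetical flag)

    def fetch(self):
        """Return the pair of microinstructions read in one cycle."""
        # The first store is always addressed by the first program counter.
        word1 = self.store1[self.pc1]
        # The selector decides which counter addresses the second store.
        addr2 = self.pc1 if self.select_first else self.pc2
        return word1, self.store2[addr2]


seq = MicroSequencer(["a0", "a1"], ["b0", "b1"], pc1=1, pc2=0)
locked = seq.fetch()        # both stores read with the first counter
seq.select_first = False
split = seq.fetch()         # second store read with its own counter
```

With the selector on the first address, both stores step together; with it on the second address, the two stores follow independent program counters.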

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

Field of the Invention. The present invention belongs to the technical field of digital signal processing and relates to a video/audio processing device (arithmetic unit) that decompresses compressed video and audio data, compresses video and audio data, and performs graphics processing.

[0002]

Description of the Related Art. In recent years, as compression and decompression techniques for digital video data have become established and LSI technology has improved, various video/audio processing devices have grown in importance, such as decoders that decompress compressed video and audio data, encoders that compress video and audio data, and graphics processors that perform graphics processing.

[0003] As a first conventional technique, there is a video/audio decoder (Japanese Unexamined Patent Application Publication No. H8-1116429) that decompresses compressed video and audio data conforming to the MPEG (Moving Picture Experts Group) standard. This video/audio decoder performs both video decoding and audio decoding using a single signal processing unit. FIG. 1 illustrates the decoding process of this video/audio decoder. In the figure, the vertical axis represents time and the horizontal axis represents the amount of computation.

[0004] Viewed broadly along the vertical axis, video decoding and audio decoding are processed alternately, because both are decoded by common hardware. As the figure shows, video decoding is divided into sequential processing and block processing. Sequential processing covers decoding other than block decoding, that is, processing such as MPEG stream header analysis that requires a wide variety of condition judgments; its computational load is small. Block decoding decodes the variable-length codes of the MPEG stream and then performs inverse quantization and inverse DCT (discrete cosine transform) block by block; its computational load is large. As the figure shows, audio decoding is likewise divided into sequential processing, which requires a wide variety of condition judgments as above, and decoding of the audio data itself. Decoding the audio data itself demands higher precision than image data and must finish within a limited time, so it must be both precise and fast, and its computational load is large.

[0005] Thus, the first conventional technique allows a single-chip implementation and achieves efficient audio/video decoding with the small amount of hardware of one chip. As a second conventional technique, there is a two-chip decoder, in which one chip serves as the video decoder and the other as the audio decoder. FIG. 2 illustrates the decoding process of the two-chip decoder. Both the video decoder and the audio decoder perform sequential processing, which includes many condition judgments such as header analysis, and block decoding, which mainly decodes the data itself. Because the video decoder and the audio decoder operate independently, each chip can have lower performance than in the first conventional technique.

[0006]

Problems to Be Solved by the Invention. However, the conventional techniques described above have the following problems. In the first conventional technique, the signal processing unit must decode both video and audio, so high processing performance is required. That is, it must run on a fast clock of 100 MHz or more, which makes it expensive as a consumer semiconductor device. Using a VLIW (Very Long Instruction Word) processor or the like to raise performance without a fast clock is not inconceivable, but the VLIW processor itself is expensive, and unless a separate processor is used for sequential processing, the processing as a whole becomes inefficient.

[0007] The second conventional technique is costly because it uses two processors. Neither the video processor nor the audio processor can be an off-the-shelf, inexpensive general-purpose processor with low processing performance. The video processor must be able to process a large amount of image data in real time. The audio processor does not need as much computation as the video processor, but audio data requires higher precision than image data. An inexpensive or low-performance processor therefore falls short of the required processing performance for video and for audio alike.

[0008] Furthermore, when the video/audio processing device is used in an AV decoder for a digital (satellite) broadcast tuner (called an STB, Set Top Box) or a DVD (Digital Versatile/Video Disc) playback device, the series of processes from inputting an MPEG stream received from a broadcast wave or read from a disc, through decoding that stream, to finally outputting video and audio signals to a display, speakers, and the like involves an enormous amount of processing. Demand has recently been growing for a video/audio processing device (arithmetic unit) that executes this enormous series of processes efficiently.

[0009] An object of the present invention is to provide an arithmetic unit that performs the series of processes of inputting, decoding, and outputting stream data representing compressed image and compressed audio data, that has high processing performance without operating at a high frequency, and that can reduce manufacturing cost. Another object of the present invention is to provide an arithmetic unit that decodes compressed video data, encodes video data, and performs graphics processing at low cost.

[0010]

Means for Solving the Problems. To solve the above problems, the video/audio processing device of the present invention receives from outside a data stream containing compressed audio data and compressed video data, decodes it, and outputs the decoded data to an output device. The device comprises input/output processing means for performing input/output processing that occurs asynchronously due to external factors, and decode processing means for performing, in parallel with the input/output processing, decode processing that mainly consists of decoding the data stream stored in a memory. The video data and audio data decoded by the decode processing means are stored in the memory. The input/output processing consists of: receiving the data stream that is input asynchronously from outside and storing it in the memory; supplying the data stream stored in the memory to the decode processing means; and reading the decoded data from the memory in accordance with the output rates of the external display device and audio output device, respectively, and outputting it to them.

[0011] With this configuration, the input/output processing means and the decode processing means operate in parallel in a pipelined manner. In addition, because asynchronous processing and decode processing are divided between the input/output processing means and the decode processing means, the decode processing means is freed from asynchronously occurring work and can devote itself to decoding. As a result, the video/audio processing device executes the series of processes of stream data input, decoding, and output efficiently, which makes full decoding of the stream data (with no dropped frames) possible without a high-speed operating clock.
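
The division of labor described in this paragraph can be sketched in software as two tasks joined by a bounded queue. This is only an analogy under stated assumptions: the patent describes hardware units, whereas the sketch uses Python threads, and the names `run_pipeline`, `io_task`, and `decode_task` are hypothetical.

```python
import queue
import threading


def run_pipeline(chunks, decode):
    """Run an input task and a decode task in parallel, joined by a queue."""
    stream_q = queue.Queue(maxsize=2)  # stands in for the shared memory
    decoded = []

    def io_task():
        # Input side: accept data as it arrives and store it for the decoder.
        for chunk in chunks:
            stream_q.put(chunk)
        stream_q.put(None)  # end-of-stream marker

    def decode_task():
        # Decode side: does nothing but take stored data and decode it.
        while True:
            chunk = stream_q.get()
            if chunk is None:
                break
            decoded.append(decode(chunk))

    threads = [threading.Thread(target=io_task),
               threading.Thread(target=decode_task)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decoded


result = run_pipeline(["a", "b", "c"], str.upper)
```

The decode task never waits on an external event directly; it only blocks when the queue is empty, which mirrors the idea that the decoder is relieved of asynchronous work.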

[0012]

Embodiments of the Invention. The embodiments of the video/audio processing device of the present invention are described under the following headings.

1 First embodiment
1.1 Schematic configuration of the video/audio processing device
1.1.1 Input/output processing unit
1.1.2 Decode processing unit
1.1.2.1 Sequential processing unit
1.1.2.2 Routine processing unit
1.2 Configuration of the video/audio processing device
1.2.1 Configuration of the input/output processing unit
1.2.2 Decode processing unit
1.2.2.1 Sequential processing unit
1.2.2.2 Routine processing unit
1.3 Detailed configuration of each unit
1.3.1 Processor 7 (sequential processing unit)
1.3.2 Routine processing unit
1.3.2.1 Code conversion unit
1.3.2.2 Pixel operation unit
1.3.2.3 Pixel read/write unit
1.3.3 Input/output processing unit
1.3.3.1 IO processor
1.3.3.1.1 Instruction read circuit
1.3.3.1.2 Task management unit
1.4 Description of operation
2 Second embodiment
2.1 Configuration of the video/audio processing device
2.1.1 Pixel operation unit

<1. First Embodiment>

The video/audio processing device of this embodiment is provided in a satellite broadcast receiver (called an STB, Set Top Box), a DVD (Digital Versatile Disc) playback device, a DVD-RAM recording/playback device, or the like. It receives an MPEG stream from a satellite broadcast or a DVD as compressed video/audio data, decompresses it (hereinafter simply "decodes"), and outputs a video signal and an audio signal to external output devices.

<1.1 Schematic Configuration of the Video/Audio Processing Device>

FIG. 3 is a block diagram showing the schematic configuration of the video/audio processing device according to the first embodiment of the present invention.

[0013] The video/audio processing device 1000 includes an input/output processing unit 1001, a decode processing unit 1002, and a memory controller 6, and is configured to separate input/output processing from decode processing and perform them in parallel. The external memory 3 serves as a working memory that temporarily stores the MPEG stream and decoded audio data, and as a frame memory that stores decoded video data.

<1.1.1 Input/Output Processing Unit>

The input/output processing unit 1001 performs input/output processing that occurs asynchronously with the internal operation of the video/audio processing device 1000. This processing consists of (a) receiving the MPEG stream that is input asynchronously from outside and temporarily storing it in the external memory 3, (b) supplying the MPEG stream stored in the external memory 3 to the decode processing unit 1002, and (c) reading the decoded video data and audio data from the external memory 3 and outputting them in accordance with the output rates of the external display device and audio output device (not shown), respectively.

<1.1.2 Decode Processing Unit>

The decode processing unit 1002 operates independently of and in parallel with the input/output processing unit 1001. It decodes the MPEG stream supplied by the input/output processing unit 1001 and stores the decoded video data and audio data in the external memory 3. Because decoding an MPEG stream involves a large amount of computation and a wide variety of processing, the decode processing unit 1002 includes a sequential processing unit 1003 and a routine processing unit 1004. It is configured to separate sequential processing, which mainly consists of a wide variety of condition judgments, from routine processing, which mainly consists of a large amount of routine computation and is suited to parallel operation, and to execute the two in parallel. Sequential processing includes MPEG stream header analysis and involves many condition judgments, such as detecting headers and evaluating their contents. Routine processing must apply various operations to blocks made up of a predetermined number of pixels, so it is suited to pipelined parallel processing and also to vector-style parallel processing in which exactly the same operation is applied to different data (pixels).

<1.1.2.1 Sequential Processing Unit>

As the sequential processing, the sequential processing unit 1003 analyzes the headers of the compressed audio data and compressed video data supplied from the input/output processing unit 1001, starts the routine processing unit 1004 for each macroblock, and decodes the compressed audio data. Header analysis includes analyzing the macroblock headers in the MPEG stream and decoding motion vectors. Here, a block is an image of 8 × 8 pixels. A macroblock consists of four luminance blocks and two chrominance blocks. A motion vector is a vector that points to a rectangular area of 8 × 8 pixels in a reference frame; it indicates which rectangular area of the reference frame the block was differenced against.

<1.1.2.2 Routine Processing Unit>

On receiving a decode start instruction for each macroblock from the sequential processing unit 1003, the routine processing unit 1004 decodes the macroblock as the routine processing, in parallel with the audio decoding of the sequential processing unit 1003. This decoding applies variable-length code decoding (VLD: Variable Length code Decoding), inverse quantization (IQ: Inverse Quantization), inverse discrete cosine transform (IDCT: Inverse Discrete Cosine Transform), and motion compensation (MC: Motion Compensation), in that order. In motion compensation, the routine processing unit 1004 stores the decoded block, via the memory controller 6, in the external memory 3 serving as the frame memory.

<1.2 Configuration of the Video/Audio Processing Device>

FIG. 4 is a block diagram showing the configuration of the video/audio processing device 1000 in more detail.

<1.2.1 Configuration of the Input/Output Processing Unit>

In the figure, the input/output processing unit 1001 includes a stream input unit 1, a buffer memory 2, an input/output processor 5 (hereinafter abbreviated as the IO processor 5), a DMAC (Direct Memory Access Controller) 5a, a video output unit 12, an audio output unit 13, and a host I/F unit 14.
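
The macroblock composition stated under <1.1.2.1> (four 8 × 8 luminance blocks plus two chrominance blocks) can be made concrete with a small sketch. It assumes the four luminance blocks tile a 16 × 16 area in the order top-left, top-right, bottom-left, bottom-right, which is the usual MPEG convention but is not spelled out in this text; `split_luma` is a hypothetical helper name.

```python
BLOCK = 8                # a block is 8 x 8 pixels
LUMA_BLOCKS = 4          # luminance blocks per macroblock
CHROMA_BLOCKS = 2        # chrominance blocks per macroblock

# Number of samples carried by one macroblock: (4 + 2) * 64.
SAMPLES_PER_MACROBLOCK = (LUMA_BLOCKS + CHROMA_BLOCKS) * BLOCK * BLOCK


def split_luma(luma16):
    """Split a 16x16 luminance area into four 8x8 blocks.

    Assumed order: top-left, top-right, bottom-left, bottom-right.
    """
    blocks = []
    for by in (0, BLOCK):
        for bx in (0, BLOCK):
            blocks.append([row[bx:bx + BLOCK] for row in luma16[by:by + BLOCK]])
    return blocks


# Example: number every sample of a 16x16 area, then split it.
luma = [[y * 16 + x for x in range(16)] for y in range(16)]
y_blocks = split_luma(luma)
```

Each of the resulting blocks is the unit on which the routine processing (IQ, IDCT, motion compensation) operates.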

[0014] The stream input unit 1 converts the MPEG data stream, which is input serially from outside, into parallel data (hereinafter called MPEG data). In doing so, the stream input unit 1 detects in the MPEG data stream the start code of a GOP (Group Of Pictures: a portion of the MPEG data stream that contains one I picture and corresponds to about 0.5 seconds of video) and notifies the IO processor 5. Following this notification, the converted MPEG data is transferred to the buffer memory 2 under the control of the IO processor 5.
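
GOP start-code detection can be sketched on a byte buffer as follows. The sketch assumes the data has already been converted from serial to byte-parallel form, and it uses the MPEG-1/2 video `group_start_code` value 0x000001B8, which comes from the MPEG video specifications rather than from this text; `find_gop_starts` is a hypothetical helper name.

```python
# MPEG-1/2 video group_start_code (value taken from the MPEG video
# specifications, not from this document).
GOP_START_CODE = b"\x00\x00\x01\xb8"


def find_gop_starts(data: bytes) -> list:
    """Return the byte offset of every GOP start code found in data."""
    offsets = []
    pos = data.find(GOP_START_CODE)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(GOP_START_CODE, pos + 1)
    return offsets


# Example buffer: one filler byte, a GOP start code, two payload bytes,
# and a second GOP start code.
sample = b"\xff" + GOP_START_CODE + b"\x12\x34" + GOP_START_CODE
gop_offsets = find_gop_starts(sample)
```

In the device, each detection would trigger a notification to the IO processor 5; here the offsets are simply collected in a list.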

[0015] The buffer memory 2 is a buffer that temporarily holds the MPEG data transferred from the stream input unit 1. The MPEG data held in the buffer memory 2 is then transferred to the external memory 3 via the memory controller 6 under the control of the input/output processor 5. The external memory 3 consists of SDRAM (Synchronous Dynamic Random Access Memory) chips and temporarily holds the MPEG data transferred from the buffer memory 2 via the memory controller 6. The external memory 3 also holds the decoded video data (hereinafter also called frame data) and the decoded audio data.

[0016] The input/output processor 5 controls data input and output among the stream input unit 1, the buffer memory 2, the external memory 3 (accessed via the memory controller 6), and the FIFO memory 4. Specifically, it controls data transfers (DMA transfers) along the following paths (1) to (4).

(1) Stream input unit 1 → buffer memory 2 → memory controller 6 → external memory 3
(2) External memory 3 → memory controller 6 → FIFO memory 4
(3) External memory 3 → memory controller 6 → buffer memory 2 → video output unit 12
(4) External memory 3 → memory controller 6 → buffer memory 2 → audio output unit 13

On these paths, the input/output processor 5 controls the transfer of the video data and the audio data in the MPEG data independently of each other. Paths (1) and (2) carry MPEG data before decoding; on them, the input/output processor 5 transfers compressed video data and compressed audio data separately. Paths (3) and (4) carry decoded video data and decoded audio data, respectively. The decoded video and audio data are transferred in accordance with the output rates of the external display device (not shown) and audio output device (not shown), respectively.
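
The four transfer paths can be written down as data, which makes it easy to list the individual hops each DMA transfer involves. The table below only restates paths (1) to (4); the names `DMA_ROUTES` and `hops` are hypothetical.

```python
# Transfer paths (1) to (4), each as an ordered list of units.
DMA_ROUTES = {
    1: ["stream input unit 1", "buffer memory 2",
        "memory controller 6", "external memory 3"],
    2: ["external memory 3", "memory controller 6", "FIFO memory 4"],
    3: ["external memory 3", "memory controller 6",
        "buffer memory 2", "video output unit 12"],
    4: ["external memory 3", "memory controller 6",
        "buffer memory 2", "audio output unit 13"],
}


def hops(route_id):
    """Return the (source, destination) pairs along one transfer path."""
    path = DMA_ROUTES[route_id]
    return list(zip(path, path[1:]))


fifo_feed = hops(2)
```

Written this way, it is visible that every path passes through the memory controller 6, and that the two output paths differ only in their final hop.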

[0017] Under the control of the IO processor 5, the DMAC 5a executes DMA transfers between the buffer memory 2 and each of the stream input unit 1, the video output unit 12, and the audio output unit 13; DMA transfers between the buffer memory 2 and the external memory 3; and DMA transfers between the external memory 3 and the FIFO memory 4. The video output unit 12 issues data requests to the input/output processor 5 in accordance with the output rate of the external display device (a CRT or the like), for example the period of the horizontal synchronization signal Hsync, and outputs to that display device the video data that the input/output processor 5 delivers over transfer path (3).

[0018] The audio output unit 13 issues data requests to the input/output processor 5 in accordance with the output rate of the external audio output device, and outputs the audio data that the input/output processor 5 delivers over transfer path (4) to the audio output device (a combination of a D/A converter, an audio amplifier, and speakers, or the like). The host I/F unit 14 is an interface for communicating with an external host processor, for example, in a DVD playback device, the processor that controls the device as a whole. Through this interface, the host processor sends instructions such as starting and stopping decoding of the MPEG stream, fast-forward playback, and reverse playback.

<1.2.2 Decode Processing Unit>

The decode processing unit 1002 in FIG. 4 includes a FIFO memory 4, the sequential processing unit 1003, and the routine processing unit 1004, and decodes the MPEG data supplied from the input/output processing unit 1001 via the FIFO memory 4. The sequential processing unit 1003 includes a processor 7 and an internal memory 8. The routine processing unit 1004 includes a code conversion unit 9, a pixel operation unit 10, a pixel read/write unit 11, a buffer 200, and a buffer 201.

[0019] The FIFO memory 4 consists of two FIFOs (hereinafter called the video FIFO and the audio FIFO), which store, first-in first-out, the compressed video data and the compressed audio data transferred from the external memory 3 under the control of the input/output processor 5.

<1.2.2.1 Sequential Processing Unit>

The processor 7 controls the reading of compressed video data and compressed audio data from the FIFO memory 4, performs part of the decoding of the compressed video data, and performs all of the decoding of the compressed audio data. The part of the video decoding it performs comprises analyzing header information in the MPEG data, calculating motion vectors, and controlling the compressed-video decoding process. This is because the full decoding of the compressed video data is shared between the processor 7 and the routine processing unit 1004: the processor 7 takes the sequential processing, which requires a wide variety of condition judgments, and the routine processing unit 1004 takes the large volume of routine arithmetic. Audio decoding, by contrast, requires less computation than video decoding, so the processor 7 handles all of it.

[0020] The functions of the processor 7 are described concretely with reference to FIG. 5. FIG. 5 shows the MPEG stream hierarchically and also shows the operation timing of each unit of the video/audio processing device. The horizontal axis in the figure is the time axis. The first layer shows the flow of the MPEG stream. As the second layer shows, one second of the MPEG stream contains a plurality of frames (I, P, and B pictures). As the third layer shows, one frame contains a picture header and a plurality of slices. As the fourth layer shows, one slice contains a slice header and a plurality of macroblocks. As the fifth layer shows, one macroblock contains a macroblock header and six blocks.
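
The layering described here (frame, slice, macroblock, block) maps naturally onto nested records. The following is a minimal sketch with hypothetical class and field names; header contents are left as plain dictionaries because the text does not enumerate them at this point.

```python
from dataclasses import dataclass, field


@dataclass
class Macroblock:
    header: dict                                  # macroblock header fields
    blocks: list = field(default_factory=list)    # six 8x8 blocks


@dataclass
class Slice:
    header: dict                                  # slice header fields
    macroblocks: list = field(default_factory=list)


@dataclass
class Picture:
    header: dict                                  # picture header, e.g. type I, P, or B
    slices: list = field(default_factory=list)


def count_blocks(picture):
    """Count all blocks in a picture by walking the hierarchy."""
    return sum(len(mb.blocks) for s in picture.slices for mb in s.macroblocks)


# Example: a P picture with 2 slices of 3 macroblocks, 6 blocks each.
example = Picture(
    header={"type": "P"},
    slices=[
        Slice(header={}, macroblocks=[
            Macroblock(header={}, blocks=[None] * 6) for _ in range(3)
        ])
        for _ in range(2)
    ],
)
total_blocks = count_blocks(example)
```

A header parser that fills in these records corresponds to the sequential processing, while the per-block work on `blocks` corresponds to the routine processing.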

[0021] The data structure of the first to fifth layers shown in the figure is explained in detail in published literature, for example "Point Illustrated Latest MPEG Textbook" (ASCII Corporation). As shown below the fifth layer in the figure, the processor 7 analyzes the headers in the MPEG stream down to the macroblock layer and decodes the compressed audio data. In doing so, the processor 7 instructs the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 to start decoding a macroblock according to the result of the header analysis for that macroblock. While those units decode the macroblock, the processor 7 reads compressed audio data from the FIFO memory 4 and decodes it. When the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 finish decoding the macroblock, the processor 7 is notified by an interrupt signal, suspends the decoding of the compressed audio data, and starts analyzing the header of the next macroblock.
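
The interleaving of header analysis, macroblock decoding, and audio decoding can be sketched as a timeline. The model below is a simplification under stated assumptions: time is counted in abstract ticks, every macroblock takes the same number of ticks, and audio work comes in equal units. The function name `schedule` and its parameters are hypothetical.

```python
def schedule(num_macroblocks, mb_decode_ticks, audio_units):
    """Return what the sequential processor does, step by step.

    While the routine processing unit decodes a macroblock
    (mb_decode_ticks ticks), the processor decodes audio if any is pending.
    """
    timeline = []
    audio_left = audio_units
    for mb in range(num_macroblocks):
        timeline.append(f"parse header {mb}")
        timeline.append(f"start macroblock {mb}")
        for _ in range(mb_decode_ticks):
            if audio_left > 0:
                timeline.append("decode audio")
                audio_left -= 1
            else:
                timeline.append("idle")
        # Completion is signalled by an interrupt; audio decoding is
        # suspended and the next header is parsed.
        timeline.append(f"interrupt: macroblock {mb} done")
    return timeline


steps = schedule(num_macroblocks=2, mb_decode_ticks=2, audio_units=3)
```

The timeline shows that audio decoding fills the periods in which the processor would otherwise wait for the macroblock decode to finish.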

[0022] The internal memory 8 is the work memory of the processor 7 and temporarily holds the decoded audio data. The held audio data is transferred to the external memory 3 by the input/output processor 5 over path (4) above.

<1.2.2.2 Routine Processing Unit>

The code conversion unit 9 performs variable-length decoding (VLD) on the compressed video data read from the FIFO memory 4. As shown in FIG. 5, of the decoded data, the code conversion unit 9 transfers the header information and motion-vector information (the broken-line sections in the figure) to the processor 7, and transfers the macroblock data (six blocks: luminance blocks Y0 to Y3 and chrominance blocks Cb and Cr; the solid-line sections in the figure) to the pixel operation unit 10 via the buffer 200. The macroblock data decoded by the code conversion unit 9 represents spatial frequency components.

【００２３】The buffer 200 holds the data representing spatial frequency components for one block (8 × 8 pixels) written by the code conversion unit 9. The pixel operation unit 10 performs inverse quantization (IQ) and the inverse discrete cosine transform (IDCT), block by block, on the block data transferred from the code conversion unit 9 via the buffer 200. For a luminance block, the result produced by the pixel operation unit 10 is data representing pixel luminance values or their differences; for a color difference block, it is data representing pixel color differences or their differences. The result is transferred to the pixel read/write unit 11 via the buffer 201.
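As a reminder of what the IDCT stage computes on one 8 × 8 block of spatial frequency data, the following is a minimal sketch of the textbook two-dimensional IDCT in direct form. It is not the microprogrammed implementation of the pixel operation unit 10 (which splits the work into separate microprograms), and the function name `idct_8x8` is hypothetical.

```python
import math

def idct_8x8(coeffs):
    """Direct-form 2-D inverse DCT of an 8x8 block given as coeffs[v][u]."""
    def c(k):
        # Normalization factor for the DC term of the transform.
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for y in range(8):
        for x in range(8):
            total = 0.0
            for v in range(8):
                for u in range(8):
                    total += (c(u) * c(v) * coeffs[v][u]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[y][x] = total / 4
    return out
```

A block that contains only a DC coefficient decodes to a flat block, which is a convenient sanity check for any faster implementation.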

【００２４】The buffer 201 holds pixel data for one block (8 × 8 pixels). The pixel read/write unit 11 performs motion compensation, block by block, on the result produced by the pixel operation unit 10. That is, for P pictures and B pictures, it cuts out the rectangular area indicated by the motion vector from the decoded reference frame in the external memory 3 via the memory controller 6 and combines it with the block produced by the pixel operation unit 10, thereby decoding the original block image. The decoding result of the pixel read/write unit 11 is stored in the external memory 3 via the memory controller 6.
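The cut-out-and-combine step can be sketched as follows. This is an illustration under simplifying assumptions, not the circuit of the pixel read/write unit 11: the reference frame is a plain list of rows, the motion vector has integer-pel precision only, and results are clamped to 0..255. The function name and parameters are hypothetical.

```python
def motion_compensate(reference, residual, top, left, mv_x, mv_y):
    """Rebuild one 8x8 block from a displaced reference area plus the residual.

    reference : decoded reference frame as a list of rows of pixel values
    residual  : 8x8 block of differences produced by the IQ/IDCT stage
    top, left : position of the block in the current frame
    mv_x, mv_y: integer motion vector (an assumption; half-pel is ignored)
    """
    block = []
    for y in range(8):
        row = []
        for x in range(8):
            predicted = reference[top + mv_y + y][left + mv_x + x]
            # Combine prediction and difference, then clamp to the pixel range.
            row.append(max(0, min(255, predicted + residual[y][x])))
        block.append(row)
    return block
```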

【００２５】Since motion compensation, IQ, and IDCT are well-known techniques, a detailed description of them is omitted (see the document cited above).

<1.3 Detailed Configuration of Each Unit>

Next, the detailed configuration of the main units of the video/audio processing device 1000 is described.

<1.3.1 Processor 7 (Sequential Processing Unit)>

FIG. 6 shows the analysis of the macroblock header by the processor 7 and the control it exercises over the other units. The data items in the macroblock header, shown by abbreviations in the figure, are explained in the document cited above and elsewhere, so their description is omitted here.

【００２６】As shown in the figure, the processor 7 issues commands to the code conversion unit 9 to obtain the variable-length-decoded header data item by item and, according to its contents, sets the data required for decoding the macroblock in the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11. Specifically, the processor 7 first issues a command to the code conversion unit 9 to obtain the MBAI (Macro Block Address Increment) (S101) and obtains the MBAI from the code conversion unit 9. If the MBAI shows that the macroblock is a skipped macroblock (that is, the macroblock about to be decoded is the same as the previous one), the macroblock data has been omitted, so the processor proceeds to S117; if it is not a skipped macroblock, header analysis continues (S102, S103).

【００２７】Next, the processor 7 issues a command to obtain the MBT (Macro Block Type) and obtains the MBT from the code conversion unit 9. From the MBT, it determines whether the scan type of the block is zigzag scan or alternate scan and instructs the pixel operation unit 10 on the read order of the buffer 200 (S104). The processor 7 then determines from the header data already obtained whether an STWC (Spatial Temporal Weight Code) is present (S105) and, if so, issues a command to obtain it (S106).

【００２８】In the same way, the processor 7 obtains the FrMT (Frame Motion Type), FiMT (Field Motion Type), DT (DCT Type), QSC (Quantizer Scale Code), MV (Motion Vector), and CBP (Coded Block Pattern) (S107 to S116). In doing so, the processor 7 notifies the pixel read/write unit 11 of the analysis results for FrMT, FiMT, and DT, notifies the pixel operation unit 10 of the analysis result for QSC, and notifies the code conversion unit 9 of the analysis result for CBP. The information required for IQ, IDCT, and motion compensation is thereby set in the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11.

【００２９】In a two-processor configuration, by contrast, each processor individually performed the above sequential processing, which requires a wide variety of conditional decisions, so the configuration was redundant. Next, the processor 7 issues a macroblock decode start instruction to the code conversion unit 9 (S117). The code conversion unit 9 thereby starts VLD for each block in the macroblock and outputs the VLD results to the pixel operation unit 10 via the buffer 200. The processor 7 then calculates the motion vector from the MV data (S118) and notifies the pixel read/write unit 11 of the result (S119).

【００３０】In the above processing, handling a motion vector requires a series of steps: obtaining the motion vector data (MV) (S113), calculating the motion vector (S118), and setting the motion vector in the pixel read/write unit 11 (S119). Here, the processor 7 does not calculate and set the motion vector (S118, S119) immediately after obtaining the motion vector data (S113). Instead, it first issues the decode start instruction to the routine processing unit 1004 and only then calculates and sets the motion vector. As a result, the motion vector calculation and setting by the processor 7 and the decoding by the routine processing unit 1004 proceed in parallel. In other words, the decode start timing of the routine processing unit 1004 is brought forward.
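The benefit of issuing the decode start instruction (S117) before the motion vector calculation (S118, S119) can be illustrated with a toy timing model. This is a sketch only: the cycle counts are made-up inputs, the function name is hypothetical, and the model assumes the routine processing unit never has to wait for the motion vector, which the source does not state.

```python
def macroblock_finish_time(header_cycles, mv_cycles, decode_cycles, start_decode_first):
    """Cycle at which one macroblock is finished under a simple two-actor model.

    header_cycles : time the processor spends obtaining header fields (S101 to S116)
    mv_cycles     : time to calculate and set the motion vector (S118, S119)
    decode_cycles : time the routine processing unit needs for the macroblock
    """
    if start_decode_first:
        # S117 is issued first, so decoding overlaps with S118/S119.
        return header_cycles + max(mv_cycles, decode_cycles)
    # Decoding only starts once the motion vector has been calculated and set.
    return header_cycles + mv_cycles + decode_cycles
```

With the illustrative values 100, 40, and 300 cycles, the early start finishes at cycle 400 instead of 440; the saving equals whichever of the two overlapped activities is shorter.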

【００３１】Header analysis of the compressed video data for one macroblock is completed as described above, so the processor 7 obtains compressed audio data from the FIFO memory 4 and starts audio decoding (S120). Audio decoding continues until an interrupt signal indicating completion of macroblock decoding is input from the code conversion unit 9. On this interrupt signal, the processor 7 starts the above header analysis for the next macroblock.

<1.3.2 Routine processing unit>

The routine processing unit 1004 decodes the six blocks in a macroblock by operating the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 in parallel (in a pipelined manner). Here, their configurations are described in more detail in the order of the pixel operation unit 10, the pixel read/write unit 11, and the code conversion unit 9.

<1.3.2.1 Code Conversion Unit 9>

FIG. 19 is a block diagram showing the configuration of the code conversion unit 9.

【００３２】The code conversion unit 9 in the figure includes a VLD unit 901, a counter 902, an incrementer 903, a selector 904, a scan table 905, a scan table 906, a flip-flop (hereinafter FF) 907, and a selector 908. It is configured to write the results of variable length decoding (VLD) into the buffer 200 block by block, arranged in zigzag scan order or alternate scan order.

【００３３】The VLD unit 901 performs variable length decoding (VLD) on the compressed video data read from the FIFO memory 4. Of the decoded data, it transfers the header information and the information on motion vectors (the broken-line sections in FIG. 5) to the processor 7, and outputs the macroblock data (six blocks consisting of the luminance blocks Y0 to Y3 and the color difference blocks Cb and Cr; the solid-line sections in FIG. 5) to the buffer 200 in units of blocks (64 items of spatial frequency data).

【００３４】The circuit portion consisting of the counter 902, the incrementer 903, and the selector 904 counts repeatedly from 0 to 63 in synchronization with the output of spatial frequency data from the VLD unit 901. The scan table 905 is a table that stores the addresses of the block storage area of the buffer 200 in zigzag scan order; the output values of the counter 902 (0 to 63) are input to it in sequence, and it outputs the corresponding addresses one after another. FIG. 20 shows the block storage area in the buffer 200, which stores 8 × 8 items of spatial frequency data, and the path of the zigzag scan. The scan table 905 sequentially outputs the pixel addresses along the path in that figure.
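The counter-plus-table arrangement can be sketched in software as follows. This is a minimal illustration, assuming the conventional 8 × 8 zigzag order and a row-major address `row * 8 + column` for the block storage area; the actual address layout of the buffer 200 and the contents of FIG. 20 are not reproduced here, and the function names are hypothetical.

```python
def zigzag_table(n=8):
    """Addresses (row * n + column) of an n x n block in conventional zigzag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals; the direction alternates from one diagonal to the next.
    cells.sort(key=lambda rc: (rc[0] + rc[1],
                               rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
    return [r * n + c for r, c in cells]

def write_block(coefficients, scan_table):
    """Store 64 decoded coefficients using a 0..63 counter as the table index."""
    buffer = [0] * 64
    for count, value in enumerate(coefficients):  # 'count' plays the role of the counter
        buffer[scan_table[count]] = value          # the table output is the write address
    return buffer
```

Swapping in a different 64-entry table changes the write order without touching the counter logic, which is the point of keeping the scan order in a table.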

【００３５】The scan table 906 is a table that stores the addresses of the block storage area of the buffer 200 in alternate scan order; the output values of the counter 902 (0 to 63) are input to it in sequence, and it outputs the corresponding addresses one after another. FIG. 21 shows the block storage area in the buffer 200, which stores 8 × 8 items of spatial frequency data, and the path of the alternate scan. The scan table 906 sequentially outputs the pixel addresses along the path in that figure.

【００３６】The FF 907 holds a flag indicating the scan type (zigzag scan or alternate scan). This flag is set by the processor 7. The selector 908 selects, according to the flag in the FF 907, between the addresses output from the scan table 905 and the scan table 906, and outputs the selected address to the buffer 200 as the write address.

<1.3.2.2 Pixel Operation Unit>

FIG. 7 is a block diagram showing the configuration of the pixel operation unit 10.

【００３７】As shown in the figure, the pixel operation unit 10 has an execution unit 501 consisting of a multiplier 502 and an adder/subtractor 503, a first program counter (hereinafter, first PC) 504, a second program counter (hereinafter, second PC) 505, a first instruction memory 506, a second instruction memory 507, and a selector 508. It is configured so that IQ and part of the IDCT can be overlapped and executed in parallel.

【００３８】The execution unit 501 accesses the buffers 200 and 201 and performs operations according to the microinstructions output in sequence from the first instruction memory 506 and the second instruction memory 507. The first instruction memory 506 and the second instruction memory 507 are control stores that hold the microprograms for performing IQ and IDCT on the block (frequency components) held in the buffer 200. FIG. 8 shows an example of the microprograms stored in the first instruction memory 506 and the second instruction memory 507.

【００３９】In the figure, the first instruction memory 506 stores the IDCT1A microprogram and the IQ microprogram, and its read address is specified by the first PC 504. The IQ microprogram consists mainly of reads from the buffer 200 and multiplications, and does not use the adder/subtractor 503. The second instruction memory 507 stores the IDCT1B microprogram and the IDCT2 microprogram, and its read address is specified by either the first PC 504 or the second PC 505 via the selector 508. Here, IDCT1 denotes the first half of the IDCT, which consists mainly of multiplications and additions/subtractions; it is executed using the entire execution unit 501 by reading the IDCT1A and IDCT1B microprograms simultaneously. IDCT2 denotes the second half of the IDCT, which consists mainly of additions/subtractions, together with the write to the buffer 201; it is executed using the adder/subtractor 503 by reading the IDCT2 microprogram from the second instruction memory 507.

【００４０】Because IQ is processed by the multiplier 502 and IDCT2 by the adder/subtractor 503, the two can run in parallel. FIG. 9 is an operation timing chart of IQ, IDCT1, and IDCT2 in the pixel operation unit 10. In FIG. 9, when the code conversion unit 9 has written the data of the luminance block Y0 into the buffer 200 (timing t0), it notifies the pixel operation unit 10 of this by the control signal 102. The pixel operation unit 10 performs IQ on the data in the buffer 200 by reading the IQ microprogram from the first instruction memory 506 at the addresses specified by the first PC 504, using the QS (Quantizer Scale) value set during header analysis by the processor 7. At this time, the selector 508 selects the first PC 504 (timing t1).

【００４１】The pixel operation unit 10 then performs IDCT1 on the data in the buffer 200 by reading the IDCT1A and IDCT1B microprograms at the addresses specified by the first PC 504. At this time, the selector 508 selects the first PC 504, so the address from the first PC 504 is supplied to both the first instruction memory 506 and the second instruction memory 507 (timing t2).

【００４２】Next, the pixel operation unit 10 performs IQ on the data of block Y1 in the buffer 200 by reading the IQ microprogram from the first instruction memory 506 at the addresses specified by the first PC 504, using the above QS (Quantizer Scale) value. At the same time, it processes the second half of the IDCT for block Y0 by reading the IDCT2 microprogram from the second instruction memory 507 at the addresses specified by the second PC 505. At this time, the selector 508 selects the second PC 505, so the first PC 504 and the second PC 505 specify addresses independently of each other (timing t3).
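The interleaving described for timings t1 to t3 can be sketched as a slot-by-slot schedule. This is an illustration only: it treats every phase as one equal-length slot, which FIG. 9 may not, and it assumes the last block's IDCT2 runs alone in a final slot. The function name and dictionary keys are hypothetical.

```python
def schedule(blocks):
    """List, slot by slot, what each instruction memory supplies and which PC the selector picks."""
    slots = []
    previous = None
    for name in blocks:
        # IQ of the current block overlaps with IDCT2 of the previous block.
        slots.append({
            "first_memory": f"IQ({name})",
            "second_memory": f"IDCT2({previous})" if previous else None,
            "selector": "PC2" if previous else "PC1",
        })
        # IDCT1 needs both memories, both addressed by the first PC.
        slots.append({
            "first_memory": f"IDCT1A({name})",
            "second_memory": f"IDCT1B({name})",
            "selector": "PC1",
        })
        previous = name
    # Assumption: the final IDCT2 has nothing left to overlap with.
    slots.append({"first_memory": None,
                  "second_memory": f"IDCT2({previous})",
                  "selector": "PC2"})
    return slots
```

Under this model, the six blocks of a macroblock take 13 slots instead of the 18 that three strictly sequential phases per block would need.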

【００４３】Thereafter, the pixel operation unit 10 continues processing block by block in the same way (timing t4 onward).

<1.3.2.3 Pixel Read/Write Unit>

FIG. 10 is a block diagram showing the detailed configuration of the pixel read/write unit 11. As shown in the figure, the pixel read/write unit 11 consists of buffers 71 to 74 (hereinafter, buffers A to D), a half-pel interpolation unit 75, a combining unit 76, selectors 77 and 78, and a read/write control unit 79.

【００４４】The read/write control unit 79 performs motion compensation on the block data input via the buffer 201, using buffers A to D, and transfers the final decoded image to the external memory 3 in units of two blocks. More specifically, it controls the memory controller 6 so that a rectangular area corresponding to two blocks is read from the reference frame in the external memory 3 according to the motion vector set during header analysis by the processor 7. As a result, the data of the two-block rectangular area indicated by the motion vector is stored in buffer A or buffer B. Then, depending on the picture type (I, P, or B picture), half-pel interpolation of the two-block rectangular area is performed by the combining unit 76. The block data input via the buffer 201 is then combined with (added to) the rectangular area after half-pel interpolation to calculate the pixel values of the block, which are stored in buffer B. The final decoded block stored in buffer B is transferred to the external memory 3 via the memory controller 6.

<1.3.3 Input/Output Processing Unit>

To execute the large number of data inputs and outputs (data transfers) described above, the input/output processing unit 1001 is configured to switch among a plurality of tasks that share the various data transfers without overhead, and to do so without delaying the response to data input/output requests. The overhead referred to here is the saving and restoring of context that occurs at a task switch. That is, the input/output processor 5 is configured to eliminate the overhead caused by saving the instruction address of the program counter and the register data to memory (a stack area) and restoring them from it. The detailed configuration is described here.

<1.3.3.1 IO Processor>

FIG. 11 is a block diagram showing the configuration of the IO processor 5. In the figure, the IO processor 5 includes a state monitoring register 51, an instruction memory 52, an instruction reading circuit 53, an instruction register 54, a decoder 55, an operation execution unit 56, a general-purpose register set group 57, and a task management unit 58. To handle a plurality of events that occur asynchronously, it is configured to execute tasks while switching among them at very short intervals (for example, every four instruction cycles).

【００４５】The state monitoring register 51 consists of registers CR1 to CR3 and holds the various state data (flags and the like) that the IO processor 5 uses to monitor the various input/output states. For example, the state monitoring register 51 holds state data indicating the state of the stream input unit 1 (the start code detection flag for the MPEG stream), the state of the video output unit 12 (the flag indicating the horizontal blanking period and the frame data transfer completion flag), the state of the audio output unit 13 (the audio frame data transfer completion flag), and the state of data transfers between these units and the buffer memory 2, the external memory 3, and the FIFO memory 4 (the number of data items transferred and the data request flags for the FIFO memory 4).

【００４６】More specifically, it includes the following flags.
・Start code detection flag (hereinafter also called flag 1): set when the stream input unit 1 detects a start code in the MPEG stream.
・Horizontal blanking flag (flag 2): indicates the horizontal blanking period and is set by the video output unit 12. It is set with a period of about 60 microseconds.
・Video frame data transfer completion flag (flag 3): set by the DMAC 5a when one frame of decoded image data has been transferred from the external memory 3 to the video output unit 12.
・Audio frame data transfer completion flag (flag 4): set by the DMAC 5a when one frame of decoded audio data has been transferred from the external memory 3 to the audio output unit 13.
・Data transfer completion flag (flag 5): set when the DMAC 5a has DMA-transferred the number of items of compressed image data specified by the IO processor 5 from the stream input unit 1 to the buffer memory 2 (that is, when the terminal count is reached).
・DMA request flag (flag 6): indicates that the buffer memory 2 holds compressed image data or compressed audio data to be DMA-transferred to the external memory 3. It is set by the IO processor 5 (a request from task 1 to task 2, described later).
・Data request flag for the video FIFO (flag 7): requests a data transfer from the external memory 3 to the video FIFO in the FIFO memory 4, and is set when the compressed video data in the video FIFO falls to or below a predetermined amount. It is set with a period of about 5 to 40 microseconds.
・Data request flag for the audio FIFO (flag 8): requests a data transfer from the external memory 3 to the audio FIFO in the FIFO memory 4, and is set when the compressed audio data in the audio FIFO falls to or below a predetermined amount. It is set with a period of about 15 to 60 microseconds.
・Decoder communication request flag (flag 9): requests communication from the decode processing unit 1002 to the input/output processing unit 1001.
・Host communication request flag (flag 10): requests communication from the host processor to the input/output processing unit 1001.
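For readers who prefer to see the flag set as a data structure, the ten flags can be modelled as bits of a status word. This is a sketch only: the source says the flags live in registers CR1 to CR3 but does not give a bit layout, so the bit positions, the class name `StatusFlag`, and the helper `flag_numbers` are all assumptions made for illustration.

```python
from enum import IntFlag

class StatusFlag(IntFlag):
    # Bit positions are hypothetical; only the flag numbering follows the text.
    START_CODE = 1 << 0          # flag 1
    H_BLANK = 1 << 1             # flag 2
    VIDEO_FRAME_DONE = 1 << 2    # flag 3
    AUDIO_FRAME_DONE = 1 << 3    # flag 4
    TRANSFER_DONE = 1 << 4       # flag 5
    DMA_REQUEST = 1 << 5         # flag 6
    VIDEO_FIFO_REQUEST = 1 << 6  # flag 7
    AUDIO_FIFO_REQUEST = 1 << 7  # flag 8
    DECODER_COMM = 1 << 8        # flag 9
    HOST_COMM = 1 << 9           # flag 10

def flag_numbers(register_value):
    """Return the flag numbers (1 to 10) that are set in a polled status value."""
    return [i + 1 for i in range(10) if register_value & (1 << i)]
```

A task that polls the status word would call `flag_numbers` (or test a single bit) on each of its turns and act only when one of its own trigger flags is present.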

【００４７】The above flags are monitored continuously, rather than through interrupts, by the tasks executed by the IO processor 5. The instruction memory 52 stores a plurality of task programs that share the control of the many data inputs and outputs (data transfers). In this embodiment, it stores six task programs, tasks 0 to 5.
・Task 0 (host I/F task): when flag 10 above is set, this task handles communication with the host computer, that is, communication via the host I/F unit 14. For example, it accepts requests from the host processor to start or stop decoding of the MPEG stream, to fast-forward, or to play in reverse, and it reports the decoding status (errors and the like). This processing is triggered by flag 10 above.
・Task 1 (parsing task): when the stream input unit 1 detects a start code (flag 1 above), this task analyzes (parses) the MPEG data input from the stream input unit 1, extracts the individual elementary streams, and transfers the extracted elementary streams to the buffer memory 2 by DMA transfer (the first half of transfer path (1) above). The elementary streams extracted here include compressed video data (also called the video elementary stream), compressed audio data (also called the audio elementary stream), and private data. When an elementary stream has been stored in the buffer memory 2, flag 6 above is set.
・Task 2 (stream transfer/audio task): this task is a program that controls the following transfers (a) to (c).

【００４８】(a) DMA transfer of the individual elementary streams from the buffer memory 2 to the external memory 3 (the second half of transfer path (1) above). This transfer is triggered by flags 1 and 3 above.
(b) DMA transfer of compressed audio data from the external memory 3 to the audio FIFO of the FIFO memory 4 (the transfer to the audio FIFO on transfer path (2) above), according to the data size (remaining amount) of the compressed audio data held in the audio FIFO. This transfer is performed when the data size of the compressed audio data held in the audio FIFO falls below a certain amount. It is triggered by flag 8 above.

【００４９】(c) DMA transfer of decoded audio data from the external memory 3 to the buffer memory 2, and then from the buffer memory 2 to the audio output unit 13 (transfer path (4) above). This transfer is triggered by flag 2 above.
・Task 3 (video supply task): this task is a program that handles the DMA transfer of compressed video data from the external memory 3 to the video FIFO of the FIFO memory 4 (the transfer to the video FIFO on transfer path (2) above), according to the data size (remaining amount) of the compressed video data held in the video FIFO. This transfer is performed when the data size of the compressed video data held in the video FIFO falls below a certain amount. It is triggered by flag 7 above.
・Task 4 (video output task): this task is a program that handles the DMA transfer of decoded video data from the external memory 3 to the buffer memory 2, and then from the buffer memory 2 to the video output unit 12 (transfer path (4) above). This transfer is triggered by flag 2 above.
・Task 5 (decoder I/F task): this task is a program that handles commands sent from the decode processing unit 1002 to the IO processor 5. The commands include "getAPTS", "getVPTS", and "getSTC". getVPTS (Video Presentation Time Stamp) is a command by which the decode processing unit 1002 requests the IO processor 5 to obtain the VPTS attached to the compressed video data. getAPTS (Audio Presentation Time Stamp) is a command by which the decode processing unit 1002 requests the IO processor 5 to obtain the APTS attached to the compressed audio data. getSTC (System Time Clock) is a command by which the decode processing unit 1002 requests the IO processor 5 to obtain the STC. On receiving these commands, the IO processor 5 reports the STC, VPTS, or APTS, respectively, to the decode processing unit 1002. The STC, VPTS, and APTS are used in the decode processing unit 1002 to synchronize audio and video decoding and to adjust the progress of decoding frame by frame. This processing is triggered by flag 9 above.

【００５０】The instruction read circuit 53 has a plurality of program counters (hereinafter abbreviated as PCs), each indicating an instruction fetch address. Using the PC designated by the task management unit 58, it reads an instruction from the instruction memory 52 and stores it in the instruction register 54. Specifically, the instruction read circuit 53 has PC0 to PC5 corresponding to tasks 0 to 5 and is configured so that, when the task management unit 58 changes the PC designation, the PC is switched at high speed by hardware. With this configuration, the IO processor 5 is freed, at a task switch, from the processing of saving the PC value of the current task to memory and restoring the PC value of the next task from memory.

【００５１】The decoder 55 decodes the instruction that has been read from the instruction memory 52 and stored in the instruction register 54, and controls the operation execution unit 56 so that the instruction is executed. In addition, the decoder 55 performs pipeline control of the entire IO processor 5 with at least three stages: the instruction read stage of the instruction read circuit 53, the decode stage of the decoder 55, and the execution stage of the operation execution unit 56.

【００５２】The operation execution unit 56 has an ALU (Arithmetic Logical Unit), a multiplier, a BS (Barrel Shifter), and the like, and executes the operation specified by an instruction under the control of the decoder 55. The general-purpose register set group 57 has six register sets corresponding to tasks 0 to 5 (one register set consists of four 32-bit registers and four 16-bit registers). It therefore has 24 32-bit registers and 24 16-bit registers in total, and the register set corresponding to the task being executed is used. As a result, the IO processor 5 is freed, at a task switch, from the processing of saving all current register data to memory and restoring the register data of the next task from memory.
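The banked register arrangement described above can be illustrated with a small model. This is only a sketch under stated assumptions: the class and method names (`RegisterSet`, `BankedRegisters`, `switch_task`) are hypothetical and do not come from the specification; only the counts (six tasks, four 32-bit and four 16-bit registers per set) are taken from the text.

```python
from dataclasses import dataclass, field

NUM_TASKS = 6  # tasks 0 to 5, as in the text


@dataclass
class RegisterSet:
    # One set per task: four 32-bit and four 16-bit registers.
    r32: list = field(default_factory=lambda: [0] * 4)
    r16: list = field(default_factory=lambda: [0] * 4)


class BankedRegisters:
    """Hypothetical model of the general-purpose register set group 57."""

    def __init__(self):
        self.banks = [RegisterSet() for _ in range(NUM_TASKS)]
        self.active = 0  # index of the task currently executing

    def switch_task(self, task_id):
        # A task switch only changes the selected bank; nothing is copied.
        self.active = task_id

    def write32(self, index, value):
        self.banks[self.active].r32[index] = value & 0xFFFFFFFF

    def read32(self, index):
        return self.banks[self.active].r32[index]


regs = BankedRegisters()
regs.write32(0, 0x1234)   # task 0 writes its register 0
regs.switch_task(1)
regs.write32(0, 0xABCD)   # task 1 writes its own register 0
regs.switch_task(0)       # back to task 0: its value is still in place
total_32 = sum(len(b.r32) for b in regs.banks)
total_16 = sum(len(b.r16) for b in regs.banks)
```

Because each task owns a bank, returning to task 0 finds its data untouched without any save or restore step, which is the point the paragraph makes.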

【００５３】The task management unit 58 performs task switching by switching the PC of the instruction read circuit 53 and the register set of the general-purpose register set group 57 every predetermined number of instruction cycles. In this embodiment the predetermined number is 4. Since the IO processor 5 pipelines instructions at one instruction per instruction cycle, the task management unit 58 switches tasks every four instructions without incurring the overhead described above. This suppresses the response delay to the various input/output requests that occur asynchronously. That is, the response delay to an input/output request is at most only 24 instruction cycles.
<1.3.3.1.1 Instruction Read Circuit>
FIG. 12 is a block diagram showing a detailed configuration example of the instruction read circuit 53.

【００５４】In the figure, the instruction read circuit 53 includes a task-specific PC storage unit 53a, a current IFAR (Instruction Fetch Address Register) 53b, an incrementer 53c, a next IFAR 53d, a selector 53e, a selector 53f, and a DECAR (DECode Address Register) 53g, and is configured to switch the instruction read address without overhead at a task switch.

【００５５】The task-specific PC storage unit 53a has six address registers corresponding to tasks 0 to 5 and holds a program count value for each task. Each program count value is the restart address of the corresponding task. At a task switch, under the control of the task management unit 58 and the decoder 55, the program count value is read from the address register corresponding to the task to be executed next, and the program count value in the address register corresponding to the task currently being executed is updated to a new restart address. The task to be executed next and the current task are designated by the task management unit 58 with the "nexttaskid (rd addr)" signal (hereinafter also called the task ID) and the "taskid (wr addr)" signal, respectively.
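The read and write behaviour of the task-specific PC storage unit at a task switch might be modelled as follows. This is a minimal sketch: the class name `TaskPcStore`, the method `switch`, and the entry addresses used in the usage lines are illustrative assumptions; only the two ports (read addressed by nexttaskid, write addressed by taskid) come from the text.

```python
class TaskPcStore:
    """Hypothetical model of the task-specific PC storage unit 53a."""

    def __init__(self, entry_addresses):
        # One address register per task, initialised to an assumed entry address.
        self.regs = list(entry_addresses)

    def switch(self, taskid, nexttaskid, restart_address):
        # Read port (rd addr): fetch the restart address of the task to run next.
        next_pc = self.regs[nexttaskid]
        # Write port (wr addr): record where the current task will resume.
        self.regs[taskid] = restart_address
        return next_pc


store = TaskPcStore([0x000, 0x100, 0x200, 0x300, 0x400, 0x500])
pc_for_task1 = store.switch(taskid=0, nexttaskid=1, restart_address=0x004)
```

In this sketch the read happens before the write, so both can take place in the same switch without an intermediate save to memory.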

【００５６】The program count values corresponding to tasks 0, 1, and 2 are shown as PC0, PC1, and PC2 in FIG. 13. In the figure, (0-0) denotes instruction 0 of task 0 and (1-4) denotes instruction 4 of task 1. For example, PC0 is read when task 0 is resumed (instruction cycle t0) and is updated to the address of instruction (0-4) when switching to the next task (instruction cycle t4).

【００５７】The loop circuit consisting of the incrementer 53c, the next IFAR 53d, and the selector 53e is a circuit that updates the instruction read address selected by the selector 53e. The address output from the selector 53e is shown as IF1 in FIG. 13. In the figure, for example, when switching from task 0 to task 1, the selector 53e selects the address of instruction (1-0) read from the task-specific PC storage unit 53a in cycle t4, and selects the incremented instruction address from the next IFAR 53d in cycles t5 to t7.

【００５８】The current IFAR 53b holds the selected output IF1 of the selector 53e with a delay of one cycle and outputs it to the instruction memory 52 as the instruction read address. In other words, it holds the instruction read address of the currently active task. The instruction read address in the current IFAR 53b is shown as IF2 in FIG. 13. As the figure shows, IF2 points to an instruction address of a different task every four instruction cycles.
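The IF1 and IF2 address sequences described for FIG. 13 can be reproduced with a short simulation. This is a sketch under assumptions: tasks are taken in simple round-robin order, each task starts at instruction 0, branches and interrupts (selector 53f) are ignored, and the function name `trace_if_stages` is hypothetical. Labels are (task, instruction index) pairs, mirroring the "(1-0)" notation of the figure.

```python
SLOT = 4  # instruction cycles per task slot, as in the text


def trace_if_stages(num_tasks, cycles):
    """Return (IF1, IF2) lists of (task, instruction index) labels per cycle."""
    pcs = [0] * num_tasks   # per-task restart instruction index (PC storage)
    if1, if2 = [], []
    current = None          # value held in the current IFAR (IF2 stage)
    task, addr = 0, 0
    for t in range(cycles):
        if t % SLOT == 0:
            if t > 0:
                pcs[task] = addr + 1        # save restart address of the old task
            task = (t // SLOT) % num_tasks
            addr = pcs[task]                # selector takes the stored PC
        else:
            addr += 1                       # selector takes the incremented address
        if2.append(current)                 # IF2 lags IF1 by one cycle
        current = (task, addr)
        if1.append(current)
    return if1, if2


if1, if2 = trace_if_stages(num_tasks=3, cycles=13)
```

With three tasks, `if1[4]` is the first instruction of task 1 and `if1[12]` resumes task 0 at its fifth instruction, matching the description of cycles t4 and of the PC0 update.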

【００５９】The DECAR 53g holds the address of the instruction held in the instruction register 54; that is, it points to the instruction being decoded. DEC in FIG. 13 shows the address held in the DECAR 53g, and EX in FIG. 13 shows the address of the instruction being executed. The selector 53f selects a branch address when a branch instruction is executed or an interrupt occurs, and otherwise selects the address of the next IFAR 53d.

【００６０】By having such an instruction read circuit 53, the IO processor 5 performs four-stage pipeline processing (IF1, IF2, DEC, EX) as shown in FIG. 13. Of these, the IF1 stage selects and updates the multiple program count values, and the IF2 stage reads the instruction.
<1.3.3.1.2 Task Management Unit>
FIG. 14 is a block diagram showing a detailed configuration of the task management unit 58. In the figure, the task management unit 58 is broadly divided into a slot manager, which manages the task switching timing, and a scheduler, which manages the task order.

【００６１】The slot manager has a counter 58a, a latch 58b, a comparator 58c, and a latch unit 58d, and outputs to the instruction read circuit 53 a task switching signal (chgtaskex) that instructs a task switch every four instruction cycles. Specifically, the latch 58b consists of two FF (Flip Flop) circuits that hold the lower two bits of the output of the counter 58a. At each clock indicating an instruction cycle, the counter 58a outputs a 3-bit value obtained by incrementing the 2-bit output value of the latch 58b by 1. As a result, the counter 58a repeatedly outputs 1, 2, 3, 4. The comparator 58c outputs the task switching signal (chgtaskex) to the instruction read circuit 53 and the scheduler when the output value of the counter 58a matches the constant 4.
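The counter, latch, and comparator behaviour of the slot manager can be checked with a few lines of code. This is a behavioural sketch, not a description of the actual circuit; the generator name `slot_manager` and the assumption that the latch starts at 0 are illustrative.

```python
def slot_manager(cycles):
    """Yield (counter_output, chgtaskex) for each instruction cycle."""
    latch = 0  # latch 58b: two flip-flops holding the lower 2 bits
    for _ in range(cycles):
        counter = (latch + 1) & 0b111   # counter 58a: 3-bit result of latch + 1
        chgtaskex = counter == 4        # comparator 58c: match against constant 4
        yield counter, chgtaskex
        latch = counter & 0b11          # keep only the lower 2 bits (4 becomes 0)


outputs = list(slot_manager(8))
```

Because only the lower two bits are latched, the value 4 wraps back to 0 and the counter output cycles through 1, 2, 3, 4, asserting chgtaskex once every four cycles. With six tasks served in turn, each task would receive a slot every 6 × 4 = 24 cycles, which appears consistent with the 24-cycle maximum response delay stated earlier.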

【００６２】The scheduler has a task round management unit 58e, a priority encoder 58f, and a latch 58g. Each time the task switching signal (chgtaskex) is output, it updates the task id and outputs the current task id and the task id to be executed next to the instruction read circuit 53. Specifically, the latch unit 58d and the latch 58g both hold the current task id in encoded form (3 bits). In the encoded form, the value itself represents the task id.

【００６３】When the task switching signal (chgtaskex) is input, the task round management unit 58e refers to the latch unit 58d and outputs the task id to be executed next in decoded form (6 bits). In the decoded form (6 bits), one bit corresponds to one task and the bit position represents the task id. The priority encoder 58f converts the task id output from the task round management unit 58e from the decoded form to the encoded form. The latch unit 58d and the latch 58g both hold the encoded task id with a delay of one cycle.
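The two task id representations, decoded (one bit per task, 6 bits) and encoded (3-bit value), can be illustrated as follows. This is a sketch under assumptions: the text does not spell out the ordering policy of the task round management unit, so plain round-robin (0, 1, ..., 5, 0, ...) is assumed here, as is the rule that the lowest set bit wins in the priority encoder. The function names are hypothetical.

```python
NUM_TASKS = 6


def next_task_onehot(current_id):
    """Task round management (assumed plain round-robin): 6-bit decoded form."""
    return 1 << ((current_id + 1) % NUM_TASKS)


def priority_encode(onehot):
    """Convert the decoded (bit-per-task) form to the 3-bit encoded task id."""
    for bit in range(NUM_TASKS):
        if onehot & (1 << bit):
            return bit  # lowest set bit wins; an assumed priority rule
    raise ValueError("no task bit set")


ids = [0]
for _ in range(NUM_TASKS):
    ids.append(priority_encode(next_task_onehot(ids[-1])))
```

Starting from task 0, the sequence of encoded ids visits every task once and returns to task 0 after six switches.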

【００６４】With this configuration, when the task switching signal (chgtaskex) is output from the comparator 58c, the task round management unit 58e outputs the id of the task to be executed next from the priority encoder 58f as the "nexttaskid (rd addr)" signal, and the current task id from the latch 58e as the "taskid (wr addr)" signal.
<1.4 Description of Operation>
The operation of the video/audio processing apparatus 1000 of the first embodiment configured as described above will now be described.

【００６５】In the input/output processing unit 1001, the MPEG stream input asynchronously from the stream input unit 1 is, under the control of the input/output processor 5, first stored in the external memory 3 via the buffer memory 2 and the memory controller 6, and then held in the FIFO memory 4 via the memory controller 6. At this time, the IO processor 5 supplies compressed video data and compressed audio data to the FIFO memory 4 according to its remaining amount by executing task 2 (b) and task 3 described above. As a result, a constant amount of compressed video data and compressed audio data is supplied to the FIFO memory 4 without excess or shortage, so that the decode processing unit 1002 is decoupled from asynchronous input/output and can devote itself to decoding. The processing up to this point is performed by the input/output processing unit 1001 independently of, and in parallel with, the decode processing unit 1002.
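Supplying the FIFO "according to its remaining amount" amounts to a threshold check before each transfer. The following is a minimal sketch of such a check; the function name and all parameters (`low_watermark`, `burst`, `capacity`) are hypothetical, since the specification says only that a transfer occurs when the held data falls below a certain amount.

```python
def refill_amount(fifo_level, low_watermark, burst, capacity):
    """Return how many bytes to DMA into the FIFO this time (0 if none)."""
    if fifo_level >= low_watermark:
        return 0                              # enough data held; no transfer
    return min(burst, capacity - fifo_level)  # never overfill the FIFO


no_transfer = refill_amount(fifo_level=100, low_watermark=64, burst=32, capacity=128)
full_burst = refill_amount(fifo_level=10, low_watermark=64, burst=32, capacity=128)
```

A supply task built around such a check only moves data when the decoder has consumed enough of it, which keeps the FIFO neither starved nor overfilled.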

【００６６】Meanwhile, in the decode processing unit 1002, the MPEG stream data held in the FIFO memory 4 is subsequently decoded by the processor 7, the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11. FIG. 15 is an explanatory diagram showing the decoding operation from the FIFO memory 4 onward. In the figure, the horizontal axis is the time axis, and the header analysis and the block-by-block decoding for roughly one macroblock are shown. The vertical direction shows how the decoding of each block is executed in a pipelined manner in the respective units of the decode processing unit 1002.

【００６７】As shown in the figure, the processor 7 repeats the header analysis of the compressed video data and the decoding of the compressed audio data in a time-division manner. That is, the processor 7 performs header analysis for one macroblock, notifies the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 of the analysis result, and then instructs the code conversion unit 9 to start decoding the macroblock. The processor 7 then decodes the compressed audio data until it is notified of an interrupt signal from the code conversion unit 9. The decoded audio data is temporarily held in the internal memory 8 and is then DMA-transferred to the external memory 3 by the memory controller 6.

【００６８】Upon receiving the macroblock decode start instruction from the processor 7, the code conversion unit 9 stores its output in the buffer 200 for each block in the macroblock. At this time, the code conversion unit 9 changes the order of the write addresses to the buffer 200 according to the block scan type notified at the time of header analysis by the processor 7. That is, the order of the write addresses differs between zigzag scan and alternate scan. Consequently, the pixel operation unit 10 does not need to change the order of its read addresses and can always read in the same address order regardless of the scan type. The code conversion unit 9 repeats this operation, writing to the buffer 200, until it has finished VLD processing of the six blocks in the macroblock. When VLD of the six blocks is finished, it generates an interrupt to the processor 7. This interrupt signal is the macroblock decode end signal End Of Macro Block (EOMB). The code conversion unit 9 generates the EOMB by detecting the block end signal End Of Block (EOB) of the sixth block.
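The idea of absorbing the scan type on the write side can be sketched as a table-driven write. This is an illustration under assumptions: the buffer is assumed to hold each 8×8 block in raster order, the zigzag table is generated by the usual anti-diagonal walk, and the function names are hypothetical. An alternate-scan table would be passed to `write_block` in the same way; it is not reproduced here.

```python
def zigzag_table(n=8):
    """Raster address for each coefficient in zigzag transmission order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk anti-diagonals; odd diagonals run downward, even ones upward.
    cells.sort(key=lambda rc: (rc[0] + rc[1],
                               rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
    return [r * n + c for r, c in cells]


def write_block(coeffs, scan_table):
    """Store coefficients so the buffer is always in raster order."""
    buf = [0] * len(scan_table)
    for k, value in enumerate(coeffs):
        buf[scan_table[k]] = value  # write address depends on the scan type
    return buf


table = zigzag_table()
block = write_block(list(range(64)), table)  # coefficient k carries the value k
```

Since the permutation is applied when writing, a reader can step through addresses 0 to 63 in the same order for every scan type, as the paragraph describes for the pixel operation unit.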

【００６９】In parallel with the code conversion unit 9, the pixel operation unit 10 applies IQ and IDCT, block by block, to the block data stored in the buffer 200 as shown in FIG. 9, and stores the result in the buffer 201. In parallel with the pixel operation unit 10, the pixel read/write unit 11 cuts out a rectangular area from the reference frame in the external memory 3 and performs block synthesis, as shown in FIG. 15, based on the block data in the buffer 201 and the motion vector notified through header analysis by the processor 7. The block synthesis result is stored in the external memory 3 via the FIFO memory 4.

【００７０】The above is the operation for a macroblock that is not a skipped macroblock. For a skipped macroblock, the code conversion unit 9 and the pixel operation unit 10 do not operate, and only the pixel read/write unit 11 operates. Because a skipped macroblock is the same image as the rectangular area in the reference frame, the pixel read/write unit 11 copies that image to the external memory 3 as the decoded image.

【００７１】In this case, the interrupt signal from the code conversion unit 9 to the processor 7 is generated as follows. The logical AND is taken of a signal indicating that the processor 7 has sent the pixel read/write unit 11 a control signal to start the motion compensation operation, a signal indicating that the pixel read/write unit 11 is able to perform the motion compensation operation, and a signal indicating a skipped macroblock; the logical OR of this AND result and the above EOMB signal is then input to the processor 7 as the interrupt signal.
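The interrupt condition described above is a three-input AND followed by an OR with EOMB. A minimal sketch, with hypothetical signal names chosen for readability:

```python
def interrupt_to_processor(mc_start_sent, mc_ready, is_skipped_mb, eomb):
    """Interrupt = (start sent AND motion compensation possible AND skipped MB) OR EOMB."""
    return (mc_start_sent and mc_ready and is_skipped_mb) or eomb


skipped_case = interrupt_to_processor(True, True, True, False)
normal_case = interrupt_to_processor(False, False, False, True)
```

In the skipped-macroblock case the interrupt is raised without any EOMB, because the code conversion unit never processes a sixth block; in the normal case EOMB alone raises it.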

【００７２】As described above, in the video/audio processing apparatus of the first embodiment of the present invention, the input/output processing unit 1001 handles the input of MPEG streams from a storage medium or communication medium, the output of display image data and audio data to the display device and the audio output device, and the supply of the stream to the decode processing unit 1002, while the decode processing unit 1002 handles the decoding of compressed video data and compressed audio data. The decode processing unit 1002 is thereby freed from asynchronously occurring processing and can devote itself to decoding. As a result, the series of processes of MPEG stream input, decoding, and output is executed efficiently, so that full decoding of an MPEG stream (with no dropped frames) can be achieved without using a high-speed operating clock.

【００７３】It is desirable to implement the present video/audio processing apparatus as a single-chip LSI. In that case, the full decoding described above is possible with an operating clock of 100 MHz or less (54 MHz in practice). By comparison, recent high-performance CPUs whose operating clocks exceed 100 MHz or even 200 MHz can perform such full decoding if the image size is small, but they are expensive to manufacture. The present video/audio processing apparatus, in contrast, is superior in both manufacturing cost and full decoding.

【００７４】Furthermore, the roles within the decode processing unit 1002 of the present video/audio processing apparatus are divided as follows. The processor 7 handles header analysis, which requires a wide variety of conditional judgments for both compressed video data and compressed audio data, and also handles the decoding of compressed audio data. Because the block data of compressed video data requires a large amount of routine computation, dedicated hardware (firmware), namely the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11, handles its decoding. As shown in FIG. 15, the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 are pipelined. The pixel operation unit 10 can process IQ and IDCT in parallel. The pixel read/write unit 11 accesses the reference frame in units of two blocks. Because these measures make the compressed audio decoding more efficient, the hardware dedicated to video decoding can achieve high processing capability without using a high-speed clock. Specifically, processing capability equal to or better than the conventional level was obtained with a clock of about 50 to 60 MHz, without using a high-speed clock exceeding 100 MHz. There is therefore no need to use high-speed devices, and manufacturing cost can be kept down.

【００７５】In addition, because the basic unit of video decoding is a macroblock in the processor 7, a block in the code conversion unit 9 and the pixel operation unit 10, and two blocks in the pixel read/write unit 11, the capacity of the intermediate buffers used in video decoding can be kept to a minimum.
<2 Second Embodiment>
The video/audio processing apparatus of this embodiment is configured to provide, in addition to the function of decoding compressed stream data, a compression function (hereinafter called encoding) and a graphics function.
<2.1 Configuration of Video/Audio Processing Apparatus>
FIG. 16 is a block diagram showing the configuration of the video/audio processing apparatus according to the second embodiment of the present invention.

【００７６】The video/audio processing apparatus 2000 consists of a stream input/output unit 21, a buffer memory 22, a FIFO memory 24, an input/output processor 25, a memory controller 26, a processor 27, an internal memory 28, a code conversion unit 29, a pixel operation unit 30, a pixel read/write unit 31, a video output unit 12, an audio output unit 13, a buffer 200, and a buffer 201. In addition to the functions of the video/audio processing apparatus 1000 shown in FIG. 4, the video/audio processing apparatus 2000 has the following added functions: a function for compressing video data and audio data, and a graphics function for drawing polygon data.

【００７７】Accordingly, in the video/audio processing apparatus 2000, components with the same names as in FIG. 4 have exactly the same functions, with functions for compression and graphics added. Points that are the same as in FIG. 4 are not described again below; the description focuses on the differences. The stream input/output unit 21 differs in that it is bidirectional. That is, when MPEG data is transferred from the buffer memory 22 under the control of the input/output processor 25, it converts the transferred parallel data into serial data and outputs it externally as an MPEG data stream.

【００７８】The buffer memory 22 and the FIFO memory 24 likewise differ in that they are bidirectional. In addition to controlling data transfer on paths (1) to (4) shown in the first embodiment, the input/output processor 25 also controls transfer on paths (5) to (8).
(1) Stream input/output unit 21 → buffer memory 22 → memory controller 26 → external memory 3
(2) External memory 3 → memory controller 26 → FIFO memory 24
(3) External memory 3 → memory controller 26 → buffer memory 22 → video output unit 12
(4) External memory 3 → memory controller 26 → buffer memory 22 → audio output unit 13
(5) External memory 3 → memory controller 26 → internal memory 28
(6) External memory 3 → memory controller 26 → pixel read/write unit 31
(7) FIFO memory 24 → memory controller 26 → external memory 3
(8) External memory 3 → memory controller 26 → buffer memory 22 → stream input/output unit 21
Paths (5) and (6) are the paths of the source data when video data and audio data are encoded, and paths (7) and (8) are the paths of the compressed MPEG stream.

【００７９】First, encoding is described. The data to be encoded is assumed to be stored in the external memory 3. The video data in the external memory 3 is transferred to the pixel read/write unit 31 by the pixel read/write unit 31 controlling the memory controller 26. The pixel read/write unit 31 performs the process of writing the video data to the second buffer 201 and the difference image generation process. The difference image generation process consists of block-by-block motion detection (calculation of motion vectors) and generation of the difference image. For this purpose, the pixel read/write unit 31 internally has a motion detection circuit that detects a motion vector by searching the reference frame for a rectangular area similar to the block to be encoded. Instead of the motion detection circuit, a motion estimation circuit may be provided that estimates the motion vector of the block to be encoded using the motion vectors of already-computed blocks in an adjacent frame.

【００８０】The pixel operation unit 30 receives the difference image data block by block and performs DCT, IDCT, quantization (hereinafter, Q processing), and IQ. The video data quantized in this way is stored in the buffer 200. The code conversion unit 29 receives the quantized data from the buffer 200 and performs variable-length coding (VLC). The variable-length-coded data is stored in the FIFO memory 24 and then stored in the external memory 3 through the memory controller 26, and header information is added for each macroblock by the processor 27.

【００８１】The video data in the external memory 3 is also transferred to the internal memory 28 via the memory controller 26. The processor 27 performs the compression of the audio data in the internal memory 28 in time division with the process of adding header information for each macroblock. As described above, encoding is processed along paths that are the reverse of those in the first embodiment.

【００８２】Next, graphics processing is described. Graphics processing is three-dimensional image generation performed by combining rectangular figures called polygons. This apparatus performs the process of generating the pixel data inside a polygon from the pixel data at the polygon's vertex coordinates. Initially, the polygon vertex data is stored in the external memory 3.
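Generating interior pixel data from vertex data is commonly done by incremental interpolation, which is the usual sense of DDA. The sketch below interpolates one value (for example, one color component) along a horizontal span between two endpoints. It is a generic illustration only: the function name `dda_span` is hypothetical, and the specification does not detail how the apparatus divides this work between its units.

```python
def dda_span(x0, c0, x1, c1):
    """Interpolate a value from (x0, c0) to (x1, c1), one pixel per step."""
    steps = x1 - x0
    if steps <= 0:
        return [(x0, c0)]
    delta = (c1 - c0) / steps   # constant increment, computed once
    pixels, c = [], float(c0)
    for x in range(x0, x1 + 1):
        pixels.append((x, round(c)))
        c += delta              # the DDA step: add, do not re-multiply
    return pixels


span = dda_span(0, 0, 4, 8)
```

The per-pixel work is a single addition, which is why this kind of interpolation suits simple, repeated hardware operations.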

【００８３】The vertex data is stored in the internal memory 28 by the processor 27 controlling the memory controller 26. The processor 27 reads the vertex data from the internal memory 28, performs the preprocessing for DDA (Digital Difference Analyze), and writes the result to the FIFO memory 24. The code conversion unit 29 reads the vertex data from the FIFO memory 24 in accordance with instructions from the pixel operation unit 30 and transfers it to the pixel operation unit 30.

【００８４】画素演算部３０は、DDA処理を行ない画素
読み書き部３１に送信する。画素読み書き部３１は、プ
ロセッサ２７の指示に従い、Zバッファ処理あるいはα
ブレンディング処理を行ないメモリコントローラ２６を
介して外部メモリ３に画像データを書き出す。＜2.1.1 画素演算部＞図１７は、画素演算部３０の構
成を示すブロック図である。The pixel operation unit 30 performs DDA processing and sends the result to the pixel read/write unit 31. In accordance with instructions from the processor 27, the pixel read/write unit 31 performs Z-buffer processing or α blending processing and writes the image data to the external memory 3 via the memory controller 26. <2.1.1 Pixel Operation Unit> FIG. 17 is a block diagram showing the configuration of the pixel operation unit 30.

【００８５】同図は、図７に示した画素演算部１０と同
じ構成要素には同じ番号を付与し、説明を省略し、以下
異なる点を中心に説明する。異なる点は、同図のように
画素演算部３０は、図７に示した画素演算部１０に対し
て実行部が３面（５０１ａ〜５０１ｃ）になっている点
と、命令ポインタ保持部３０８と命令レジスタ３０９と
分配部３１０とが追加された点とである。In FIG. 17, components identical to those of the pixel operation unit 10 shown in FIG. 7 are given the same reference numerals and their description is omitted; the following description focuses on the differences. The differences are that, in contrast to the pixel operation unit 10 shown in FIG. 7, the pixel operation unit 30 has three execution units (501a to 501c), and that an instruction pointer holding unit 308, an instruction register 309, and a distribution unit 310 are added.

【００８６】実行部５０１ａ〜５０１ｃが３面になって
いるのは、演算性能を向上させるためである。具体的に
は、グラフィックス処理においてはカラー画像ＲＧＢを
独立に並列演算する。ＩＱおよびＱ処理では、乗算器５
０２を３つ用いて高速化を図っている。ＩＤＣＴにおい
ては乗算器５０２および加減算器５０３を複数用いるこ
とによって時間短縮を図っている。ＩＤＣＴにおいては
バタフライ演算と呼ばれる演算が存在し、これは演算の
元となる全てのデータ間で依存関係があるので、実行部
５０１ａ〜５０１ｃのユニット間通信を行なうデータ線
１０３を設けている。The execution units 501a to 501c are provided in triplicate in order to improve computational performance. Specifically, in graphics processing, the R, G, and B components of a color image are computed independently and in parallel. In IQ and Q processing, three multipliers 502 are used to increase speed. In IDCT, processing time is shortened by using a plurality of multipliers 502 and adder/subtracters 503. IDCT includes an operation called a butterfly operation, in which all of the source data are interdependent, so a data line 103 is provided for communication among the execution units 501a to 501c.

【００８７】第１命令メモリ５０６、第２命令メモリ５
０７は、ＩＤＣＴ、ＩＱに加えてＤＣＴ、Ｑ処理、ＤＤ
Ａ用のマイクロプログラムが格納されている。図１８
に、第１命令メモリ５０６、第２命令メモリ５０７の記
憶内容の一例を示す。図８に比べてＱ処理マイクロプロ
グラムと、ＤＣＴマイクロプログラムと、ＤＤＡマイク
ロプログラムとが追加されている。The first instruction memory 506 and the second instruction memory 507 store microprograms for DCT, Q processing, and DDA in addition to IDCT and IQ. FIG. 18 shows an example of the storage contents of the first instruction memory 506 and the second instruction memory 507. Compared with FIG. 8, a Q processing microprogram, a DCT microprogram, and a DDA microprogram have been added.

【００８８】命令ポインタ保持部３０８ａ〜３０８ｃ
は、実行部５０１ａ〜５０１ｃに対応して設けられ、そ
れぞれ第１プログラムカウンタから入力されるアドレス
を変換して命令レジスタ部３０９に出力する変換テーブ
ルを有する。変換後のアドレスは、命令レジスタ部３０
９のレジスタ番号を意味する。さらに、命令ポインタ保
持部３０８ａ〜３０８ｃは、それぞれ後述するモディフ
ァイフラグを保持し命令実行部５０１ａ〜５０１ｃに出
力する。The instruction pointer holding units 308a to 308c are provided in correspondence with the execution units 501a to 501c, and each has a conversion table that converts the address input from the first program counter and outputs the result to the instruction register unit 309. The converted address denotes a register number in the instruction register unit 309. In addition, the instruction pointer holding units 308a to 308c each hold a modify flag, described later, and output it to the execution units 501a to 501c.

【００８９】変換テーブルについては命令ポインタ保持
部３０８ａ、３０８ｂ、３０８ｃは、例えば入力アドレ
スが1,2,3,4,5,6,7,8,9,10,11,12である場合に、それぞ
れ次のような変換後アドレスを出力する。命令ポインタ保持部３０８ａ:1,2,3,4,5,6,7,8,9,10,1
1,12 命令ポインタ保持部３０８ｂ:2,1,4,3,6,5,8,7,10,9,1
2,11 命令ポインタ保持部３０８ｃ:4,3,2,1,8,7,6,5,12,11,1
0,9 命令レジスタ部３０９は、図２３に示すように、マイク
ロ命令を保持する複数のレジスタ３つのセレクタと３つ
の出力ポートとからなる。３つのセレクタは、命令ポイ
ンタ部３０８ａ、３０８ｂ、３０８ｃから入力される変
換アドレス（レジスタ番号）に指定されるレジスタのマ
イクロ命令を選択する。３つの出力ポートは、セレクタ
に対応して設けられ、それぞれセレクタに選択されたマ
イクロ命令を分配部３１０を介して実行部５０１ａ〜５
０１ｃに出力する。３つのセレクタ及び出力ポートが設
けられているのは、３つの加減算器５０３（又は３つの
乗算器５０２）に同時に異なるマイクロ命令を供給する
ためである。本実施例では３つの出力ポートは、分配部
３１０を介して３つの加減算器５０３と３つの乗算器５
０２の何れかに選択的に供給するものとする。Regarding the conversion tables, when the input addresses are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, the instruction pointer holding units 308a, 308b, and 308c output, for example, the following converted addresses, respectively.
Instruction pointer holding unit 308a: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
Instruction pointer holding unit 308b: 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11
Instruction pointer holding unit 308c: 4, 3, 2, 1, 8, 7, 6, 5, 12, 11, 10, 9
As shown in FIG. 23, the instruction register unit 309 comprises a plurality of registers that hold microinstructions, three selectors, and three output ports. Each of the three selectors selects the microinstruction in the register designated by the converted address (register number) input from the instruction pointer holding units 308a, 308b, and 308c. The three output ports are provided in correspondence with the selectors, and each outputs the microinstruction selected by its selector to the execution units 501a to 501c via the distribution unit 310. Three selectors and three output ports are provided so that different microinstructions can be supplied simultaneously to the three adder/subtracters 503 (or the three multipliers 502). In this embodiment, the three output ports selectively supply, via the distribution unit 310, either the three adder/subtracters 503 or the three multipliers 502.
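The table lookup described in this paragraph can be sketched in a few lines of Python. This is only an illustration of the example values given above, not the circuit itself; the dictionary keys and function names are hypothetical. For these particular example tables, each unit visits every register number exactly once, and at every step the three units select three different registers.

```python
# Conversion tables taken from the example above (1-based register numbers).
# The dictionary keys and function names are hypothetical labels for this sketch.
CONVERSION_TABLES = {
    "308a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    "308b": [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11],
    "308c": [4, 3, 2, 1, 8, 7, 6, 5, 12, 11, 10, 9],
}


def convert(unit: str, input_address: int) -> int:
    """Return the register number one holding unit outputs for a 1-based input address."""
    return CONVERSION_TABLES[unit][input_address - 1]


def registers_selected(input_address: int) -> tuple:
    """Register numbers presented to the three selectors for one input address."""
    return tuple(convert(unit, input_address) for unit in ("308a", "308b", "308c"))


if __name__ == "__main__":
    for address in range(1, 13):
        print(address, registers_selected(address))
```

For input address 1 the sketch yields register numbers (1, 2, 4), one per execution unit.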

【００９０】例えば、命令レジスタ部３０９はレジスタ
Ｒ１〜Ｒ１６（レジスタ番号１〜１６）を備えている。
レジスタＲ１〜Ｒ１６に格納されているマイクロプログ
ラムは、ＤＣＴ及びＩＤＣＴにおいて必要な行列演算処
理を表し、上記の３つのレジスタ番号順のいずれによっ
ても同一処理を行うように格納されている。つまり、上
記３つの実行順をもつマイクロプログラムは、実行順序
が可換な一部のマイクロ命令の順序が入れ換えられてい
る。これは、実行部５０１ａ〜５０１ｃが並列にマイク
ロプログラムを実行するので、実行部５０１ａ〜５０１
ｃ間でレジスタ（図外）アクセスの競合など資源干渉を
回避するためである。また、上記行列演算処理は、８×
８行列の乗算、転置、転送をその内容とする。For example, the instruction register unit 309 has registers R1 to R16 (register numbers 1 to 16). The microprogram stored in the registers R1 to R16 represents the matrix operations required for DCT and IDCT, and is stored so that the same processing is performed whichever of the three register-number orders above is followed. In other words, the microprograms with the three execution orders differ only in that some microinstructions whose execution order is interchangeable have been reordered. Because the execution units 501a to 501c execute the microprogram in parallel, this reordering avoids resource interference among them, such as contention for access to registers (not shown). The matrix operations consist of multiplication, transposition, and transfer of 8 × 8 matrices.

【００９１】次に、命令レジスタ部３０９の各レジスタ
に格納されるマイクロ命令はニーモニック形式では、「ｏｐＲｉ，Ｒｊ，ｄｅｓｔ，（モディファイフラ
グ）」と表記される。ただし命令レジスタ部３０９のマイクロ
命令は、「ｏｐとＲｉ，Ｒｊと（モディファイフラ
グ）」の部分だけである。「ｄｅｓｔ」の部分は命令メ
モリ５０６、５０７から指定される。「（モディファイ
フラグ）」の部分命令ポインタ保持部３０８ａ〜３０８
ｃから指定される。In mnemonic form, a microinstruction stored in each register of the instruction register unit 309 is written as "op Ri, Rj, dest, (modify flag)". However, the microinstruction held in the instruction register unit 309 consists only of the "op", "Ri, Rj", and "(modify flag)" parts. The "dest" part is specified from the instruction memories 506 and 507. The "(modify flag)" part is specified from the instruction pointer holding units 308a to 308c.

【００９２】ここで、”ｏｐ”は乗算命令、加減算命
令、転送命令などを示すオペレーションコード、”Ｒ
ｉ，Ｒｊ”はオペランドである。乗算命令は、３つの実
行部５０１ａ〜ｃ中の各乗算器５０２に実行される命令
であり、加算命令及び転送命令は、３つの実行部５０１
ａ〜ｃ中の各乗算器５０２に実行される命令である。”
ｄｅｓｔ”は演算結果の格納先を示す。この”ｄｅｓ
ｔ”は命令レジスタ部３０９のレジスタではなく、命令
メモリ５０６（乗算命令の場合）又は命令メモリ５０７
（加減算命令や転送命令の場合）から指定される。これ
は、命令レジスタ部３０９のマイクロプログラムを実行
部５０１ａ〜５０１ｃに共通化するためである。もし転
送先をレジスタにより指定すれば実行部５０１ａ〜５０
１ｃそれぞれに個別のマイクロプログラムを用意する必
要があり、マイクロプログラムの容量が数倍に膨らむこ
とになる。Here, "op" is an operation code indicating a multiplication instruction, an addition/subtraction instruction, a transfer instruction, or the like, and "Ri, Rj" are operands. A multiplication instruction is executed by each multiplier 502 in the three execution units 501a to 501c, and an addition instruction and a transfer instruction are executed by each multiplier 502 in the three execution units 501a to 501c. "dest" indicates where the operation result is stored. This "dest" is specified not from a register of the instruction register unit 309 but from the instruction memory 506 (for a multiplication instruction) or the instruction memory 507 (for an addition/subtraction or transfer instruction). This allows the microprogram in the instruction register unit 309 to be shared by the execution units 501a to 501c. If the destination were specified by the registers, a separate microprogram would have to be prepared for each of the execution units 501a to 501c, and the size of the microprogram would grow severalfold.

【００９３】”モディファイフラグ”は、加減算命令に
おいて、加算であるか減算であるかを示すフラグであ
る。この”モディファイフラグ”は、命令レジスタ部３
０９のレジスタからではなく、命令ポインタ保持部３０
８ａ〜ｃから別途指定される。これは、ＤＣＴ、ＩＤＣ
Ｔでの行列演算に用いられる定数行列中に全要素が”
１”の行（又は列）と全要素が”−１”行（又は列）と
が含まれるので、命令ポインタ３０８ａ〜ｃから”モデ
ィファイフラグ”を指定することにより、命令レジスタ
部３０９の同一マイクロプログラムを共用することを可
能にしている。The "modify flag" indicates whether an addition/subtraction instruction performs addition or subtraction. This "modify flag" is specified separately from the instruction pointer holding units 308a to 308c, not from the registers of the instruction register unit 309. The constant matrices used in the matrix operations of DCT and IDCT contain rows (or columns) whose elements are all "1" and rows (or columns) whose elements are all "-1"; specifying the "modify flag" from the instruction pointer holding units 308a to 308c therefore makes it possible to share the same microprogram in the instruction register unit 309.
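The way one shared register entry is combined with a separately supplied "dest" and "modify flag" can be sketched as follows. This is a minimal illustration under stated assumptions: the class names, field names, register labels, and the encoding of the flag (True meaning subtraction) are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedPart:
    """Part held once in the instruction register unit (hypothetical field names)."""
    op: str
    ri: str
    rj: str


@dataclass(frozen=True)
class MicroInstruction:
    op: str
    ri: str
    rj: str
    dest: str       # supplied separately, by an instruction memory
    subtract: bool  # modify flag; True = subtract is an assumed encoding


def assemble(shared: SharedPart, dest: str, modify_flag: bool) -> MicroInstruction:
    """Combine the shared part with the per-unit dest and modify flag."""
    return MicroInstruction(shared.op, shared.ri, shared.rj, dest, modify_flag)


def execute_addsub(instruction: MicroInstruction, registers: dict) -> None:
    """Apply an add/subtract microinstruction to a register file held as a dict."""
    a = registers[instruction.ri]
    b = registers[instruction.rj]
    registers[instruction.dest] = a - b if instruction.subtract else a + b


# One shared entry, used twice with different destinations and flags.
shared = SharedPart(op="addsub", ri="R1", rj="R2")
registers = {"R1": 7, "R2": 3}
execute_addsub(assemble(shared, "D0", modify_flag=False), registers)
execute_addsub(assemble(shared, "D1", modify_flag=True), registers)
print(registers["D0"], registers["D1"])
```

The same stored entry produces a sum in one destination and a difference in another, which is the sharing effect the paragraph describes.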

【００９４】分配部３１０は、命令レジスタ部３０９か
ら入力される３つのマイクロ命令が加減算命令である場
合には、それらの「ｏｐとＲｉ，Ｒｊ」の部分と、命令
メモリ５０６から入力される「ｄｅｓｔ」の部分と、命
令ポインタ部３０８ａ〜ｃから入力される「モディファ
イフラグ」とを３つの加減算器５０３に分配し、同時に
命令メモリ５０６のマイクロ命令を３つの乗算器５０２
に分配する。また、分配部３１０は、命令レジスタ部３
０９から入力される３つのマイクロ命令が乗算命令であ
る場合には、それらの「ｏｐとＲｉ，Ｒｊ」の部分とを
命令メモリ５０６から入力される「ｄｅｓｔ」の部分と
を３つの乗算器５０２に分配し、、命令メモリ５０７の
マイクロ命令を３つの加減算器５０３に分配する。言い
換えれば、分配部３１０により、３つの加減算器５０３
に供給されるマイクロ命令は、３つの加減算器５０３に
共通する命令については命令メモリ５０７から１つのマ
イクロ命令がそれぞれに供給され、３つの加減算器５０
３で異なる加減算命令については命令レジスタ部３０９
からの３つのマイクロ命令がそれぞれに供給される。同
様に、３つの乗算器５０２に供給されるマイクロ命令
は、３つの乗算器５０２に共通する命令については命令
メモリ５０６からマイクロ命令が供給され、３つの乗算
器５０２で異なる乗算算命令については命令レジスタ部
３０９からのマイクロ命令がそれぞれに供給される。When the three microinstructions input from the instruction register unit 309 are addition/subtraction instructions, the distribution unit 310 distributes their "op" and "Ri, Rj" parts, the "dest" part input from the instruction memory 506, and the "modify flags" input from the instruction pointer holding units 308a to 308c to the three adder/subtracters 503, and at the same time distributes the microinstruction of the instruction memory 506 to the three multipliers 502. When the three microinstructions input from the instruction register unit 309 are multiplication instructions, the distribution unit 310 distributes their "op" and "Ri, Rj" parts and the "dest" part input from the instruction memory 506 to the three multipliers 502, and distributes the microinstruction of the instruction memory 507 to the three adder/subtracters 503. In other words, through the distribution unit 310, the three adder/subtracters 503 each receive a single microinstruction from the instruction memory 507 when the instruction is common to all three, and receive the three microinstructions from the instruction register unit 309 when the addition/subtraction instructions differ among them. Likewise, the three multipliers 502 receive a microinstruction from the instruction memory 506 when the instruction is common to all three, and receive the microinstructions from the instruction register unit 309 when the multiplication instructions differ among them.

【００９５】画素演算部３０のこのような構成によれ
ば、命令メモリ５０６、命令メモリ５０７の記憶容量を
削減することができる。もし、画素演算部３０が命令ポ
インタ保持部３０８ａ〜ｃ、命令レジスタ部３０９、分
配部３１０を備えていないと仮定すると、命令メモリ５
０６、命令メモリ５０７はいずれも、３つの実行部５０
１ａ〜ｃに対して異なるマイクロ命令を供給するには、
３つのマイクロ命令を並列に記憶しなければならない。With this configuration of the pixel operation unit 30, the storage capacity of the instruction memories 506 and 507 can be reduced. If the pixel operation unit 30 did not include the instruction pointer holding units 308a to 308c, the instruction register unit 309, and the distribution unit 310, each of the instruction memories 506 and 507 would have to store three microinstructions in parallel in order to supply different microinstructions to the three execution units 501a to 501c.

【００９６】図２２に命令ポインタ保持部３０８ａ〜
ｃ、命令レジスタ部３０９、分配部３１０を備えていな
い場合の命令メモリ５０６及び命令メモリ５０７の記憶
内容の一例を示す。同図では、１６ステップのマイクロ
プログラムが記憶され、１つのマイクロ命令は１６ビッ
ト長としている。この場合、命令メモリ５０６と命令メ
モリ５０７は、３つのマイクロ命令を並列に記録するこ
とから、合計１５３６ビット（１６ステップ×１６ビッ
ト×３×２）の記憶容量を必要とする。FIG. 22 shows an example of the contents stored in the instruction memory 506 and the instruction memory 507 when the instruction pointer holding units 308a to 308c, the instruction register unit 309, and the distribution unit 310 are not provided. In the figure, a 16-step microprogram is stored, and one microinstruction is 16 bits long. In this case, because the instruction memory 506 and the instruction memory 507 each store three microinstructions in parallel, a total storage capacity of 1536 bits (16 steps × 16 bits × 3 × 2) is required.

【００９７】これに対して、本実施例の画素演算部３０
における、命令ポインタ保持部３０８ａ〜ｃ、命令レジ
スタ部３０９の記憶内容の一例を図２３に示す。同図で
も１６ステップのマイクロプログラムが記憶され、１マ
イクロ命令は１６ビットとしている。同図において、命
令ポインタ保持部３０８ａ〜ｃは、それぞれ１６個のレ
ジスタ番号（４ビット長）を記憶し、命令レジスタ部３
０９は１６個のマイクロ命令を記憶する。この場合、命
令ポインタ保持部３０８ａ〜ｃと命令レジスタ部３０９
との記憶容量は４４８ビット（１６ステップ×（１２＋
１６））でよい。このように画素演算部３０では、マイ
クロプログラムの記憶容量を大幅に削減することができ
る。実際には、「ｄｅｓｔ」「モディファイフラグ」が
別途発行されるようにしているので、その分の記録容量
又は回路が必要である。また、命令メモリ５０６、５０
７はマイクロ命令中の「ｄｅｓｔ」を指定し、また、実
行部５０１ａ〜ｃに共通する乗算命令、加減算命令を発
行するようにしているので、命令メモリ５０６、５０７
を完全に削除することまではしていない。もし、命令レ
ジスタ部３０９に６つの出力ポートを設ければ、命令メ
モリ５０６と命令メモリ５０７とを削除することも可能
になる。In contrast, FIG. 23 shows an example of the contents stored in the instruction pointer holding units 308a to 308c and the instruction register unit 309 in the pixel operation unit 30 of this embodiment. Here too, a 16-step microprogram is stored and one microinstruction is 16 bits long. In the figure, the instruction pointer holding units 308a to 308c each store 16 register numbers (4 bits each), and the instruction register unit 309 stores 16 microinstructions. In this case, the combined storage capacity of the instruction pointer holding units 308a to 308c and the instruction register unit 309 is only 448 bits (16 steps × (12 + 16)). The pixel operation unit 30 thus greatly reduces the storage capacity needed for the microprogram. In practice, because "dest" and the "modify flag" are issued separately, additional storage capacity or circuitry is needed for them. Moreover, because the instruction memories 506 and 507 specify the "dest" of each microinstruction and also issue the multiplication and addition/subtraction instructions common to the execution units 501a to 501c, the instruction memories 506 and 507 are not eliminated entirely. If the instruction register unit 309 were provided with six output ports, the instruction memory 506 and the instruction memory 507 could also be eliminated.
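The two storage figures quoted above follow directly from the stated parameters. The short Python sketch below simply reproduces that arithmetic; the constant and function names are hypothetical, and the extra storage for "dest" and the "modify flag" mentioned in the text is deliberately left out, as it is in the quoted figures.

```python
STEPS = 16                  # microprogram length in steps
MICROINSTRUCTION_BITS = 16  # bits per microinstruction
EXECUTION_UNITS = 3         # execution units 501a to 501c
INSTRUCTION_MEMORIES = 2    # instruction memories 506 and 507
REGISTER_NUMBER_BITS = 4    # bits per register number in a holding unit


def bits_without_sharing() -> int:
    """Both instruction memories hold three microinstructions per step."""
    return STEPS * MICROINSTRUCTION_BITS * EXECUTION_UNITS * INSTRUCTION_MEMORIES


def bits_with_sharing() -> int:
    """Three register-number tables plus one shared set of microinstructions."""
    per_step = EXECUTION_UNITS * REGISTER_NUMBER_BITS + MICROINSTRUCTION_BITS
    return STEPS * per_step


print(bits_without_sharing())  # 1536
print(bits_with_sharing())     # 448
```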

【００９８】なお、図２３では、命令ポインタ保持部３
０８ａ〜３０８ｃは、第１プログラムカウンタの値が０
〜１５の場合に、変換アドレス（レジスタ番号）を出力
しているが、これに限らない。例えば第１プログラムカ
ウンタの値が３２〜４７の場合に変換アドレスを出力す
るようにしてもよい。この場合、第１プログラムカウン
タの値に適切なオフセット値を加える構成とすればよ
い。これにより、第１プログラムカウンタが示す任意の
アドレス列を変換アドレスに変換することができる。In FIG. 23, the instruction pointer holding units 308a to 308c output converted addresses (register numbers) when the value of the first program counter is 0 to 15, but this is not a limitation. For example, they may output converted addresses when the value of the first program counter is 32 to 47. In that case, an appropriate offset value may be added to the value of the first program counter. In this way, any address sequence indicated by the first program counter can be converted into converted addresses.
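One way to picture the offset is as a subtraction of a base address before the table lookup. The sketch below assumes this interpretation; the function name, the `base` parameter, and the 16-entry example table are hypothetical and do not reproduce the contents of FIG. 23.

```python
from typing import Optional, Sequence


def converted_address(pc: int, table: Sequence[int], base: int) -> Optional[int]:
    """Map a program counter value to a register number.

    `base` is the first program counter value covered by the table (an
    assumed parameter). Values outside the covered window return None.
    """
    index = pc - base  # applying the offset to the program counter value
    if 0 <= index < len(table):
        return table[index]
    return None


# Hypothetical 16-entry table that swaps neighbouring entries (0-based numbers).
EXAMPLE_TABLE = [i ^ 1 for i in range(16)]

print(converted_address(32, EXAMPLE_TABLE, base=32))
print(converted_address(47, EXAMPLE_TABLE, base=32))
print(converted_address(31, EXAMPLE_TABLE, base=32))
```

With `base=0` the same function covers the 0 to 15 case, so a single table can serve either address window.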

【００９９】以上の構成により、本実施形態では圧縮映
像データと圧縮音声データのデコード処理だけでなく、
映像および音声データのエンコード処理と、ポリゴンデ
ータに基づくグラフィックス処理とが可能となってい
る。また、複数の実行部の並列動作により処理効率が向
上している。しかも、命令レジスタ部３０８ａ〜３０８
ｃにおいて一部のマイクロ命令の順序を入れ換えたこと
により、複数の実行部間の資源干渉を回避することがで
きるので、さらに処理効率を向上させている。With the above configuration, this embodiment can perform not only decoding of compressed video data and compressed audio data but also encoding of video and audio data and graphics processing based on polygon data. In addition, the parallel operation of the plurality of execution units improves processing efficiency. Furthermore, because the order of some microinstructions is interchanged in the instruction register units 308a to 308c, resource interference among the execution units is avoided, which further improves processing efficiency.

【０１００】なお、上記実施形態では３つの実行部を有
する構成を示しているのは、ＲＧＢカラーのそれぞれを
独立に演算できる点で有利だからである。さらに実行部
の数は、３つ以上あればいくつでもよい。また、上記実
施形態において映像音声処理装置１０００、２０００
は、それぞれ１チップＬＳＩ化することが望ましい。さ
らに外部メモリ３は、チップ外部であるものとして説明
したが、１チップ内に内臓する構成としてもよい。The above embodiment shows a configuration with three execution units because this is advantageous in that the R, G, and B colors can each be computed independently. The number of execution units may be any number as long as it is three or more. It is also desirable that each of the video/audio processing devices 1000 and 2000 in the above embodiments be implemented as a one-chip LSI. Although the external memory 3 has been described as being outside the chip, it may instead be built into the chip.

【０１０１】また、上記実施形態では外部メモリに対し
てストリーム入出力部１（あるいはストリーム入出力部
２１）が、ＭＰＥＧストリーム（あるいは映像音声デー
タ）を格納していたが、ホストプロセッサが直接外部メ
モリ３に格納するように構成してもよい。さらに、上記
実施形態においてＩＯプロセッサ５は、４命令サイクル
毎にタスク切替えを行っているが、４命令サイクル以外
の複数命令サイクル毎であってもよい。また、タスク切
替えの命令サイクル数は、タスク毎に予め重み付けをし
て異なる命令サイクル数にしておいてもよい。また優先
度・緊急度に応じてタスク毎の命令サイクル数に重み付
けを行ってもよい。In the above embodiments, the stream input/output unit 1 (or the stream input/output unit 21) stores the MPEG stream (or the video/audio data) in the external memory, but the host processor may instead store it directly in the external memory 3. Further, although the IO processor 5 in the above embodiments switches tasks every four instruction cycles, it may switch tasks at intervals of some other number of instruction cycles. The number of instruction cycles between task switches may also be weighted in advance for each task so that it differs from task to task, or weighted for each task according to priority and urgency.
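The task-switching variations described here can be illustrated with a small round-robin model. This is a sketch under stated assumptions, not the IO processor's actual control logic: the function name and the idea of expressing the weighting as a per-task cycle quota are hypothetical.

```python
from typing import List


def schedule(cycles_per_task: List[int], total_cycles: int) -> List[int]:
    """Return, for each instruction cycle, the index of the task that runs.

    `cycles_per_task[i]` is the number of consecutive instruction cycles
    given to task i before switching to the next task (assumed model).
    """
    if not cycles_per_task or any(quota < 1 for quota in cycles_per_task):
        raise ValueError("every task needs a quota of at least one cycle")
    trace: List[int] = []
    task = 0
    while len(trace) < total_cycles:
        remaining = total_cycles - len(trace)
        trace.extend([task] * min(cycles_per_task[task], remaining))
        task = (task + 1) % len(cycles_per_task)
    return trace


# Uniform switching every four instruction cycles, as in the embodiment.
print(schedule([4, 4, 4, 4], 10))
# Weighted switching: task 0 gets twice as many cycles as task 1.
print(schedule([2, 1], 5))
```

Giving a task a larger quota is one simple way to express the priority or urgency weighting mentioned above.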

【０１０２】[0102]

【発明の効果】本発明の映像音声処理装置は、圧縮音声
データと圧縮映像データとを含むデータストリームを外
部から入力、デコードし、デコードしたデータを出力装
置に出力する映像音声処理装置であって、外部要因によ
り非同期に発生する入出力処理を行う入出力処理手段
と、前記入出力処理と並行して、メモリに格納されたデ
ータストリームのデコードを主とするデコード処理を行
うデコード処理手段とを備え、前記デコード処理手段に
よりデコードされた映像データ、デコードされた音声デ
ータはメモリに格納され、前記入出力処理は、外部から
非同期に入力される前記データストリームを入力し、さ
らにメモリに格納することと、メモリに格納されたデー
タストリームをデコード処理手段に供給することと、外
部の表示装置、音声出力装置それぞれの出力レートに合
わせてメモリから読み出し、それらに出力することとを
入出力処理として行うように構成されている。The video/audio processing apparatus of the present invention receives a data stream containing compressed audio data and compressed video data from outside, decodes it, and outputs the decoded data to output devices. The apparatus comprises input/output processing means for performing input/output processing that occurs asynchronously due to external factors, and decode processing means for performing, in parallel with the input/output processing, decode processing that consists mainly of decoding the data stream stored in a memory. The video data and audio data decoded by the decode processing means are stored in the memory. The input/output processing comprises receiving the data stream, which is input asynchronously from outside, and storing it in the memory; supplying the data stream stored in the memory to the decode processing means; and reading the decoded data from the memory at the output rates of the external display device and the external audio output device and outputting it to them.

【０１０３】この構成によれば、入出力処理手段とデコ
ード処理手段とがパイプライン的に並列動作することに
加えて、非同期処理とデコード処理とを入出力処理手段
とデコード処理手段とに分担させるので、デコード処理
手段は非同期に発生する処理から解放されてデコード処
理に専従することができる。その結果、本映像音声処理
装置は、ストリームデータ入力、デコード、出力という
一連の処理を効率良く実行するので、ストリームデータ
のフルデコード（フレーム落ちなし）を高速な動作クロ
ックを用いなくても可能にしている。With this configuration, the input/output processing means and the decode processing means operate in parallel in a pipelined manner; in addition, asynchronous processing and decode processing are divided between them, so the decode processing means is freed from asynchronously occurring processing and can devote itself to decoding. As a result, the video/audio processing apparatus efficiently executes the series of stream data input, decode, and output processes, making full decoding of the stream data (with no dropped frames) possible without a high-speed operating clock.

【０１０４】また、前記デコード処理手段は、データス
トリームに対して、条件判断を主とする逐次処理であっ
て、圧縮音声データ及び圧縮映像データのヘッダ解析
と、圧縮音声データのデコードとを含む逐次処理を行な
う逐次処理手段と、前記逐次処理と並行して、定型処理
を行う。定型処理は、圧縮映像データのヘッダ解析を除
く圧縮映像データのデコードである定型処理手段とを備
える構成としてもよい。Further, the decode processing means may comprise sequential processing means for performing, on the data stream, sequential processing that consists mainly of condition judgments and includes header analysis of the compressed audio data and compressed video data and decoding of the compressed audio data, and routine processing means for performing, in parallel with the sequential processing, routine processing, namely decoding of the compressed video data excluding its header analysis.

【０１０５】この構成によれば、処理特性の異なる逐次
処理と並列処理に適した定型処理とを１つのユニットに
併存させることを解消することにより、処理効率を大幅
に向上させることができる。特に、定型処理手段の処理
効率を向上させることができる。なぜなら本映像音声処
理装置において、定型処理手段は上記の非同期処理及び
逐次処理から解放されたことから、圧縮映像データのデ
コードに要求される定型的な種々演算のみに専従できる
るからである。その結果、高速な動作クロックを用いな
くても高い処理能力を得ることができる。With this configuration, sequential processing and routine processing, which have different processing characteristics (the latter being suited to parallel processing), no longer coexist in a single unit, so processing efficiency is greatly improved. In particular, the processing efficiency of the routine processing means is improved, because in this apparatus the routine processing means is freed from the asynchronous processing and sequential processing described above and can devote itself solely to the various routine operations required for decoding the compressed video data. As a result, high processing performance is obtained without a high-speed operating clock.

【０１０６】さらに、前記入出力処理手段は、外部から
非同期データストリームを入力する入力手段と、外部の
表示装置にデコードされた映像データを出力する映像出
力手段と、外部の音声出力装置にデコードされた音声デ
ータを出力する音声出力手段と、命令メモリに格納され
た第１から第４のタスクを切替えながら実行するプロセ
ッサとを有し、前記第１タスクは入力部から前記メモリ
にデータストリームを転送するプログラムであり、前記
第２タスクは前記メモリからデコード処理手段にデータ
ストリームを供給するプログラムであり、前記第３タス
クは前記メモリから映像出力部にデコードされた映像デ
ータを出力するプログラムであり、前記第４タスクは前
記メモリから音声出力部にデコードされた音声データを
出力するプログラムであると構成してもよい。Further, the input/output processing means may comprise input means for receiving an asynchronous data stream from outside, video output means for outputting the decoded video data to an external display device, audio output means for outputting the decoded audio data to an external audio output device, and a processor that executes first to fourth tasks stored in an instruction memory while switching among them. The first task is a program that transfers the data stream from the input unit to the memory, the second task is a program that supplies the data stream from the memory to the decode processing means, the third task is a program that outputs the decoded video data from the memory to the video output unit, and the fourth task is a program that outputs the decoded audio data from the memory to the audio output unit.

【０１０７】ここで、前記プロセッサは、前記第１から
第４タスクに対応する少なくとも４つのプログラムカウ
ンタを有するプログラムカウンタ部と、１つのプログラ
ムカウンタが指す命令アドレスを用いて、各タスクプロ
グラムを記憶する命令メモリから命令を取り出す命令フ
ェッチ部と、命令取出部に取出された命令を実行する命
令実行部と、所定数の命令サイクルが経過する毎に、命
令フェッチ部に対してプログラムカウンタを順次切替え
るように制御するタスク制御部とを有する構成としても
よい。Here, the processor may comprise a program counter unit having at least four program counters corresponding to the first to fourth tasks, an instruction fetch unit that fetches an instruction from the instruction memory storing the task programs, using the instruction address indicated by one of the program counters, an instruction execution unit that executes the fetched instruction, and a task control unit that controls the instruction fetch unit so that the program counters are switched in sequence each time a predetermined number of instruction cycles elapses.

【０１０８】この構成によれば、外部装置により定まる
ストリームデータの入力レート及び入力周期、外部表示
装置、外部音声出力装置により定まる映像データ、音声
データそれぞれの出力レート及び出力周期がどのような
範囲であっても、入出力要求に対する応答遅延が極めて
小さいという効果がある。また、本発明の映像音声処理
装置は、圧縮音声データと圧縮映像データとを含むデー
タストリームを入力する入力手段と、データストリーム
に対して、条件判断を主とする逐次処理であって、デー
タストリーム中の所定ブロック単位に付加されたヘッダ
情報の解析と、データストリーム中の圧縮音声データの
復号とを行なう逐次処理手段と、定型演算を主とする定
型処理であって、ヘッダ解析の結果を用いてデータスト
リーム中の圧縮映像データを、前記逐次処理と並行し
て、所定ブロック単位に復号する定型処理手段とを備
え、前記逐次処理手段は前記所定ブロックのヘッダ解析
が終了したとき、定型処理手段に当該所定ブロックのデ
コード開始を指示し、定型処理手段から所定ブロックの
デコード終了通知を受けたとき、次の所定ブロックのヘ
ッダ解析を開始するように構成してもよい。With this configuration, the response delay to input/output requests is extremely small, whatever the input rate and input cycle of the stream data (determined by the external device) and whatever the output rates and output cycles of the video data and audio data (determined by the external display device and the external audio output device). The video/audio processing apparatus of the present invention may also comprise input means for receiving a data stream containing compressed audio data and compressed video data; sequential processing means for performing, on the data stream, sequential processing consisting mainly of condition judgments, namely analyzing the header information attached to each predetermined block in the data stream and decoding the compressed audio data in the data stream; and routine processing means for performing, in parallel with the sequential processing, routine processing consisting mainly of routine operations, namely decoding the compressed video data in the data stream in units of the predetermined block using the results of the header analysis. When header analysis of a predetermined block ends, the sequential processing means instructs the routine processing means to start decoding that block, and when it receives notification from the routine processing means that decoding of the block has ended, it starts header analysis of the next predetermined block.

【０１０９】この構成によれば、逐次処理手段が圧縮映
像データに対しても圧縮音声データに対しても多岐にわ
たる条件判断を必要とするヘッダ解析を担当するととも
に音声圧縮データのデコードも担当する。一方、定型処
理手段は、圧縮映像データのブロックデータに対する、
定型的な大量の演算量を担当する。このような役割分担
により、また逐次処理手段は映像デコードに比較して演
算量が少ない音声デコード全般と、圧縮映像データのヘ
ッダ解析と、定型処理手段の制御とを行う。その制御の
下で、定型処理手段は、専ら定型的な演算を行うので、
無駄のない効率的な処理を実現できる。それゆえ高い周
波数で動作させなくても処理能力を得ることができ、製
造コストを低減させることができる。また、逐次処理手
段は、音声デコード全般と、圧縮映像データのヘッダ解
析と、定型処理手段の制御とを順次行うので、１プロセ
ッサにて構成できる。With this configuration, the sequential processing means handles header analysis, which requires a wide variety of condition judgments, for both the compressed video data and the compressed audio data, and also handles decoding of the compressed audio data. The routine processing means, meanwhile, handles the large volume of routine computation on the block data of the compressed video data. Under this division of roles, the sequential processing means performs all audio decoding, which requires less computation than video decoding, together with header analysis of the compressed video data and control of the routine processing means. Under that control, the routine processing means performs only routine operations, so efficient processing without waste is achieved. The required processing performance is therefore obtained without operating at a high frequency, and manufacturing cost can be reduced. Moreover, because the sequential processing means performs audio decoding, header analysis of the compressed video data, and control of the routine processing means in sequence, it can be implemented with a single processor.

【０１１０】また、前記定型処理手段は、逐次処理手段
の指示に従ってデータストリーム中の圧縮映像データを
可変長復号するデータ変換手段と、可変長復号により得
られた映像ブロックに対して、所定の演算を施すことに
より逆量子化および逆離散余弦変換を行う演算手段と、
逆離散余弦変換後の映像ブロックと復号済みのブロック
を合成することにより動き補償処理を行って映像データ
を復元する合成手段とを有し、前記逐次処理手段は、デ
ータ変換手段により可変長復号されたヘッダ情報を取得
する取得手段と、取得されたヘッダ情報を解析する解析
手段と、解析結果として得られるパラメータを定型処理
手段に通知する通知手段と、入力手段により入力された
データストリーム中の圧縮音声データを復号する音声復
号手段と、前記定型処理手段から所定ブロックのデコー
ド完了を通知する割込み信号を受けたとき、音声復号手
段の動作を停止するとともに取得手段を起動し、前記通
知手段が前記通知をしたとき、前記データ変換手段に映
像ブロックの可変長復号の開始を指示する制御手段とを
有するように構成してもよい。Further, the routine processing means may comprise data conversion means for variable-length decoding the compressed video data in the data stream in accordance with instructions from the sequential processing means, arithmetic means for performing inverse quantization and inverse discrete cosine transform by applying predetermined operations to the video blocks obtained by the variable-length decoding, and synthesizing means for restoring the video data by performing motion compensation, that is, by combining the video block after inverse discrete cosine transform with an already decoded block. The sequential processing means may comprise acquisition means for acquiring the header information variable-length decoded by the data conversion means, analysis means for analyzing the acquired header information, notification means for notifying the routine processing means of the parameters obtained as the analysis result, audio decoding means for decoding the compressed audio data in the data stream received by the input means, and control means that, on receiving from the routine processing means an interrupt signal indicating that decoding of a predetermined block is complete, stops the audio decoding means and starts the acquisition means, and that, when the notification means has made the notification, instructs the data conversion means to start variable-length decoding of the video block.

【０１１１】この構成によれば、マクロブロックなど所
定ブロック単位に逐次処理手段は、ヘッダ解析を行った
後音声デコードを行い、定型処理手段により所定ブロッ
クのデコードが完了したとき次のブロックのヘッダ解析
を開始する。このように逐次処理手段は時分割でヘッダ
解析と音声デコードとを繰り返すので１個のプロセッサ
にて低コストで実現することができる。また、定型処理
手段は多岐にわたる条件判断処理をする必要がないの
で、低コストで専用ハードウェア（或はハードウェアと
ファームウェア）化することができる。With this configuration, for each predetermined block such as a macroblock, the sequential processing means performs header analysis and then audio decoding, and starts header analysis of the next block when the routine processing means completes decoding of the current block. Because the sequential processing means thus alternates between header analysis and audio decoding in a time-shared manner, it can be implemented at low cost with a single processor. And because the routine processing means need not perform a wide variety of condition judgments, it can be implemented at low cost as dedicated hardware (or hardware plus firmware).

【０１１２】ここで、前記演算手段は、さらに１ブロッ
クに相当する記憶領域を有する第１バッファを有し、前
記データ変換手段は、データストリーム中の圧縮映像デ
ータを可変長復号する可変長復号手段と、第１バッファ
の記憶領域のアドレスをジグザグスキャン順に並べた第
１アドレス列を記憶する第１アドレステーブル手段と、
第１バッファの記憶領域のアドレスをオルタネートスキ
ャン順に並べた第２アドレス列を記憶する第２アドレス
テーブル手段と、第１アドレス列と第２アドレス列の一
方に従って、可変長復号手段の可変長復号により得られ
るブロックデータを第１バッファに書き込む書き込み手
段とを有する構成としてもよい。Here, the arithmetic means may further have a first buffer with a storage area corresponding to one block, and the data conversion means may comprise variable-length decoding means for variable-length decoding the compressed video data in the data stream, first address table means for storing a first address sequence in which the addresses of the storage area of the first buffer are arranged in zigzag scan order, second address table means for storing a second address sequence in which those addresses are arranged in alternate scan order, and writing means for writing the block data obtained by the variable-length decoding means into the first buffer in accordance with either the first address sequence or the second address sequence.

[0113] According to this configuration, the writing means can write block data into the storage area of the first buffer for both the zigzag scan and the alternate scan. The arithmetic means therefore does not need to change the order of its read addresses when reading block data from the storage area of the first buffer, and can always read in the same address order regardless of the scan type.
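The table-driven write described above can be illustrated with a minimal Python sketch. The function names `zigzag_addresses` and `write_block` are hypothetical, the zigzag order is generated for a generic 8×8 block, and the second table is only a placeholder permutation (a transposed zigzag), not the actual alternate scan order, which would be taken from the relevant video coding standard.

```python
def zigzag_addresses(n=8):
    """Raster addresses (row * n + col) of an n x n block, listed in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals are walked bottom-left to top-right, odd ones the reverse.
        for r in (reversed(rows) if s % 2 == 0 else rows):
            order.append(r * n + (s - r))
    return order


def write_block(coeffs, address_table):
    """Scatter coefficients that arrive in scan order into a raster-ordered buffer."""
    buf = [0] * len(address_table)
    for value, addr in zip(coeffs, address_table):
        buf[addr] = value
    return buf


ZIGZAG = zigzag_addresses()
# Placeholder second table (transposed zigzag); NOT the real alternate scan order.
SECOND = [(a % 8) * 8 + a // 8 for a in ZIGZAG]

raster = list(range(64))                  # layout the reading side expects
for table in (ZIGZAG, SECOND):
    scanned = [raster[a] for a in table]  # order in which decoded data arrives
    assert write_block(scanned, table) == raster
```

Whichever table is selected, the buffer ends up in the same raster layout, which is why the reading side in this sketch never needs to know the scan type.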

[0114] Further, the analysis means may calculate a quantization scale and a motion vector based on the header information, and the notification means may notify the arithmetic means of the quantization scale and the synthesizing means of the motion vector. According to this configuration, the calculation of motion vectors can be assigned to the sequential processing means, and the synthesizing means can perform motion compensation as a routine process using the calculated motion vectors.

[0115] Further, the arithmetic means may include: first and second control storage units each storing a microprogram; a first program counter that designates a first read address to the first control storage unit; a second program counter that designates a second read address; a selector that selects one of the first read address and the second read address and outputs it to the second control storage unit; and an execution unit that has a multiplier and an adder and performs inverse quantization and inverse discrete cosine transform in block units under microprogram control by the first and second control storage units.

[0116] According to this configuration, the microprogram (firmware) does not need to perform a wide variety of conditional judgments and only has to implement routine processing, so the program is small and easy to write, which suits cost reduction. In addition, by using the two program counters, the multiplier and the adder can be operated independently and in parallel.

[0117] Further, the execution unit may be configured so that, when the second read address is selected by the selector, it performs the processing that uses the multiplier and the processing that uses the adder independently and in parallel, and, when the first read address is selected by the selector, it performs the processing that uses the multiplier and the processing that uses the adder in conjunction with each other. According to this configuration, the idle time of the multiplier and the adder is reduced and processing efficiency is improved.
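As a rough illustration of the selector behaviour, the following Python sketch models two control stores and two program counters. The store contents, the instruction names, and the rule that the second counter advances only while it is selected are all assumptions made for this example; the source does not specify them.

```python
# Hypothetical control-store contents; the instruction names are illustrative only.
STORE1 = ["iq0", "iq1", "idct_mul0", "idct_mul1"]    # read via the first program counter
STORE2 = ["xfer0", "xfer1", "idct_add0", "idct_add1"]  # read via the selector output


def run(schedule):
    """Return the pair of micro-instructions fetched in each cycle.

    schedule[i] is True when the selector passes the first read address to the
    second control store (linked operation) and False when it passes the second
    read address (independent operation).
    """
    pc1 = pc2 = 0
    trace = []
    for select_first in schedule:
        addr2 = pc1 if select_first else pc2  # the selector
        trace.append((STORE1[pc1], STORE2[addr2]))
        pc1 += 1
        if not select_first:
            pc2 += 1  # assumption: the second counter advances only while selected
    return trace


trace = run([False, False, True, True])
# Cycles 0-1: the two stores run unrelated streams (inverse quantization next to a transfer).
# Cycles 2-3: both stores are read at the same address, so multiplier and adder work in step.
```

In this model the linked mode relies on the two halves of the shared routine sitting at the same addresses in both stores, which is a layout choice of the sketch rather than something stated in the text.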

[0118] Here, the arithmetic means may further include a first buffer that holds video blocks from the data conversion means and a second buffer that holds blocks that have undergone inverse discrete cosine transform by the execution unit. The first control storage unit stores a microprogram for inverse quantization and a microprogram for inverse discrete cosine transform, and the second control storage unit stores a microprogram for inverse discrete cosine transform and a microprogram for transferring an inverse-discrete-cosine-transformed video block to the second buffer. The execution means executes in parallel the process of transferring an inverse-discrete-cosine-transformed video block to the second buffer and the process of inverse-quantizing the next video block, and executes the inverse discrete cosine transform of that inverse-quantized video block with the multiplier and the adder operating in conjunction.

[0119] According to this configuration, inverse quantization and transfer to the second buffer are executed in parallel, which improves processing efficiency. Further, the input means may additionally receive polygon data, the sequential processing means may additionally analyze the polygon data to calculate the vertex coordinates and edge slopes of each polygon, and the routine processing means may additionally generate image data of the polygon in accordance with the calculated vertex coordinates and slopes.

[0120] According to this configuration, the sequential processing means handles analysis of the polygon data, and the routine processing means handles the routine image data generation. The video/audio processing device can thus efficiently perform graphics processing that generates image data from polygon data. Here, the first and second control storage units may further store a microprogram for scan conversion by the DDA algorithm, and the execution unit may further perform scan conversion under microprogram control based on the vertex coordinates and slopes calculated by the sequential processing means.
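The division of labour in DDA scan conversion can be sketched in Python as follows. The helper names `edge_slope` and `dda_spans`, the (x, y) vertex format, and the rounding rule are assumptions for illustration; the source only states that vertex coordinates and slopes are computed first and then used for scan conversion.

```python
import math


def edge_slope(v0, v1):
    """dx/dy of the edge from v0 to v1; vertices are (x, y) with v0 above v1."""
    (x0, y0), (x1, y1) = v0, v1
    return (x1 - x0) / (y1 - y0)


def dda_spans(y_top, y_bottom, x_left, x_right, slope_left, slope_right):
    """Walk a left and a right edge with a DDA; return (y, x_start, x_end) per scanline."""
    spans = []
    xl, xr = float(x_left), float(x_right)
    for y in range(y_top, y_bottom):
        # Round to the nearest pixel column (assumed rounding rule).
        spans.append((y, math.floor(xl + 0.5), math.floor(xr + 0.5)))
        xl += slope_left   # one addition per edge per scanline, no multiplication
        xr += slope_right
    return spans


# "Sequential" part: derive slopes from the vertices of a small triangle.
apex, left, right = (4, 0), (0, 4), (8, 4)
sl, sr = edge_slope(apex, left), edge_slope(apex, right)

# "Routine" part: repetitive edge stepping using the precomputed values.
spans = dda_spans(0, 4, apex[0], apex[0], sl, sr)
# spans == [(0, 4, 4), (1, 3, 5), (2, 2, 6), (3, 1, 7)]
```

The inner loop contains no data-dependent branching, which is the property that makes this step a natural fit for fixed microprogram control.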

[0121] According to this configuration, image data generation can be realized simply by a scan conversion microprogram in the first and second control storage units. Further, the following configuration is also possible: the synthesizing means additionally generates difference blocks representing a difference image from video data to be compressed; the second buffer additionally holds the generated difference image; the first control storage unit additionally stores a microprogram for discrete cosine transform and a microprogram for quantization; the second control storage unit additionally stores a microprogram for discrete cosine transform and a microprogram for transferring a discrete-cosine-transformed video block to the first buffer; the execution means additionally performs discrete cosine transform and quantization on the difference blocks held in the second buffer and transfers the result to the first buffer; the data conversion means additionally performs variable-length coding on the blocks in the first buffer; and the sequential processing means additionally adds header information to the predetermined blocks that have been variable-length coded by the data conversion means.

[0122] According to this configuration, the routine processing means handles quantization and discrete cosine transform as routine processing, and the sequential processing means handles processing that requires conditional judgment (addition of header information). In this case, the video/audio processing device can efficiently encode image data into compressed video data without using a high-speed clock. Further, the arithmetic means may include: first and second control storage units each storing a microprogram; a first program counter that designates a first read address to the first control storage unit; a second program counter that designates a second read address; a selector that selects one of the first read address and the second read address and outputs it to the second control storage unit; and a plurality of execution units, each having a multiplier and an adder, that perform inverse quantization and inverse discrete cosine transform in block units under microprogram control by the first and second control storage units, with each execution unit processing its share of the partial blocks obtained by dividing a block.

[0123] According to this configuration, the plurality of execution units execute operation instructions in parallel, so a large volume of routine operations can be parallelized at the pixel level and executed efficiently. Further, the arithmetic means may additionally include: a plurality of address conversion tables, provided one per execution unit, each holding converted addresses in which the address order is partially rearranged relative to a predetermined address sequence; an instruction register group made up of a plurality of registers that store the individual microinstructions of a microprogram implementing a predetermined operation, in association with the converted addresses; and a switching unit, provided between the first and second control storage units and the plurality of execution units, that replaces the microinstructions output to each execution unit from the first control storage unit or the selector with microinstructions from the instruction registers and outputs them to the plurality of execution units. When the first read address or the second read address is an address within the predetermined address sequence, that address is converted into a converted address by each address conversion table, and the instruction register group outputs the microinstruction corresponding to each converted address output from the conversion tables.

[0124] According to this configuration, while the plurality of execution units execute microprograms in parallel, resource interference between execution units, such as access contention, is avoided, allowing still more efficient processing. Here, the following configuration is also possible: while the first program counter outputs first read addresses within the predetermined address sequence, each conversion table further outputs to the plurality of execution units, along with the output of a microinstruction in the registers that indicates addition/subtraction, a flag indicating whether to add or subtract; each execution unit performs the addition or subtraction according to the flag; and the flag is set according to a microinstruction in the second control storage unit.

[0125] According to this configuration, the conversion table specifies whether a microinstruction performs addition or subtraction, so the same microprogram can be shared for two uses. This further reduces the total microprogram size, which reduces the hardware scale and hence the cost. Further, while the first program counter outputs first read addresses within the predetermined address sequence, the second control storage unit may additionally output to the plurality of execution units, along with the output of a microinstruction from the registers, information indicating where the execution result of the microinstruction is to be stored, and each execution unit may store its execution result according to this storage destination information.

[0126] According to this configuration, the storage destination information can be specified separately from the microprogram in the instruction register group, so that microprogram can be shared among different processes, for example among the partial processes of a matrix operation. As a result, the total microprogram size can be further reduced, which reduces the hardware scale and hence the cost.
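The idea of sharing one microprogram by supplying the add/subtract flag and the storage destination from outside can be sketched in Python. The names `SHARED_MICROPROGRAM` and `run_unit`, the register-file model, and the single-instruction program are hypothetical; the sketch only mirrors the separation between a shared instruction and per-unit side information.

```python
# One shared micro-instruction: "add or subtract x and y". Whether it adds or
# subtracts, and where the result goes, is NOT encoded in the instruction itself.
SHARED_MICROPROGRAM = [("addsub", "x", "y")]


def run_unit(regs, side_info):
    """Execute the shared microprogram on one execution unit.

    side_info holds one (subtract_flag, destination) pair per micro-instruction,
    standing in for the flag and the storage destination information that are
    supplied separately from the instruction registers.
    """
    regs = dict(regs)  # each unit works on its own copy of the registers
    for (op, src_a, src_b), (subtract, dest) in zip(SHARED_MICROPROGRAM, side_info):
        assert op == "addsub"
        a, b = regs[src_a], regs[src_b]
        regs[dest] = a - b if subtract else a + b
    return regs


inputs = {"x": 5, "y": 3}
unit0 = run_unit(inputs, [(False, "sum")])   # same instruction, flag = add
unit1 = run_unit(inputs, [(True, "diff")])   # same instruction, flag = subtract
# unit0["sum"] == 8 and unit1["diff"] == 2
```

A sum/difference pair of this kind is the basic step of butterfly-style transform computations, so reusing one stored instruction for both halves is one plausible way the microprogram size could shrink.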

[Brief description of the drawings]

FIG. 1 is an explanatory diagram of decoding by a video/audio decoder according to a first conventional technique.

FIG. 2 is an explanatory diagram of decoding by a two-chip decoder according to a second conventional technique.

FIG. 3 is a block diagram showing the schematic configuration of the image processing apparatus according to the first embodiment of the present invention.

FIG. 4 is a block diagram showing the configuration of the image processing apparatus according to the first embodiment of the present invention.

FIG. 5 is a diagram showing an MPEG stream hierarchically together with the operation timing of each part of the image processing apparatus.

FIG. 6 is a diagram showing the analysis of a macroblock header by the processor 7 and the control it applies to the other units.

FIG. 7 is a block diagram showing the configuration of the pixel operation unit 10.

FIG. 8 shows an example of the microprograms stored in the first instruction memory 506 and the second instruction memory 507.

FIG. 9 is a diagram showing the operation timing of the pixel operation unit 10.

FIG. 10 is a block diagram showing the detailed configuration of the pixel read/write unit 11.

FIG. 11 is a block diagram showing the configuration of the IO processor 5.

FIG. 12 is a block diagram showing a detailed configuration example of the instruction read circuit 53.

FIG. 13 is a timing chart showing the operation timing of the IO processor 5.

FIG. 14 is a block diagram showing the configuration of the task management unit.

FIG. 15 is an explanatory diagram showing the decoding operation from the FIFO memory 4 onward.

FIG. 16 is a block diagram showing the configuration of the image processing apparatus according to the second embodiment of the present invention.

FIG. 17 is a block diagram showing the configuration of the pixel operation unit 30.

FIG. 18 shows an example of the contents stored in the first instruction memory 506 and the second instruction memory 507.

FIG. 19 is a block diagram showing the configuration of the code conversion unit 9.

FIG. 20 shows a block storage area that stores 8 × 8 items of spatial frequency data, and the path of the zigzag scan.

FIG. 21 shows a block storage area that stores 8 × 8 items of spatial frequency data, and the path of the alternate scan.

FIG. 22 shows an example of the contents stored in the instruction memory 506 and the instruction memory 507 when the instruction pointer holding units 308a to 308c, the instruction register unit 309, and the distribution unit 310 are not provided.

FIG. 23 shows an example of the contents stored in the instruction pointer holding units 308a to 308c and the instruction register unit 309.

[Explanation of symbols]

1 stream input unit
2 buffer memory
3 external memory
4 FIFO memory
5 input/output processor
5a DMAC
6 memory controller
7 processor
8 internal memory
9 code conversion unit
10 pixel operation unit
12 video output unit
13 audio output unit
14 host I/F unit
1000 video/audio processing device
1001 input/output processing unit
1002 decode processing unit
1003 sequential processing unit
1004 routine processing unit

Continued from the front page
(51) Int.Cl.7, identification symbol, FI, theme code (reference): H03M 7/30, H04N 7/133 Z, 5J064, H04N 7/30, G10L 9/18 M
(72) Inventor: Tokuzo Kiyohara, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(72) Inventor: Kozo Kimura, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
F-terms (reference):
5B013 DD02 DD04
5B057 CA12 CA16 CG02 CH02 CH08
5B105 AA10 AC07 FD01 FD23 FD25 GA16
5C059 KK10 KK15 LA06 MA23 MB18 MC11 SS13 SS26 UA02 UA05 UA36 UA37 UA38 UA39
5D045 DA20
5J064 AA01 BA16 BC01 BC02 BC04 BC09 BC25 BD03

Claims

[Claims]

1. An arithmetic device comprising: first and second control storage units each storing a microprogram; a first program counter that designates a first read address to the first control storage unit; a second program counter that designates a second read address; a selector that selects one of the first read address and the second read address and outputs the selected read address to the second control storage unit; and an execution unit that executes operations under microprogram control by the first and second control storage units.

2. The arithmetic device according to claim 1, wherein the execution unit has a first arithmetic unit and a second arithmetic unit, operates the first arithmetic unit and the second arithmetic unit in parallel when the second read address is selected by the selector, and operates the first arithmetic unit and the second arithmetic unit in conjunction with each other when the first read address is selected by the selector.

3. The arithmetic device according to claim 1, wherein the execution unit has a multiplier and an adder, and performs inverse quantization and inverse discrete cosine transform in block units under microprogram control by the first and second control storage units.

4. The arithmetic device according to claim 3, wherein the execution unit performs the processing that uses the multiplier and the processing that uses the adder independently and in parallel when the second read address is selected by the selector, and performs the processing that uses the multiplier and the processing that uses the adder in conjunction with each other when the first read address is selected by the selector.

5. The arithmetic device according to claim 4, further comprising: a first buffer that holds video blocks from the data conversion means; and a second buffer that holds blocks that have undergone inverse discrete cosine transform by the execution unit, wherein the first control storage unit stores a microprogram for inverse quantization and a microprogram for inverse discrete cosine transform, the second control storage unit stores a microprogram for inverse discrete cosine transform and a microprogram for transferring an inverse-discrete-cosine-transformed video block to the second buffer, and the execution means executes in parallel the process of transferring an inverse-discrete-cosine-transformed video block to the second buffer and the process of inverse-quantizing the next video block, and executes the inverse discrete cosine transform of that inverse-quantized video block with the multiplier and the adder operating in conjunction.

6. The arithmetic device according to claim 5, wherein the synthesizing means further generates difference blocks representing a difference image from video data to be compressed, the second buffer further holds the generated difference image, the first control storage unit further stores a microprogram for discrete cosine transform and a microprogram for quantization, the second control storage unit further stores a microprogram for discrete cosine transform and a microprogram for transferring a discrete-cosine-transformed video block to the first buffer, the execution means further performs discrete cosine transform and quantization on the difference blocks held in the second buffer and transfers the result to the first buffer, the data conversion means further performs variable-length coding on the blocks in the first buffer, and the sequential processing means further adds header information to the predetermined blocks that have been variable-length coded by the data conversion means.

7. The arithmetic device according to claim 3, which further generates image data of a polygon in accordance with the vertex coordinates of the polygon and the slopes of its edges.

8. The arithmetic device according to claim 7, wherein the first and second control storage units further store a microprogram for scan conversion by a DDA algorithm, and the execution unit performs scan conversion under microprogram control based on the vertex coordinates and the slopes.

9. An arithmetic device comprising: first and second control storage units each storing a microprogram; a first program counter that designates a first read address to the first control storage unit; a second program counter that designates a second read address; a selector that selects one of the first read address and the second read address and outputs the selected read address to the second control storage unit; and a plurality of execution units, each having a multiplier and an adder, that perform inverse quantization and inverse discrete cosine transform in block units under microprogram control by the first and second control storage units, wherein each execution unit processes its share of the partial blocks obtained by dividing a block.

10. The arithmetic device according to claim 9, further comprising: a plurality of address conversion tables, provided one per execution unit, each holding converted addresses in which the address order is partially rearranged relative to a predetermined address sequence; an instruction register group made up of a plurality of registers that store the individual microinstructions of a microprogram implementing a predetermined operation, in association with the converted addresses; and a switching unit, provided between the first and second control storage units and the plurality of execution units, that replaces the microinstructions output to each execution unit from the first control storage unit or the selector with microinstructions from the instruction registers and outputs them to the plurality of execution units, wherein, when the first read address or the second read address is an address within the predetermined address sequence, that address is converted into a converted address by each address conversion table, and the instruction register group outputs the microinstruction corresponding to each converted address output from the conversion tables.

11. The arithmetic device according to claim 10, wherein, while the first program counter outputs first read addresses within the predetermined address sequence, each conversion table further outputs to the plurality of execution units, along with the output of a microinstruction in the registers that indicates addition/subtraction, a flag indicating whether to add or subtract, each execution unit performs the addition or subtraction according to the flag, and the flag is set according to a microinstruction in the second control storage unit.

12. The arithmetic device according to claim 10, wherein, while the first program counter outputs first read addresses within the predetermined address sequence, the second control storage unit further outputs to the plurality of execution units, along with the output of a microinstruction from the registers, information indicating where the execution result of the microinstruction is to be stored, and each execution unit stores its execution result according to the storage destination information.