JP3380236B2 - Video and audio processing device

Info

Publication number: JP3380236B2
Application number: JP2001182648A
Authority: JP
Inventors: 康介 ▲よし▼岡; 誠平井; 督三清原; 浩三木村
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1997-04-07
Filing date: 2001-06-15
Publication date: 2003-02-24
Anticipated expiration: 2018-04-06
Also published as: JP2002094988A

Description

【発明の詳細な説明】【０００１】【発明の属する技術分野】本発明は、デジタル信号処理
の技術分野に属するものであって、圧縮された映像及び
音声データの伸長、映像及び音声データの圧縮、グラフ
ィックス処理などを行う画像処理装置に関する。【０００２】【従来の技術】近年、ディジタル動画データの圧縮／伸
長技術が確立されてきたことや、ＬＳＩ技術が向上して
きたこととがあいまって、圧縮映像及び音声データを伸
長するデコーダ、映像及び音声データを圧縮するエンコ
ーダ、グラフィックス処理を行うグラフィックス処理な
どの種々の映像音声処理装置が重要視されている。【０００３】第１の従来技術として、ＭＰＥＧ（Moving
Picture Experts Group）規格の圧縮映像及び音声デー
タを伸長する映像音声デコーダ（特開平８−１１１６４
２９）がある。この映像音声デコーダは、１つの信号処
理ユニットを用いて映像デコードと音声デコードの両方
を行う。図１に、この映像音声デコーダによるデコード
処理の説明図を示す。同図の縦軸は時間を、横軸は演算
量を表している。【０００４】縦軸に沿って大きく見ると、映像デコード
と音声デコードとが交互に処理される。これは、共通の
ハードウェアで映像、音声の両者をデコードするためで
ある。同図のように映像デコードは、逐次処理とブロッ
ク処理とに分けられる。逐次処理は、ブロック以外のデ
コード、つまりＭＰＥＧストリームのヘッダ解析など多
岐にわたる条件判断を必要とする処理であり、その演算
量は少ない。ブロックデコードは、ＭＰＥＧストリーム
の可変長符号を復号しさらにブロック単位に逆量子化、
逆ＤＣＴ（離散余弦変換）を行う処理であり、その演算
量は大きい。同図のように音声デコードも、多岐にわた
る条件判断を必要とする上記と同様の逐次処理と、音声
データ本体のデコード処理とに分けられる。音声データ
本体のデコード処理は、画像データよりも高い精度が要
求され、かつ限られた時間内に処理しなければならない
ので、精度よく高速に処理する必要があり、その演算量
は大きい。【０００５】このように、第１の従来技術は、１チップ
化を可能にし、１チップという少ないハードウェアで効
率的な音声映像デコードを実現している。第２の従来技
術として、２チップ構成のデコーダがある。１チップは
映像デコーダ、他の１チップは音声デコーダとして用い
られる。図２に２チップ構成のデコーダによるデコード
処理の説明図を示す。映像デコーダ、音声デコーダとも
にヘッダ解析等の条件判断を多数含む逐次処理と、デー
タ本体のデコードを主とするブロックデコード処理とを
行う。映像デコーダ、音声デコーダともに、独立に処理
するので第１の従来技術と比べて個々のチップの能力は
低くてよい。【０００６】【発明が解決しようとする課題】しかしながら上記従来
技術によれば、次のような問題があった。第１の従来技
術によれば、信号処理ユニットが映像も音声もデコード
しなければならないので、高い処理能力が要求される。
つまり１００ＭＨｚ以上の高速クロックを用いて動作さ
せる必要があり、民生用の半導体としてはコストが高い
という問題がある。また、高速クロックを用いずに処理
能力を高めるために、ＶＬＩＷ(Very Long Instruction
Word)プロセッサなどを用いることも考えられなくはな
いが、ＶＬＩＷプロセッサそのもののコストが高いうえ
に、別途逐次処理を行うプロセッサを用いなければ全体
の処理としては非効率になるという問題がある。【０００７】第２の従来技術によれば、２つのプロセッ
サを用いるのでコストが高いという問題があった。つま
り、映像用プロセッサも音声用プロセッサも、処理能力
の低い汎用の安価なプロセッサをそのまま使用すること
はできない。なぜなら映像用のプロセッサは、大量の画
像データをリアルタイムに処理する能力が要求されるか
らである。また音声用のプロセッサは、映像用プロセッ
サほど多くの演算量を要求されないけれども、音声デー
タの方が画像データよりも高い精度を要求されるからで
ある。それゆえ、安価なあるいは処理能力の低いプロセ
ッサでは、映像用としても音声用としても、要求される
処理能力を満たさない。【０００８】さらに、ディジタル（衛星）放送用チュー
ナー（ＳＴＢ（Set Top Box）と呼ばれる）やＤＶＤ（D
igital Versatile/Video Disc）再生装置などに用いら
れるＡＶデコーダ中に上記映像音声処理装置が用いられ
る場合には、放送波から受信されたあるいはディスクか
ら読み出されたＭＰＥＧストリームを入力し、そのＭＰ
ＥＧストリームをデコードし、最終的にディスプレイ、
スピーカなどへ映像信号出力及び音声信号出力をするま
でに必要とされる一連の処理量は膨大なものとなる。最
近では、このような一連の膨大な処理を効率良く実行す
る映像音声処理装置に対する要求が高まっている。【０００９】本発明は、圧縮画像及び圧縮音声データを
表すストリームデータの入力、デコード、出力という一
連の処理を行い、高い周波数で動作させなくても高い処
理能力を有し、製造コストを低減させることができる映
像音声処理装置を提供することを目的とする。また本発
明の他の目的は、圧縮映像データのデコード、映像デー
タのエンコード、グラフィックス処理を低コストで実現
する映像音声処理装置を提供することにある。【００１０】【課題を解決するための手段】上記の課題を解決するた
め本発明の映像音声処理装置は、圧縮音声データと圧縮
映像データとを含むデータストリームを外部から入力、
デコードし、デコードしたデータを出力装置に出力する
装置であって、外部要因により非同期に発生する入出力
処理を行う入出力処理手段と、前記入出力処理と並行し
て、メモリに格納されたデータストリームのデコードを
主とするデコード処理を行うデコード処理手段とを備
え、前記デコード処理手段によりデコードされた映像デ
ータ、デコードされた音声データはメモリに格納され、
前記入出力処理は、外部から非同期に入力される前記デ
ータストリームを入力し、さらにメモリに格納すること
と、メモリに格納されたデータストリームをデコード処
理手段に供給することと、外部の表示装置、音声出力装
置それぞれの出力レートに合わせてメモリから読み出
し、それらに出力することとを入出力処理として行うよ
うに構成されている。【００１１】この構成によれば、入出力処理手段とデコ
ード処理手段とがパイプライン的に並列動作することに
加えて、非同期処理とデコード処理とを入出力処理手段
とデコード処理手段とに分担させるので、デコード処理
手段は非同期に発生する処理から解放されてデコード処
理に専従することができる。その結果、本映像音声処理
装置は、ストリームデータ入力、デコード、出力という
一連の処理を効率良く実行するので、ストリームデータ
のフルデコード（フレーム落ちなし）を高速な動作クロ
ックを用いなくても可能にしている。【００１２】【発明の実施の形態】本発明の映像音声処理装置につい
て、その実施の形態を次のように項分けして記載する。
1 第１の実施形態
1.1 映像音声処理装置の概略構成
1.1.1 入出力処理部
1.1.2 デコード処理部
1.1.2.1 逐次処理部
1.1.2.2 定型処理部
1.2 映像音声処理装置の構成
1.2.1 入出力処理部の構成
1.2.2 デコード処理部
1.2.2.1 逐次処理部
1.2.2.2 定型処理部
1.3 各部の詳細構成
1.3.1 プロセッサ７（逐次処理部）
1.3.2 定型処理部
1.3.2.1 コード変換部
1.3.2.2 画素演算部
1.3.2.3 画素読み書き部
1.3.3 入出力処理部
1.3.3.1 ＩＯプロセッサ
1.3.3.1.1 命令読出回路
1.3.3.1.2 タスク管理部
1.4 動作説明
2 第２の実施形態
2.1 映像音声処理装置の構成
2.1.1 画素演算部
＜1. 第１の実施形態＞
本実施形態における映像音声処理装置は、衛星放送受信装置（ＳＴＢ:Set Top Boxと呼
ばれる）、ＤＶＤ(Digital Versatile Disc)再生装置、
ＤＶＤ−ＲＡＭ記録再生装置などに備えられ、圧縮映像
／音声データとして衛星放送から又はＤＶＤからのＭＰ
ＥＧストリームを入力し、伸長処理（以下単にデコード
と呼ぶ）を行って、映像信号及び音声信号を外部の出力
装置に出力する。＜1.1 映像音声処理装置の概略構成＞図３は、本発明
の第１の実施形態における映像音声処理装置の概略構成
を示すブロック図である。【００１３】映像音声処理装置１０００は、入出力処理
部１００１、デコード処理部１００２、メモリコントロ
ーラ６を備え、入出力処理とデコード処理とを分離して
並行して行うように構成されている。また、外部メモリ
３は、ＭＰＥＧストリームやデコード後の音声データを
一時的に記憶する作業用メモリ、デコード後の映像デー
タを記憶するフレームメモリとして利用される。＜1.1.1 入出力処理部＞入出力処理部１００１は、映
像音声処理装置１０００の内部動作とは非同期に発生す
る入出力処理を行う。この入出力処理は、（ａ）外部か
ら非同期に入力されるＭＰＥＧストリームを入力して外
部メモリ３に一時的に格納すること、（ｂ）外部メモリ
３に格納されたＭＰＥＧストリームをデコード処理部１
００２に供給すること、（ｃ）デコードされた映像デー
タ、音声データを外部メモリ３から読み出し、外部の表
示装置、音声出力装置（図外）それぞれの出力レートに
合わせて出力することを内容とする。＜1.1.2 デコード処理部＞デコード処理部１００２
は、入出力処理部１００１の動作とは独立に並行して、
入出力処理部１００１によって供給されるＭＰＥＧスト
リームをデコードし、デコード後の映像データ及び音声
データを外部メモリ３に格納する。ＭＰＥＧストリーム
のデコード処理は演算量が多く処理内容も多岐にわたる
ため、デコード処理部１００２は、逐次処理部１００
３、定型処理部１００４とを備え、多岐に亘る条件判断
を主とする逐次処理と、定型的な大量の演算を主としか
つ並列演算に適した定型処理とを分離して並行して実行
するように構成されている。ここで、逐次処理は、ＭＰ
ＥＧストリームのヘッダ解析などであり、ヘッダの検出
及びヘッダ内容の判定等の多数の条件判断を含む。また定
型処理は、所定数の画素からなるブロック単位に各種演
算を施す必要があるので、パイプライン的な並列処理に
適していて、かつ、異なるデータ（画素）に対して全く
同じ演算を施すというベクトル演算のような並列処理に
適している。＜1.1.2.1 逐次処理部＞逐次処理部１００３は、入出
力処理部１００１から供給される圧縮音声データ及び圧
縮映像データのヘッダ解析と、定型処理部１００４をマ
クロブロック毎に起動する制御と、圧縮音声データのデ
コード処理とを上記逐次処理として行う。ヘッダ解析
は、ＭＰＥＧストリームにおけるマクロブロックヘッダ
の解析と、動きベクトルの復号を含む。ここでブロック
とは、８＊８画素からなる画像を表す。マクロブロック
とは、４つの輝度ブロックと２つの色差ブロックからな
る。動きベクトルとは、参照フレーム中の８＊８画素の
矩形領域を指すベクトルであり、当該ブロックが参照フ
レーム中のどの矩形領域との差分がとられたかを指し示
す。＜1.1.2.2 定型処理部＞定型処理部１００４は、逐次
処理部１００３からマクロブロック毎にデコードの起動
指示を受けて逐次処理部１００３の音声デコード処理と
並行して、マクロブロックのデコード処理を上記定型処
理として行う。このデコード処理は、可変長符号の復号
（ＶＬＤ:Variable Length code Decoding）、逆量子化
（ＩＱ：Inverse Quantization）、逆離散余弦変換（Ｉ
ＤＣＴ:Inverse Discrete Cosine Transform）、動き補
償（ＭＣ:Motion Compensation）を同順に施すことを内
容とする。定型処理部１００４は、動き補償において、
復号後のブロックをフレームメモリとしての外部メモリ
３にメモリコントローラ６を介して格納する。＜1.2 映像音声処理装置の構成＞図４は、映像音声処
理装置１０００のより詳細な構成を示すブロック図であ
る。＜1.2.1 入出力処理部の構成＞同図において入出力処
理部１００１は、ストリーム入力部１、バッファメモリ
２、入出力プロセッサ５（以下ＩＯプロセッサ５と略
す）、ＤＭＡＣ（Direct Memory Access Controller）
５ａ、ビデオ出力部１２、音声出力部１３、ホストI/F
部１４とを備える。【００１４】ストリーム入力部１は、外部からシリアル
に入力されるＭＰＥＧデータストリームをパラレルデー
タ（以降、ＭＰＥＧデータと呼ぶ）に変換する。その
際、ストリーム入力部１は、ＭＰＥＧデータストリーム
からＧＯＰ(Group Of Picture:Ｉピクチャを１つ含み、
約０．５秒分の動画に相当するＭＰＥＧデータストリー
ム）のスタートコードを検出し、その旨をＩＯプロセッ
サ５に通知する。この通知により変換後のＭＰＥＧデー
タは、ＩＯプロセッサ５の制御によりバッファメモリ２
に転送される。【００１５】バッファメモリ２は、ストリーム入力部１
から転送されたＭＰＥＧデータを一時的に保持する緩衝
用メモリである。バッファメモリ２に保持されたＭＰＥ
Ｇデータは、さらに入出力プロセッサ５の制御の下でメ
モリコントローラ６を介して外部メモリ３に転送され
る。外部メモリ３は、ＳＤＲＡＭ（Synchronous Dynami
c Random Access Memory）チップにより構成され、バッ
ファメモリ２からメモリコントローラ６を介して転送さ
れたＭＰＥＧデータを一時的に保持する。さらに、外部
メモリ３は復号後の映像データ（以降、フレームデータ
とも呼ぶ）および復号後の音声データも保持する。【００１６】入出力プロセッサ５は、ストリーム入力部
１、バッファメモリ２、外部メモリ３（メモリコントロ
ーラ６が介在する）、ＦＩＦＯメモリ４の間のデータ入
出力を制御する。すなわち以下の(1)〜(4)に示す経路の
データ転送（ＤＭＡ転送）を制御する。
(1)ストリーム入力部１→バッファメモリ２→メモリコントローラ６→外部メモリ３
(2)外部メモリ３→メモリコントローラ６→ＦＩＦＯメモリ４
(3)外部メモリ３→メモリコントローラ６→バッファメモリ２→ビデオ出力部１２
(4)外部メモリ３→メモリコントローラ６→バッファメモリ２→音声出力部１３
これらの径路では入出力プロセッサ５は、ＭＰＥＧデー
タ中の映像データと音声データとを独立にそれぞれの転
送を制御する。また、(1)、(2)は復号前のＭＰＥＧデー
タの転送経路である。(1)、(2)の転送経路において入出
力プロセッサ５は、圧縮映像データと圧縮音声データと
を別個に転送する。(3)、(4)はそれぞれ、復号後の映
像、音声データの転送経路である。復号後の映像、音声
データは、外部の表示装置（図外）、音声出力装置（図
外）それぞれの出力レートに合わせて転送される。【００１７】ＤＭＡＣ５ａは、ストリーム入力部１、ビ
デオ出力部１２、音声出力部１３とバッファメモリ２と
の間のＤＭＡ転送、バッファメモリ２と外部メモリ３と
の間のＤＭＡ転送、外部メモリ３とＦＩＦＯメモリ４の
間のＤＭＡ転送をＩＯプロセッサ５の制御に従って実行
する。ビデオ出力部１２は、外部の表示装置（ＣＲＴ
等）の出力レート（たとえば水平同期信号Ｈsyncの周
期）に合せて入出力プロセッサ５にデータ要求を出し、
入出力プロセッサ５により上記(3)の転送経路により入
力される映像データをその表示装置に出力する。【００１８】音声出力部１３は、外部の音声出力装置の
出力レートに合せて入出力プロセッサ５にデータ要求を
出し、入出力プロセッサ５により上記(4)の転送経路に
より入力される音声データを音声出力装置（Ｄ／Ａコン
バータ、音声アンプ、スピーカの組み合わせ等）に出力
する。ホストI/F部１４は、外部のホストプロセッサ、
たとえばＤＶＤ再生装置の場合にはその制御全般を行う
プロセッサとの通信を行うためのインターフェースであ
る。この通信では、ホストプロセッサからＭＰＥＧスト
リームのデコード開始、停止、早送り再生、逆再生等の
指示などが送られる。＜1.2.2 デコード処理部＞図４のデコード処理部１０
０２は、ＦＩＦＯメモリ４、逐次処理部１００３、定型
処理部１００４とを備え、入出力処理部１００１からＦＩ
ＦＯメモリ４を介して供給されるＭＰＥＧデータのデコ
ード処理を行う。また、逐次処理部１００３は、プロセ
ッサ７と内部メモリ８とを備える。定型処理部１００４
は、コード変換部９、画素演算部１０、画素読み書き部
１１、バッファ２００、バッファ２０１を備える。【００１９】ＦＩＦＯメモリ４は、２つのＦＩＦＯ（以
下映像ＦＩＦＯ、音声ＦＩＦＯと呼ぶ）からなり、入出
力プロセッサ５の制御の下で外部メモリ３から転送され
た圧縮映像データ、圧縮音声データをそれぞれ先入れ先
出し式に記憶する。＜1.2.2.1 逐次処理部＞プロセッサ７は、ＦＩＦＯメ
モリ４の圧縮映像データ及び圧縮音声データの読み出し
を制御するとともに、圧縮映像データに対する一部のデ
コード処理と、圧縮音声データに対する全デコード処理
とを行う。圧縮映像データの一部のデコード処理とは、
ＭＰＥＧデータ中のヘッダ情報の解析と動きベクトルの
計算と圧縮映像デコード処理の制御とを含む。これは、
圧縮映像データの全デコード処理を、プロセッサ７と、
定型処理部１００４とで分担して行うためである。つま
りプロセッサ７は多岐にわたる条件判断を必要とする逐
次処理を分担し、定型処理部１００４は、大量の定型的
な演算処理を分担する。これに対し音声デコードは、映
像デコードに比べて演算量が少ないのでプロセッサ７が
全部を担当している。【００２０】プロセッサ７の機能を図５を用いて具体的
に説明する。図５はＭＰＥＧストリームを階層的に示すと
ともに映像音声処理装置各部の動作タイミングを示して
いる。同図において横軸は時間軸である。第１階層はＭ
ＰＥＧストリームの流れを示す。第２階層のように１秒
間のＭＰＥＧストリームは、複数のフレーム（Ｉ、Ｐ、
Ｂピクチャ）を含む。第３階層のように１フレームは、
ピクチャヘッダと複数のスライスを含む。第４階層のよ
うに１スライスは、スライスヘッダと複数のマクロブロ
ックを含む。第５階層のように１マクロブロックは、マ
クロブロックヘッダと６つのブロックを含む。【００２１】同図に示す第１〜第５階層のデータ構成
は、公知文献、例えば株式会社アスキー「ポイント図解
式最新ＭＰＥＧ教科書」に詳しく説明されている。プロ
セッサ７は、同図の第５階層以下に示すように、ＭＰＥ
Ｇストリーム中のマクロブロック層までのヘッダ解析と
圧縮音声データの復号とを行う。その際、プロセッサ７
は、マクロブロック単位のヘッダ解析結果に従って、コ
ード変換部９、画素演算部１０及び画素読み書き部１１
に対してマクロブロックのデコードの開始を指示し、コ
ード変換部９、画素演算部１０及び画素読み書き部１１
によってマクロブロックのデコードがなされている間、
ＦＩＦＯメモリ４から圧縮音声データを読み出してデコ
ードする。コード変換部９、画素演算部１０及び画素読
み書き部１１によりマクロブロックのデコードが終了す
ると、プロセッサ７は、割込み信号によりその旨の通知
を受け、圧縮音声データのデコードを中断して、次のマ
クロブロックのヘッダ解析を開始する。【００２２】内部メモリ８は、プロセッサ７のワークメ
モリであり、復号された音声データを一時的に保持す
る。保持された音声データは、入出力プロセッサ５によ
り上記(4)の経路で外部メモリ３に転送される。＜1.2.2.2 定型処理部＞コード変換部９は、ＦＩＦＯ
メモリ４から読み出された圧縮映像データを可変長復号
（ＶＬＤ）する。図５に示すように、コード変換部９
は、復号後のデータのうち、ヘッダ情報及び動きベクト
ルに関する情報（図中の破線区間）をプロセッサ７に転
送し、マクロブロック（輝度ブロックＹ０〜Ｙ３と色差
ブロックＣｂ、Ｃｒとからなる６ブロック）のデータ
（図中の実線区間）をバッファ２００を介して画素演算
部１０に転送する。コード変換部９による復号後のマク
ロブロックのデータは空間周波数成分を表すデータであ
る。【００２３】バッファ２００は、コード変換部９により
書き込まれる１ブロック（８×８画素分）分の空間周波
数成分を表すデータを保持する。画素演算部１０は、コ
ード変換部９からバッファ２００を介して転送されたブ
ロックデータに対して、逆量子化処理（ＩＱ）及び逆離
散余弦変換（ＩＤＣＴ）をブロック単位に行う。画素演
算部１０による処理結果は、輝度ブロックであれば画素
の輝度値又はその差分を表すデータであり、色差ブロッ
クであれば画素の色差又はその差分を表すデータであ
り、バッファ２０１を介して画素読み書き部１１に転送
される。【００２４】バッファ２０１は、１ブロック（８×８画
素分）分の画素データを保持する。画素読み書き部１１
は、画素演算部１０の処理結果に対して、ブロック単位
に動き補償を行う。すなわち、Ｐピクチャ、Ｂピクチャ
については、外部メモリ３内の復号済みの参照フレーム
から動きベクトルが示す矩形領域をメモリコントローラ
６を介して切り出して、画素演算部１０の処理結果のブ
ロックと合成することにより、元のブロック画像に復号
する。画素読み書き部１１による復号結果は、メモリコ
ントローラ６を介して外部メモリ３に格納される。【００２５】上記の動き補償、ＩＱ、ＩＤＣＴの各内容
については公知技術なので詳しい説明は省略する（上記
文献参照）。＜1.3 各部の詳細構成＞次に、映像音声処理装置１０
００の主要な各部の詳細な構成について説明する。＜1.3.1 プロセッサ７（逐次処理部）＞図６は、プロ
セッサ７によるマクロブロックヘッダの解析と、他の各
部への制御内容とを示す図である。まず同図に略語で示
してあるマクロブロックヘッダ中の各データは上記文献
等に説明されているのでここでは説明を省略する。【００２６】同図のようにプロセッサ７は、コード変換
部９にコマンドを発行して可変長復号されたヘッダ部分
のデータを逐次取得し、その内容に従ってコード変換部
９、画素演算部１０、画素読み書き部１１に対してマク
ロブロックのデコードに必要なデータを設定する。具体
的には、まずプロセッサ７は、コード変換部９にＭＢＡ
Ｉ（Macro Block Address Increment）を取得するための
コマンドを発行して（Ｓ１０１）、コード変換部９から
ＭＢＡＩを取得する。このＭＢＡＩに基づき当該マクロ
ブロックデータがスキップマクロブロックであれば（今
デコードしようとしているマクロブロックが前回と同じ
であれば）、マクロブロックデータが省略されているの
でＳ１１７に進み、スキップマクロブロックでなければ
ヘッダ解析を続ける（Ｓ１０２、１０３）。【００２７】次いで、プロセッサ７はＭＢＴ（Macro Bl
ock Type）を取得するためのコマンドを発行して、コー
ド変換部９からＭＢＴを取得する。このＭＢＴからブロ
ックのスキャンタイプがジグザグスキャンかオールタネ
ートスキャンかを判断し、画素演算部１０にバッファ２
００の読み出し順序を指示する（Ｓ１０４）。さらに、
プロセッサ７は既に取得したヘッダデータからＳＴＷＣ
（Spatial Temporal Weight Code）が存在するか否か
を判定し（Ｓ１０５）、存在する場合にはコマンドを発
行して取得する（Ｓ１０６）。【００２８】同様にしてプロセッサ７は、ＦｒＭＴ（Fr
ame Motion Type）、ＦｉＭＴ（Field Motion Type）、
ＤＴ（DCT type）、ＱＳＣ（Quantizer Scale Code）、
ＭＶ（Motion Vector）、ＣＢＰ（Coded Block Patter
n）を取得する（Ｓ１０７〜１１６）。その際、プロセ
ッサ７は、ＦｒＭＴ、ＦｉＭＴ、ＤＴの解析結果を画素
読み書き部１１に通知し、ＱＳＣの解析結果を画素演算
部１０に通知し、ＣＢＰの解析結果をコード変換部９に
通知する。これによりＩＱ、ＩＤＣＴ、動き補償に必要
な情報が、コード変換部９、画素演算部１０、画素読み
書き部１１に設定される。【００２９】また２プロセッサ構成では、多岐にわたる
条件判断を必要とする上記の逐次処理を各プロセッサが
個別に行うため冗長な構成になっていた。次いで、プロ
セッサ７はコード変換部９に対してマクロブロックのデ
コード開始指示を発行する（Ｓ１１７）。これによりコ
ード変換部９は、マクロブロック内の各ブロックについ
てＶＬＤを開始し、ＶＬＤの結果をバッファ２００を介
して画素演算部１０に出力する。さらにプロセッサ７
は、ＭＶデータに基づいて動きベクトルを計算し（Ｓ１
１８）、その計算結果を画素読み書き部１１に通知する
（Ｓ１１９）。【００３０】上記処理において、動きベクトルに関して
は、動きベクトルのデータ（ＭＶ）取得（Ｓ１１３）
し、動きベクトルの計算（Ｓ１１８）し、動きベクトル
を画素読み書き部１１に設定する（Ｓ１１９）という一
連の処理が必要である。この点、プロセッサ７は、動き
ベクトルデータ（ＭＶ）を取得（Ｓ１１３）した直後に
動きベクトルの計算及び設定（Ｓ１１８、１１９）しな
いで、定型処理部１００４へのデコード開始指示を発行
してから動きベクトルを計算及び設定を行うようにして
いる。これにより、プロセッサ７の動きベクトル計算お
よび設定処理と、定型処理部１００４へのデコード処理
とが並列に処理されるようになる。つまり定型処理部１
００４のデコード開始タイミングを早くしている。【００３１】以上のようにしてマクロブロック１つ分の
圧縮映像データのヘッダ解析が完了するので、プロセッ
サ７は、ＦＩＦＯメモリ４から圧縮音声データを取得し
て、音声デコード処理を開始する（Ｓ１２０）。音声デ
コード処理は、コード変換部９からマクロブロックのデ
コード完了を示す割り込み信号が入力されるまで続けら
れる。この割り込み信号によりプロセッサ７は次のマク
ロブロックに対して上記ヘッダ解析を開始する。＜1.3.2 定型処理部＞次に、定型処理部１００４は、
マクロブロック内の６つのブロックをコード変換部９、
画素演算部１０、画素読み書き部１１を並列に（パイプ
ライン的に）動作させることによりデコード処理を行
っている。ここでは、コード変換部９、画素演算部１０、
画素読み書き部１１の順にそれらの構成をより詳細に
説明する。＜1.3.2.1 コード変換部９＞図１９は、コード変換部
９の構成を示すブロック図である。【００３２】同図のコード変換部９は、ＶＬＤ部９０
１、カウンタ９０２、インクリメンタ９０３、セレクタ
９０４、スキャンテーブル９０５、スキャンテーブル９
０６、フリップフロップ（以下ＦＦと略す）９０７、セ
レクタ９０８とを備え、可変長復号（ＶＬＤ）した結果
をブロック単位に、ジグザグスキャン又はオルタネート
スキャンの順に配列するようにバッファ２００に書き込
むよう構成されている。【００３３】ＶＬＤ部９０１は、ＦＩＦＯメモリ４から
読み出された圧縮映像データを可変長復号（ＶＬＤ）
し、復号後のデータのうち、ヘッダ情報及び動きベクト
ルに関する情報（図５中の破線区間）をプロセッサ７に
転送し、マクロブロック（輝度ブロックＹ０〜Ｙ３と色
差ブロックＣｂ、Ｃｒとからなる６ブロック）のデータ
（図５中の実線区間）をブロック（６４個の空間周波数
データ）単位にバッファ２００に出力する。【００３４】カウンタ９０２、インクリメンタ９０３、
セレクタ９０４からなる回路部分は、ＶＬＤ部９０１か
らの空間周波数データの出力に同期して、０から６３ま
でを繰り返しカウントする。スキャンテーブル９０５
は、バッファ２００のブロック記憶領域のアドレスをジ
グザグスキャンの順に記憶しているテーブルであり、カ
ウンタ９０２の出力値（０〜６３）が順に入力され、順
次そのアドレスを出力する。図２０にバッファ２００中
の８×８個の空間周波数データを記憶するブロック記憶
領域と、ジグザグスキャンの順路を示す。スキャンテー
ブル９０５は、同図の順路における画素アドレスを順次
出力する。【００３５】スキャンテーブル９０６は、バッファ２０
０のブロック記憶領域のアドレスをオルタネートスキャ
ンの順に記憶しているテーブルであり、カウンタ９０２
の出力値（０〜６３）が順に入力され、順次そのアドレ
スを出力する。図２１にバッファ２００中の８×８個の
空間周波数データを記憶するブロック記憶領域と、オル
タネートスキャンの順路を示す。スキャンテーブル９０
６は、同図の順路における画素アドレスを順次出力す
る。【００３６】ＦＦ９０７は、スキャンタイプ（ジグザグ
スキャンかオルタネートスキャンか）を示すフラグを保
持する。このフラグは、プロセッサ７により設定され
る。セレクタ９０８は、ＦＦ９０７のフラグに応じてス
キャンテーブル９０５とスキャンテーブル９０６とから
出力されるアドレスを選択し、バッファ２００に書き込
みアドレスとして出力する。＜1.3.2.2 画素演算部＞図７は、画素演算部１０の構
成を示すブロック図である。【００３７】同図のように画素演算部１０は、乗算器５
０２と加減算器５０３からなる実行部５０１と、第１プ
ログラムカウンタ（以降、第１ＰＣと略す）５０４と、
第２プログラムカウンタ（以降、第２ＰＣと略す）５０
５と、第１命令メモリ５０６と、第２命令メモリ５０７
と、セレクタ５０８とを有し、ＩＱとＩＤＣＴの一部と
をオーバラップさせて並列に実行できるように構成され
ている。【００３８】実行部５０１は、第１命令メモリ５０６、
第２命令メモリ５０７から順次出力されるマイクロ命令
に従って、バッファ２００、２０１のアクセス及び演算
を実行する。第１命令メモリ５０６、第２命令メモリ５
０７は、バッファ２００に保持されたブロック（周波数
成分）に対して、ＩＱ、ＩＤＣＴを実現するためのマイ
クロプログラムを記憶する制御記憶である。図８に、第
１命令メモリ５０６及び第２命令メモリ５０７に記憶さ
れたマイクロプログラムの一例を示す。【００３９】同図において、第１命令メモリ５０６はＩ
ＤＣＴ１Ａマイクロプログラムと、ＩＱマイクロプログ
ラムとを記憶し、第１ＰＣ５０４によって読み出しアド
レスが指定される。ＩＱマイクロプログラムは、バッファ
２００の読み出しと、乗算とを主体とする演算処理であ
り、加減算器５０３を用いない。第２命令メモリ５０７
はＩＤＣＴ１Ｂマイクロプログラムと、ＩＤＣＴ２マイ
クロプログラムとを記憶し、セレクタ５０８を介して第
１ＰＣ５０４又は第２ＰＣ５０５により読出アドレスが
指定される。ここで、ＩＤＣＴ１は、乗算及び加減算を
主とするＩＤＣＴの前半部分の処理を意味し、ＩＤＣＴ
１ＡマイクロプログラムとＩＤＣＴ１Ｂマイクロプログ
ラムとが同時に読み出されることにより実行部５０１全
体を使って実行される。また、ＩＤＣＴ２は、加減算を
主とするＩＤＣＴの後半部分の処理とバッファ２０１へ
の書き出し処理を意味し、第２命令メモリ５０７のＩＤ
ＣＴ２マイクロプログラムが読み出されることによって
加減算器５０３を使って実行される。【００４０】ＩＱは乗算器５０２により、ＩＤＣＴ２は
加減算器５０３により処理されるので、これらは並列動
作可能になっている。図９に、画素演算部１０によるＩ
Ｑ、ＩＤＣＴ１、ＩＤＣＴ２の動作タイミング図を示
す。図９において、コード変換部９はバッファ２００に
輝度ブロックＹ０のデータを書き込むと（タイミングｔ
０）、その旨を制御信号１０２にて画素演算部１０に通
知する。画素演算部１０は、プロセッサ７のヘッダ解析
時に設定されたＱＳ（Quantizer Scale）値を用いて、
第１ＰＣ５０４のアドレス指定に従って第１命令メモリ
５０６のＩＱマイクロプログラムを読み出すことによっ
てバッファ２００のデータに対してＩＱを行う。このと
き、セレクタ５０８は第１ＰＣ５０４を選択する（タイ
ミングｔ１）。【００４１】さらに、画素演算部１０は、第１ＰＣ５０
４のアドレス指定に従ってＩＤＣＴ１Ａ及びＩＤＣＴ１
Ｂマイクロプログラムを読み出すことによってバッファ
２００のデータに対してＩＤＣＴ１を行う。このとき、
セレクタ５０８は第１ＰＣ５０４を選択するので、第１
命令メモリ５０６、第２命令メモリ５０７の双方に第１
ＰＣ５０４からのアドレスが指定される（タイミングｔ
２）。【００４２】次に、画素演算部１０は、上記ＱＳ（Quan
tizer Scale）値を用いて、第１ＰＣ５０４のアドレス
指定に従って第１命令メモリ５０６のＩＱマイクロプロ
グラムを読み出すことによってバッファ２００のブロッ
クＹ１のデータに対してＩＱを行い、同時に、第２ＰＣ
５０５のアドレス指定に従って第２命令メモリ５０７の
ＩＤＣＴ２マイクロプログラムを読み出すことによって
ブロックＹ０に対してＩＤＣＴ処理の後半部分を処理す
る。このときセレクタ５０８は第２ＰＣ５０５を選択す
る。第１ＰＣ５０４と第２ＰＣ５０５とは独立にアドレ
スを指定することになる（タイミングｔ３）。【００４３】この後も同様に画素演算部１０はブロック
単位に処理を続ける（タイミングｔ４以降）。＜1.3.2.3 画素読み書き部＞図１０は、画素読み書き
部１１の詳細な構成を示すブロック図である。同図のよ
うに画素読み書き部１１は、バッファ７１〜７４（以
下、バッファＡ〜Ｄと呼ぶ）と、ハーフペル補間部７５
と、合成部７６と、セレクタ７７、７８と、読み書き制
御部７９とからなる。【００４４】読み書き制御部７９は、バッファ２０１を
介して入力されるブロックデータに対して、バッファＡ
〜Ｄを用いて動き補償を行い、最終的な復号画像を２ブ
ロック単位で外部メモリ３に転送する。より具体的に
は、プロセッサ７のヘッダ解析時に設定された動きベク
トルに従って、外部メモリ３中の参照フレームから２ブ
ロック分に相当する矩形領域を読み出すようメモリコン
トローラ６を制御する。その結果、バッファＡ又はバッ
ファＢに動きベクトルが指し示す２ブロック分の矩形領
域のデータが格納される。その後、ピクチャの種類（Ｉ
かＰかＢピクチャか）に応じて２ブロック分の矩形領域
のハーフペル補間をハーフペル補間部７５にて行う。さらにバッフ
ァ２０１を介して入力されるブロックデータと、ハーフ
ペル補間後の矩形領域とを合成（加算）することによ
り、当該ブロックの画素値を算出し、バッファＢに格納
する。こうしてバッファＢに格納された最終的な復号ブ
ロックはメモリコントローラ６を介して外部メモリ３に
転送される。＜1.3.3 入出力処理部＞入出力処理部１００１は、上
記のように多数のデータ入出力（データ転送）を実行す
るために、種々のデータ転送を分担する複数のタスクを
オーバーヘッドなく切り替え、しかもデータ入出力要求
に対して応答遅延を生じさせないように構成されてい
る。ここでいうオーバーヘッドは、タスクスイッチ時に
発生するコンテキストの退避及び復帰である。つまり入
出力プロセッサ５は、プログラムカウンタの命令アドレ
スやレジスタデータをメモリ（スタック領域）に退避及
び復帰することにより生ずるオーバーヘッドを解消する
ように構成されている。ここでは、その詳細な構成につ
いて説明する。＜1.3.3.1 ＩＯプロセッサ＞図１１は、ＩＯプロセッ
サ５の構成を示すブロック図である。同図において、Ｉ
Ｏプロセッサ５は、状態監視レジスタ５１、命令メモリ
５２、命令読出回路５３、命令レジスタ５４、デコーダ
５５、演算実行部５６、汎用レジスタセット群５７、タ
スク管理部５８を備え、非同期に発生する複数のイベン
トに対応するために、極めて短い周期（例えば４命令サ
イクル）毎にタスクを切り替えながら実行するよう構成
されている。【００４５】状態監視レジスタ５１は、レジスタＣＲ１
〜ＣＲ３からなり、ＩＯプロセッサ５が種々の入出力状
態を監視するための種々の状態データ（フラグなど）を
保持する。例えば、状態監視レジスタ５１は、ストリー
ム入力部１の状態（ＭＰＥＧストリームにおけるスター
トコード検出フラグ）、ビデオ出力部１２の状態（水平
ブランキング期間を示すフラグ、フレームデータの転送
完了フラグ）、音声出力部１３の状態（音声フレームデ
ータの転送完了フラグ）や、それらとバッファメモリ
２、外部メモリ３及びＦＩＦＯメモリ４との間でのデー
タ転送の状態（データ転送数、ＦＩＦＯメモリ４へのデ
ータ要求フラグ）などを示す状態データを保持する。【００４６】より具体的には、以下のフラグ等を含む。
・スタートコード検出フラグ（以下フラグ１とも呼ぶ）
このフラグは、ストリーム入力部１によってＭＰＥＧストリームにおけるスタートコードが検出されたとき設定される。
・水平ブランキングフラグ（フラグ２）
このフラグは、水平ブランキング期間を示すフラグであり、ビデオ出力部１２により設定される。約６０マイクロ秒周期で設定される。
・映像フレームデータの転送完了フラグ（フラグ３）
このフラグは、外部メモリ３からビデオ出力部１２へ１フレーム分の復号された画像データが転送されたときＤＭＡＣ５ａによって設定される。
・音声フレームデータの転送完了フラグ（フラグ４）
このフラグは、外部メモリ３から音声出力部１３へ１フレーム分の復号された音声データが転送されたときＤＭＡＣ５ａによって設定される。
・データ転送完了フラグ（フラグ５）
このフラグは、ストリーム入力部１からバッファメモリ２へＩＯプロセッサ５により指定されたデータ数の圧縮画像データがＤＭＡＣ５ａによりＤＭＡ転送されたとき（ターミナルカウントになったとき）に設定される。
・ＤＭＡ要求フラグ（フラグ６）
このフラグは、バッファメモリ２の圧縮画像データ又は圧縮音声データを外部メモリ３へＤＭＡ転送すべきデータがあることを示すフラグであり、ＩＯプロセッサ５により設定される（後述するタスク１からタスク２への要求）。
・映像ＦＩＦＯへのデータ要求フラグ（フラグ７）
このフラグは、外部メモリ３からＦＩＦＯメモリ４中の映像ＦＩＦＯへのデータ転送を要求するフラグであり、映像ＦＩＦＯの圧縮映像データが所定量以下になったとき設定される。このフラグは、約５〜４０マイクロ秒周期で設定される。
・音声ＦＩＦＯへのデータ要求フラグ（フラグ８）
このフラグは、外部メモリ３からＦＩＦＯメモリ４中の音声ＦＩＦＯへのデータ転送を要求するフラグであり、音声ＦＩＦＯの圧縮音声データが所定量以下になったときに設定される。このフラグは、約１５〜６０マイクロ秒周期で設定される。
・デコーダ通信要求フラグ（フラグ９）
このフラグは、デコード処理部１００２から入出力処理部１００１へ通信を要求するフラグである。
・ホスト通信要求フラグ（フラグ１０）
このフラグは、ホストプロセッサから入出力処理部１００１へ通信を要求するフラグである。
【００４７】上記のフラグ類は、ＩＯプロセッサ５によ
り実行される各タスクにより、割り込みではなく、定常
的に監視される。命令メモリ５２は、多数のデータ入出
力（データ転送）制御を分担する複数のタスクプログラ
ムを記憶する。本実施例では、タスク０〜５の６つのタ
スクプログラムを記憶する。・タスク０（ホストI/Fタスク）本タスクは、上記フラグ１０が設定されたとき、ホスト
コンピュ−タとの通信、つまりホストI/F部１４を介し
たホストコンピュ−タとの通信処理を行うためのタスク
である。例えば、ホストプロセッサからのＭＰＥＧスト
リームのデコード開始、停止、早送り再生、逆再生等の
受け付けと、デコード状況（エラー等）の通知などが行
われる。この処理は、上記フラグ１０をトリガーとす
る。・タスク１（パージングタスク）本タスクは、ストリーム入力部１によりスタートコード
が検出されたとき（上記フラグ１）、ストリーム入力部
１から入力されるＭＰＥＧデータを解析（パージング）
して、個々のエレメンタリストリームを抽出して、抽出
されたエレメンタリストリームを、ＤＭＡ転送(上記転
送経路(1)の前半部分)によりバッファメモリ２に転送す
るプログラムである。ここで抽出されるエレメンタリス
トリームの種類は、圧縮映像データ（ビデオエレメンタ
リーストリームとも呼ぶ）、圧縮音声データ（オーディ
オエレメンタリーストリームとも呼ぶ）、プライベート
データなどがある。エレメンタリストリームをバッファ
メモリ２に格納したときに、上記フラグ６が設定され
る。・タスク２（ストリーム転送／オーディオタスク）本タスクは、次の（ａ）〜（ｃ）の転送を制御するプロ
グラムである。【００４８】(a)バッファメモリ２から外部メモリ３へ
個々のエレメンタリーストリームのＤＭＡ転送(上記転
送経路(1)の後半部分)。この転送は、上記フラグ１、３
をトリガーとする。 (b)オーディオＦＩＦＯに保持されている圧縮音声デー
タのデータサイズ（残量）に応じて、外部メモリ３から
ＦＩＦＯメモリ４のオーディオＦＩＦＯへの圧縮音声デ
ータのＤＭＡ転送（上記転送経路(2)におけるオーディ
オＦＩＦＯへの転送）。このデータ転送は、オーディオ
ＦＩＦＯに保持されている圧縮音声データのデータサイ
ズが一定量よりも少なくなった場合になされる。この転
送は、上記フラグ８をトリガーとする。【００４９】(c)外部メモリ３からバッファメモリ２
へ、さらにバッファメモリ２から音声出力部１３へ復号
後のオーディオデータのＤＭＡ転送（上記転送経路
(4)）。この転送は、上記フラグ２をトリガーとする。・タスク３（映像供給タスク）本タスクは、映像ＦＩＦＯに保持されている圧縮映像デ
ータのデータサイズ（残量）に応じて、外部メモリ３か
らＦＩＦＯメモリ４の映像ＦＩＦＯへの圧縮映像データ
のＤＭＡ転送（上記転送経路(2)における映像ＦＩＦＯ
への転送）を処理するプログラムである。このデータ転
送は、映像ＦＩＦＯに保持されている圧縮映像データの
データサイズが一定量よりも少なくなった場合になされ
る。この転送は、上記フラグ７をトリガーとする。・タスク４（ビデオ出力タスク）本タスクは、外部メモリ３からバッファメモリ２へ、さ
らにバッファメモリ２からビデオ出力部１２へ復号後の
映像データのＤＭＡ転送（上記転送経路(3)）を処理す
るプログラムである。この転送は、上記フラグ２をトリ
ガーとする。・タスク５（デコーダＩ／Ｆタスク）本タスクは、デコード処理部１００２からＩＯプロセッ
サ５に向けてのコマンドを処理するプログラムである。
コマンドには、「getAPTS」、「getVPTS」、「getSTC」
などがある。getVPTS（Video Presentation Time Stam
p）は、デコード処理部１００２がＩＯプロセッサ５に
対して圧縮映像データに付与されているＶＰＴＳの取得
を要求するコマンドである。getAPTS（Audio Presentat
ion Time Stamp）は、デコード処理部１００２がＩＯプ
ロセッサ５に対して圧縮音声データに付与されているＡ
ＰＴＳの取得を要求するコマンドである。getSTC（Syst
em Time Clock）は、デコード処理部１００２がＩＯプ
ロセッサ５に対してＳＴＣの取得を要求するコマンドで
ある。これらのコマンドを受けたＩＯプロセッサ５は、
デコード処理部１００２にＳＴＣ、ＶＰＴＳ、ＡＰＴＳ
をそれぞれ通知する。ＳＴＣ、ＶＰＴＳ、ＡＰＴＳは、
デコード処理部１００２において音声と映像とのデコー
ドを同期させたり、フレーム単位でデコードの進度を調
整するために用いられる。この処理は、上記フラグ９を
トリガーとする。【００５０】命令読出回路５３は、命令フェッチアドレ
スを指すプログラムカウンタ（以下ＰＣと略す）を複数
個備え、タスク管理部５８により指定されたＰＣを用い
て命令メモリ５２から命令を読み出して命令レジスタ５
４に格納する。具体的には、命令読出回路５３は、上記
タスク０〜５に対応するＰＣ０〜５を有し、タスク管理
部５８によるＰＣの指定が変更されたとき、ハードウェ
アにより高速にＰＣを切り替えるように構成されてい
る。この構成によりＩＯプロセッサ５は、タスクスイッ
チに際して現在のタスクのＰＣ値をメモリに退避し、メ
モリから次のタスクのＰＣ値を復帰する処理から解放さ
れている。【００５１】デコーダ５５は、命令メモリ５２から読み
出されて命令レジスタ５４に格納された命令を解読し、
当該命令を実行するように演算実行部５６を制御する。
加えて、デコーダ５５は、ＩＯプロセッサ５全体を、命
令読出回路５３の命令読み出しステージ、デコーダ５５
の解読ステージ、演算実行部５６の実行ステージの少な
くとも３段からなるパイプライン制御を行う。【００５２】演算実行部５６は、ＡＬＵ（Arithmetic L
ogical Unit）、乗算器、ＢＳ(Barrel Shifter)などを
有し、デコーダ５５の制御に従って、命令で指定された
演算を実行する。汎用レジスタセット群５７は、タスク
０〜タスク５に対応する６つのレジスタセット（１レジ
スタセットは４本の３２ビットレジスタと４本の１６ビ
ットレジスタ）を備えている。全部で２４本の３２ビッ
トレジスタと２４本の１６ビットレジスタとを有し、実
行中のタスクに対応するレジスタセットが使用される。
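The banked-context arrangement described here (one program counter and one register set per task, selected in hardware) can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `TaskContext` and `BankedProcessor` and the dummy `step` instruction are hypothetical and do not appear in the patent text. Only the counts (six tasks, four 32-bit and four 16-bit registers per set) come from the description.

```python
from dataclasses import dataclass, field

NUM_TASKS = 6  # tasks 0-5, as stated in the description


@dataclass
class TaskContext:
    """One hardware bank: a program counter plus one register set."""
    pc: int = 0
    regs32: list = field(default_factory=lambda: [0] * 4)  # four 32-bit registers
    regs16: list = field(default_factory=lambda: [0] * 4)  # four 16-bit registers


class BankedProcessor:
    """Hypothetical model: a task switch only changes the active bank index."""

    def __init__(self):
        self.contexts = [TaskContext() for _ in range(NUM_TASKS)]
        self.current = 0
        self.memory_saves = 0  # never incremented: nothing is spilled to a stack

    @property
    def ctx(self):
        return self.contexts[self.current]

    def switch_to(self, task_id):
        # No save/restore of PC or registers; the old bank keeps its state.
        self.current = task_id

    def step(self):
        # Dummy instruction (an assumption): bump register 0 and the PC.
        self.ctx.regs32[0] = (self.ctx.regs32[0] + 1) & 0xFFFFFFFF
        self.ctx.pc += 1


cpu = BankedProcessor()
for _ in range(4):
    cpu.step()          # task 0 runs four instructions
cpu.switch_to(1)
for _ in range(4):
    cpu.step()          # task 1 runs four instructions
cpu.switch_to(0)        # task 0 resumes exactly where it stopped
total32 = sum(len(c.regs32) for c in cpu.contexts)
total16 = sum(len(c.regs16) for c in cpu.contexts)
```

Because every task owns its bank, resuming task 0 needs no memory traffic, which is the property the text attributes to the register set group.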
これによりＩＯプロセッサ５は、タスクスイッチに際し
て現在の全レジスタデータをメモリに退避し、メモリか
ら次のタスクのレジスタデータを復帰する処理から解放
されている。【００５３】タスク管理部５８は、所定数の命令サイク
ル数毎に、命令読出回路５３のＰＣ及び汎用レジスタセ
ット群５７のレジスタセットを切り替えることによりタ
スク切替えを行う。本実施例では上記所定数は４であ
る。またＩＯプロセッサ５は１命令を１命令サイクルで
パイプライン処理するので、タスク管理部５８は、上記
オーバーヘッドを生じることなしに４命令毎にタスクを
切り替えることになる。これにより非同期に発生する各
種の入出力要求に対して応答遅延を抑えている。つまり
入出力要求に対する応答遅延は、最大でもわずか２４命
令サイクルしか生じない。＜1.3.3.1.1 命令読出回路＞図１２は、命令読出回路
５３の詳細な構成例を示すブロック図である。【００５４】同図において命令読出回路５３は、タスク
別ＰＣ格納部５３ａ、現ＩＦＡＲ（Instruction Fetch
Address Register）５３ｂ、インクリメンタ５３ｃ、次
ＩＦＡＲ５３ｄ、セレクタ５３ｅ、セレクタ５３ｆ、Ｄ
ＥＣＡＲ（DECode Address Register）５３ｇを備え、
タスク切替えに際してオーバーヘッドなしに命令読み出
しアドレスを切り替えるように構成されている。【００５５】タスク別ＰＣ格納部５３ａは、タスク０〜
５に対応する６本のアドレスレジスタを有し、タスク毎
にプログラムカウント値を保持する。各プログラムカウ
ント値は、対応するタスクの再開アドレスである。タス
ク切替えに際して、タスク管理部５８及びデコーダ５５
の制御の下で、次に実行すべきタスクに対応するアドレ
スレジスタからプログラムカウント値が読み出され、現
に実行しているタスクに対応するアドレスレジスタのプ
ログラムカウント値が新たな再開アドレスに更新され
る。このとき、次に実行すべきタスク、現タスクは、そ
れぞれタスク管理部５８により"nexttaskid（rd add
r）"信号（以下タスクＩＤとも呼ぶ）、”taskid（wr a
ddr）”信号により指定される。【００５６】タスク０、１、２に対応するプログラムカ
ウント値を図１３のＰＣ０、１、２に示す。同図におい
て、（０−０）はタスク０の命令０を、（１−４）はタ
スク１の命令４を表す。例えば、ＰＣ０は、タスク０の
再開に際して読み出され（命令サイクルｔ０）、次のタ
スクへの切替に際して、命令（０−４）のアドレスに更
新される（命令サイクルｔ４）。【００５７】インクリメンタ５３ｃ、次ＩＦＡＲ５３
ｄ、セレクタ５３ｅからなるループ回路は、セレクタ５
３ｅにより選択された命令読み出しアドレスを更新する
回路である。セレクタ５３ｅから出力されるアドレスを
図１３のＩＦ１に示す。同図において、例えばタスク０
からタスク１への切替えに際して、セレクタ５３ｅは、
サイクルｔ４においてタスク別ＰＣ格納部５３ａから読
み出された命令（１−０）アドレスを選択し、サイクル
ｔ５〜ｔ７において次ＩＦＡＲ５３ｄからのインクリメ
ントされた命令アドレスを選択する。【００５８】現ＩＦＡＲ５３ｂは、セレクタ５３ｅの選
択出力ＩＦ１を１サイクル遅れて保持し、命令メモリ５
２に命令読み出しアドレスとして出力する。言い換えれ
ば、現在アクティブなタスクの命令読み出しアドレスを
保持する。現ＩＦＡＲ５３ｂの命令読み出しアドレス
を、図１３のＩＦ２に示す。同図に示すように、ＩＦ２
は４命令サイクル毎に異なるタスクの命令アドレスを指
している。【００５９】ＤＥＣＡＲ５３ｇは、命令レジスタ５４に
保持されている命令のアドレスを保持する。つまり、デ
コード中の命令を指す。図１３中のＤＥＣに、ＤＥＣＡ
Ｒ５３ｇに保持されたアドレスを示す。また、図１３中
のＥＸは、実行中の命令アドレスを示す。セレクタ５３
ｆは、分岐命令実行時や割込み発生時に分岐アドレスを
選択し、それ以外は次ＩＦＡＲ５３ｄのアドレスを選択
する。【００６０】このような命令読出回路５３を備えること
により、ＩＯプロセッサ５は、図１３に示すように４段
（ＩＦ１、ＩＦ２、ＤＥＣ、ＥＸ）のパイプライン処理
を行っている。このうちＩＦ１ステージは、複数プログ
ラムカウント値の選択及び更新を行うステージである。
ＩＦ２ステージは、命令を読み出すステージである。＜1.3.3.1.2 タスク管理部＞図１４は、タスク管理部
５８の詳細な構成を示すブロック図である。同図におい
てタスク管理部５８は、タスクの切替えタイミングを管
理するスロットマネージャと、タスクの順序を管理する
スケジューラとに大別される。【００６１】スロットマネージャは、カウンタ５８ａ、
ラッチ５８ｂ、比較器５８ｃ、ラッチユニット５８ｄを
有し、４命令サイクル毎にタスク切替えを指示するタス
ク切替信号（chgtaskex）を命令読出回路５３へ出力す
る。具体的には、ラッチ５８ｂは、カウンタ５８ａの出
力の下位２ビットを保持する２個のＦＦ（Flip Flop）
回路である。カウンタ５８ａは、命令サイクルを示すク
ロック毎にラッチ５８ｂの２ビットの出力値を＋１イン
クリメントした３ビットを出力する。その結果、カウン
タ５８ａは、１、２、３、４を繰り返し出力することに
なる。比較器５８ｃは、カウンタ５８ａの出力値が定数
４と一致したときにタスク切替信号（chgtaskex）を命
令読出回路５３とスケジューラとに出力する。【００６２】スケジューラは、タスクラウンド管理部５
８ｅ、プライオリティエンコーダ５８ｆ、ラッチ５８ｇ
を備え、タスク切替信号（chgtaskex）が出力されるご
とに、タスクｉｄを更新し、現在のタスクｉｄと次に実
行すべきタスクｉｄとを命令読出回路５３に出力する。
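The interaction between the slot manager and the scheduler can be sketched as a cycle-level simulation. The slot manager part follows the text (a 2-bit latch, a counter that outputs latch + 1, and a comparator against the constant 4). The scheduling order is an assumption: plain round-robin over tasks 0 to 5 is used here because the text names a "task round" manager but does not spell out the order. The class and function names are hypothetical.

```python
class SlotManager:
    """Counter 58a / latch 58b / comparator 58c, modelled per the description."""

    def __init__(self):
        self.latch = 0  # holds the lower 2 bits of the previous counter output

    def tick(self):
        count = self.latch + 1        # 3-bit counter output: 1, 2, 3, 4, 1, ...
        self.latch = count & 0b11     # keep only the lower 2 bits
        return count, count == 4      # chgtaskex is asserted when count == 4


class RoundRobinScheduler:
    """Assumed policy: visit tasks 0..5 in order, one slot each."""

    def __init__(self, num_tasks=6):
        self.num_tasks = num_tasks
        self.taskid = 0

    def on_chgtaskex(self):
        current = self.taskid
        nxt = (self.taskid + 1) % self.num_tasks
        self.taskid = nxt
        return current, nxt           # (taskid, nexttaskid)


def run(cycles):
    """Return the task id that is active in each instruction cycle."""
    slot, sched = SlotManager(), RoundRobinScheduler()
    trace = []
    for _ in range(cycles):
        trace.append(sched.taskid)
        _, chgtaskex = slot.tick()
        if chgtaskex:
            sched.on_chgtaskex()
    return trace


sm = SlotManager()
counts = [sm.tick()[0] for _ in range(8)]
trace = run(25)
```

Under the round-robin assumption each task gets four consecutive cycles and the pattern repeats every 24 cycles.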
具体的には、ラッチユニット５８ｄ、ラッチ５８ｇは、
ともに現在のタスクｉｄをエンコードされた形式（３ビ
ット）で保持する。エンコードされた形式は、その値が
タスクｉｄを表す。【００６３】タスクラウンド管理部５８ｅは、タスク切
替信号（chgtaskex）が入力されたとき、ラッチユニッ
ト５８ｄを参照して、次に実行すべきタスクｉｄを、デ
コードされた形式（６ビット）で出力する。デコードさ
れた形式（６ビット）は、１ビットが１タスクに対応
し、ビット位置がタスクｉｄを表す。プライオリティエ
ンコーダ５８ｆは、タスクラウンド管理部５８ｅから出
力されるタスクｉｄを、デコードされた形式からエンコ
ードされた形式に変換する。上記ラッチユニット５８
ｄ、ラッチ５８ｇは、ともにエンコードされたタスクｉ
ｄを１サイクル遅れて保持する。【００６４】この構成により、タスクラウンド管理部５
８ｅは、比較器５８ｃからタスク切替信号（chgtaske
x）が出力されたとき、プライオリティエンコーダ５８
ｆから次に実行すべきタスクのｉｄを"nexttaskid（rd
addr）"信号として、ラッチ５８ｇから現タスクｉｄ
を”taskid（wr addr）”信号として出力する。＜1.4 動作説明＞以上のように構成された第１の実施
形態における映像音声処理装置１０００について、その
動作を説明する。【００６５】入出力処理部１００１において、ストリー
ム入力部１から非同期に入力されるＭＰＥＧストリーム
は、入出力プロセッサ５の制御によって、バッファメモ
リ２、メモリコントローラ６を介して一旦外部メモリ３
に格納され、さらに、メモリコントローラ６を介してＦ
ＩＦＯメモリ４に保持される。このときＦＩＦＯメモリ
４に対して、ＩＯプロセッサ５は、上記タスク２
（ｂ）、タスク３を実行することによりその残量に応じ
て、圧縮動画データ、圧縮音声データを供給する。これ
により、ＦＩＦＯメモリ４には過不足なく一定量の圧縮
動画データ、圧縮音声データが供給されるので、デコー
ド処理部１００２は、非同期の入出力とは切り離され
て、デコード処理に専従することができる。ここまでの
処理は、上記入出力処理部１００１により、デコード処
理部１００２とは独立に並行してなされる。【００６６】一方、デコード処理部１００２において、
ＦＩＦＯメモリ４に保持されたＭＰＥＧストリームデー
タは、以降プロセッサ７、コード変換部９、画素演算部
１０、画素読み書き部１１により復号される。ＦＩＦＯ
メモリ４以降の復号動作を示す説明図を図１５に示す。
同図では、横軸を時間軸としておおよそ１マクロブロッ
ク分のヘッダ解析及び各ブロック毎のデコードの様子を
示している。また縦方向はデコード処理部１００２の各
部においてブロック毎のデコードがパイプライン的に実
行される様子を示している。【００６７】同図に示すように、プロセッサ７は、圧縮
映像データのヘッダ解析と、圧縮音声データに対するデ
コード処理とを時分割で繰り返す。すなわち、プロセッ
サ７は、１マクロブロック分のヘッダ解析を行い、解析
結果をコード変換部９、画素演算部１０、画素読み書き
部１１に通知した後、コード変換部９に対してマクロブ
ロックのデコード開始を指示する。その後プロセッサ７
は、コード変換部９からの割込み信号が通知されるま
で、圧縮音声データのデコード処理を行う。デコード後
の音声データは内部メモリ８に一旦保持され、さらにメ
モリコントローラ６により外部メモリ３にＤＭＡ転送さ
れる。【００６８】また、コード変換部９は、プロセッサ７か
らマクロブロックのデコード開始指示を受けて、マクロ
ブロック内の各ブロック毎にバッファ２００に格納す
る。このときコード変換部９は、プロセッサ７のヘッダ
解析時に通知されたブロックのスキャンタイプに応じて
バッファ２００への書き込みアドレスの順番を変更す
る。つまりジグザグスキャンの場合と、オルタネートス
キャンの場合とで書き込みアドレスの順番を変更する。
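The idea of reordering on the write side can be sketched as follows: the decoded coefficients arrive in scan order, and a 64-entry table maps each count value (0 to 63) to a buffer address, so the reader can always use the same address order. The zigzag table below is generated programmatically. The alternate-scan table would be a second 64-entry permutation taken from the MPEG-2 specification and is deliberately not reproduced here. The function names are hypothetical.

```python
def zigzag_table(n=8):
    """Buffer addresses (row * n + col) listed in zigzag scan order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk anti-diagonals; odd diagonals run top to bottom, even ones bottom to top.
    cells.sort(key=lambda rc: (rc[0] + rc[1],
                               rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
    return [r * n + c for r, c in cells]


def write_block(coeffs, scan_table):
    """Scatter coefficients given in scan order into a raster-ordered buffer.

    scan_table plays the role of scan table 905 or 906; which one is used
    would be chosen by the scan-type flag set by the processor.
    """
    buf = [0] * len(scan_table)
    for count, value in enumerate(coeffs):   # count corresponds to counter 902
        buf[scan_table[count]] = value
    return buf


table = zigzag_table()
buf = write_block(list(range(64)), table)
```

After `write_block`, the buffer is in raster order regardless of which table was selected, so the pixel operation stage does not need to know the scan type.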
これにより画素演算部１０は、読み出しアドレスの順番
を変更しなくてもよく、スキャンタイプに拘らず常に同
じ読み出しアドレスの順番にて読み出すことができ
る。コード変換部９は、マクロブロック内の６つのブロ
ックをＶＬＤ処理をし終えるまで上記動作を繰り返して
バッファ２００に書き出す。６ブロックのＶＬＤを終え
るとプロセッサ７に割込みを発生する。この割込み信号
は、マクロブロックデコード終了信号End Of Macro Blo
ck(EOMB)である。コード変換部９は６つ目のブロックの
ブロック終了信号End Of Block(EOB)を検出することに
よりEOMBを生成している。【００６９】画素演算部１０は、コード変換部９と並行
して、図９に示したようにバッファ２００に格納された
ブロックデータをブロック単位にＩＱ、ＩＤＣＴを施
し、その処理結果をバッファ２０１に格納する。画素読
み書き部１１は、画素演算部１０と並行して、バッファ
２０１のブロックデータと、プロセッサ７によるヘッダ
解析により通知された動きベクトルとに基づいて、図１
５に示すように外部メモリ３の参照フレームからの矩形
領域の切り出しと、ブロック合成とを行う。ブロック合
成結果は、メモリコントローラ６を介して外部メモリ３に格
納される。【００７０】上記は、スキップマクロブロックではない
場合の動作であるが、スキップマクロブロックの場合に
はコード変換部９及び画素演算部１０は動作せず、画素
読み書き部１１のみが動作する。スキップマクロブロッ
クがある場合には、参照フレーム中の矩形領域と同じ画
像なので、画素読み書き部１１により、その画像が復号
画像として外部メモリ３にコピーされることになる。【００７１】この場合、コード変換部９からプロセッサ
７への割込み信号は次のようにして生成される。すなわ
ち、プロセッサ７が画素読み書き部１１に対して動き補
償動作の開始の制御信号を送付したことを示す信号と、
画素読み書き部１１が動き補償動作が可能であることを
示す信号と、スキップマクロブロックであることを示す
信号との論理積を取り、さらにこの論理積と上記のEOMB
信号との論理和として割込み信号がプロセッサ７に入力
される。【００７２】以上説明してきたように本発明の第１実施
形態の映像音声処理装置によれば、記憶媒体や通信媒体
からのＭＰＥＧストリーム入力処理と、表示装置及び音
声出力装置への表示画像データ及び音声データの出力処
理と、デコード処理部１００２へストリームを供給する
処理とを入出力処理部１００１が分担し、圧縮映像デー
タ及び圧縮音声データのデコード処理をデコード処理部
１００２が分担するように構成されている。これによ
り、デコード処理部１００２は、非同期に発生する処理
から解放されてデコード処理に専従することができる。
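The decoupling described above rests on the I/O side topping up the FIFO whenever its level falls to a threshold, so the decoder never waits on asynchronous input. A minimal sketch of that refill-on-request pattern is shown below. The capacity, the threshold value and all names (`StreamFifo`, `simulate`) are assumptions chosen for illustration; the patent only says the request flag is set when the remaining data is at or below a predetermined amount.

```python
from collections import deque


class StreamFifo:
    """Hypothetical model of one FIFO (video or audio) with a request flag."""

    def __init__(self, capacity, threshold):
        self.capacity = capacity
        self.threshold = threshold
        self.q = deque()

    @property
    def request_flag(self):
        # Set when the remaining data is at or below the threshold.
        return len(self.q) <= self.threshold

    def refill(self, source):
        # I/O side: transfer from "external memory" until full or exhausted.
        while len(self.q) < self.capacity and source:
            self.q.append(source.popleft())

    def pop(self):
        return self.q.popleft()


def simulate(data, capacity=8, threshold=2):
    source = deque(data)            # stands in for the stream held in external memory
    fifo = StreamFifo(capacity, threshold)
    decoded = []
    while source or fifo.q:
        if fifo.request_flag:       # the I/O task polls the flag (no interrupt)
            fifo.refill(source)
        decoded.append(fifo.pop())  # the decoder consumes at its own pace
    return decoded


result = simulate(range(20))
```

In this model the decoder side never sees an empty FIFO as long as data remains in the source, which mirrors the claim that the decode processing unit can concentrate on decoding.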
その結果、ＭＰＥＧストリーム入力、デコード、出力と
いう一連の処理を効率良く実行するので、高速な動作ク
ロックを用いなくてもＭＰＥＧストリームのフルデコー
ド（フレーム落ちなし）を実現することができる。【００７３】また、本映像音声処理装置は、１チップに
ＬＳＩ化することが望ましい。この場合、１００ＭＨｚ
以下の動作クロック（実際には５４ＭＨｚ）で上記フル
デコードが可能である。この点、動作クロックが１００
ＭＨｚさらには２００ＭＨｚを越える近年の高性能ＣＰ
Ｕは、画像サイズが小さければ上記フルデコードを可能
にしているが、その反面製造コストが高価である。これ
に対して、本映像音声処理装置は、製造コストの点とフ
ルデコードの点で優れている。【００７４】さらに、本映像音声処理装置のデコード処
理部１００２は、次のように役割分担している。つま
り、プロセッサ７が圧縮映像データに対しても圧縮音声
データに対しても多岐にわたる条件判断を必要とするヘ
ッダ解析を担当するとともに音声圧縮データのデコード
も担当する。圧縮映像データのブロックデータに対して
は、定型的な大量の演算量が要求されるので、コード変
換部９、画素演算部１０、画素読み書き部１１という専
用のハードウェア（ファームウェア）が、デコード処理
を担当する。図１５に示したようにコード変換部９、画
素演算部１０、画素読み書き部１１は、パイプライン化
されている。画素演算部１０は、ＩＱとＩＤＣＴとが並
列処理が可能になっている。画素読み書き部１１は２ブ
ロック単位の参照フレームのアクセスを実現している。
これらにより圧縮映像デコード処理の効率化が達成され
ているので、映像デコード専用のハードウェア部分は高
速クロックを用いなくとも、高い処理能力を得ることが
できる。具体的には１００ＭＨｚを越える高速クロック
を用いずに５０〜６０ＭＨｚ程度のクロックで従来と同
程度以上の処理能力が得られた。従って、高速素子を用
いる必要がなく製造コストを押さえることができる。【００７５】また、映像デコードの基本単位をプロセッ
サ７においてマクロブロック単位、コード変換部９およ
び画素演算部１０においてブロック、画素読み書き部１
１において２ブロックとしているので、映像デコードに
おける緩衝バッファの容量を最小限に抑えることが可能
となる。＜2 第２の実施形態＞本実施形態の映像音声処理装置
は、圧縮ストリームデータのデコード機能に加えて、さ
らに、圧縮機能（以降、エンコード処理と呼ぶ）とグラ
フィックス機能を果たすように構成されている。＜2.1 映像音声処理装置の構成＞図１６は、本発明の
第２の実施形態における映像音声処理装置の構成を示す
ブロック図である。【００７６】この映像音声処理装置２０００は、ストリ
ーム入出力部２１、バッファメモリ２２、ＦＩＦＯメモ
リ２４、入出力プロセッサ２５、メモリコントローラ２
６、プロセッサ２７、内部メモリ２８、コード変換部２
９、画素演算部３０、画素読み書き部３１、ビデオ出力
部１２、音声出力部１３、バッファ２００、バッファ２
０１とからなる。映像音声処理装置２０００は、図４に
示した映像音声処理装置１０００の機能に加えて、次の
機能が付加されている。すなわち、映像データと音声デ
ータの圧縮機能と、ポリゴンデータを描画するグラフィ
ックス機能とが付加されている。【００７７】そのため、映像音声処理装置２０００にお
いて、図４と同名称の構成要素は全く同じ機能を有し、
さらに、圧縮機能とグラフィックス機能を果たす機能が
付加されている。以下図４と同じ点は説明を省略し、異
なる点を中心に説明する。ストリーム入出力部２１は、
双方向になっている点が異なる。つまり、入出力プロセ
ッサ２５の制御によりバッファメモリ２２からＭＰＥＧ
データを転送されると、転送されたパラレルデータをシ
リアルデータに変換して、ＭＰＥＧデータストリームと
して外部に出力する。【００７８】バッファメモリ２２、ＦＩＦＯメモリ２４
も双方向になった点が異なる。入出力プロセッサ２５
は、第１実施形態に示した(1)〜(4)の経路のデー
タ転送を制御することに加えて、(5)〜(8)の径路の転送
をも制御する。
(1)ストリーム入出力部２１→バッファメモリ２２→メモリコントローラ２６→外部メモリ３
(2)外部メモリ３→メモリコントローラ２６→ＦＩＦＯメモリ２４
(3)外部メモリ３→メモリコントローラ２６→バッファメモリ２２→ビデオ出力部１２
(4)外部メモリ３→メモリコントローラ２６→バッファメモリ２２→音声出力部１３
(5)外部メモリ３→メモリコントローラ２６→内部メモリ２８
(6)外部メモリ３→メモリコントローラ２６→画素読み書き部３１
(7)ＦＩＦＯメモリ２４→メモリコントローラ２６→外部メモリ３
(8)外部メモリ３→メモリコントローラ２６→バッファメモリ２２→ストリーム入出力部２１
(5)(6)の径路は、映像データ、音声データのエンコード
処理を行う場合の元のデータの径路であり、(7)(8)は、
圧縮後のＭＰＥＧストリームの径路を示す。【００７９】まず、エンコード処理について説明する。
エンコードすべきデータは外部メモリ３に格納されてい
るものとする。外部メモリ３の映像データは、メモリコ
ントローラ２６を画素読み書き部３１が制御することに
より画素読み書き部３１に転送される。画素読み書き部
３１は映像データを第２のバッファ２０１に書き込む処
理と差分画像生成処理を行なう。差分画像生成処理は、
ブロック単位の動き検出（動きベクトルの算出）と差分
画像の生成とからなる。そのため、画素読み書き部３１
は、符号化対象ブロックと類似する矩形領域と参照フレ
ーム内で探索することにより動きベクトルを検出する動
き検出回路を内部に有している。なお動き検出回路の代
わりに、隣接するフレームの既に計算済みのブロックの
動きベクトルを利用して符号化対象の動きベクトルを見
積もる動き見積回路を備えるようにしてもよい。【００８０】画素演算部３０は、ブロック単位に差分画
像データを受け取り、ＤＣＴ、ＩＤＣＴ、量子化処理
（以降、Ｑ処理）、ＩＱを行なう。こうして量子化され
た映像データはバッファ２００に格納される。コード変
換部２９は、バッファ２００から量子化データを受け取
り可変長符号処理（ＶＬＣ）を行なう。可変長符号化さ
れたデータは先入れ先出しメモリ２４に格納され、メモ
リコントローラ２６を通して外部メモリ３に格納される
とともに、プロセッサ２７によりマクロブロック毎にヘ
ッダ情報が付加される。【００８１】また、外部メモリ３の音声データは、メモ
リコントローラ２６を介して内部メモリ２８に転送され
る。プロセッサ２７は、マクロブロック毎にヘッダ情報
を付加する処理と時分割で、内部メモリ２８の音声デー
タの圧縮処理を行う。以上のように、エンコード処理
は、第１の実施形態と逆の径路で処理されることにな
る。【００８２】次に、グラフィックス処理について説明す
る。グラフィックス処理は、ポリゴンと呼ばれる矩形型
図形の組合せによって行なわれる三次元画像生成処理で
ある。本装置においてはポリゴンの頂点座標における画
素データからポリゴン内部の画素データを生成する処理
を行う。最初にポリゴンの頂点データは外部メモリ３に
格納されている。【００８３】頂点データは、プロセッサ２７がメモリコ
ントローラ２６を制御することにより内部メモリ２８に
格納される。プロセッサ２７は内部メモリ２８より頂点
データを読みだしＤＤＡ(Digital Differential Analyzer)
の前処理を行ないＦＩＦＯメモリ２４に書き込む。コー
ド変換部２９は、画素演算部３０の指示に従ってＦＩＦ
Ｏメモリ２４から頂点データを読みだし画素演算部３０
に転送する。【００８４】画素演算部３０は、DDA処理を行ない画素
読み書き部３１に送信する。画素読み書き部３１は、プ
ロセッサ２７の指示に従い、Zバッファ処理あるいはα
ブレンディング処理を行ないメモリコントローラ２６を
介して外部メモリ３に画像データを書き出す。＜2.1.1 画素演算部＞図１７は、画素演算部３０の構
成を示すブロック図である。【００８５】同図は、図７に示した画素演算部１０と同
じ構成要素には同じ番号を付与し、説明を省略し、以下
異なる点を中心に説明する。異なる点は、同図のように
画素演算部３０は、図７に示した画素演算部１０に対し
て実行部が３面（５０１ａ〜５０１ｃ）になっている点
と、命令ポインタ保持部３０８と命令レジスタ３０９と
分配部３１０とが追加された点とである。【００８６】実行部５０１ａ〜５０１ｃが３面になって
いるのは、演算性能を向上させるためである。具体的に
は、グラフィックス処理においてはカラー画像ＲＧＢを
独立に並列演算する。ＩＱおよびＱ処理では、乗算器５
０２を３つ用いて高速化を図っている。ＩＤＣＴにおい
ては乗算器５０２および加減算器５０３を複数用いるこ
とによって時間短縮を図っている。ＩＤＣＴにおいては
バタフライ演算と呼ばれる演算が存在し、これは演算の
元となる全てのデータ間で依存関係があるので、実行部
５０１ａ〜５０１ｃのユニット間通信を行なうデータ線
１０３を設けている。【００８７】第１命令メモリ５０６、第２命令メモリ５
０７は、ＩＤＣＴ、ＩＱに加えてＤＣＴ、Ｑ処理、ＤＤ
Ａ用のマイクロプログラムが格納されている。図１８
に、第１命令メモリ５０６、第２命令メモリ５０７の記
憶内容の一例を示す。図８に比べてＱ処理マイクロプロ
グラムと、ＤＣＴマイクロプログラムと、ＤＤＡマイク
ロプログラムとが追加されている。【００８８】命令ポインタ保持部３０８ａ〜３０８ｃ
は、実行部５０１ａ〜５０１ｃに対応して設けられ、そ
れぞれ第１プログラムカウンタから入力されるアドレス
を変換して命令レジスタ部３０９に出力する変換テーブ
ルを有する。変換後のアドレスは、命令レジスタ部３０
９のレジスタ番号を意味する。さらに、命令ポインタ保
持部３０８ａ〜３０８ｃは、それぞれ後述するモディフ
ァイフラグを保持し命令実行部５０１ａ〜５０１ｃに出
力する。【００８９】変換テーブルについては命令ポインタ保持
部３０８ａ、３０８ｂ、３０８ｃは、例えば入力アドレ
スが1,2,3,4,5,6,7,8,9,10,11,12である場合に、それぞ
れ次のような変換後アドレスを出力する。
命令ポインタ保持部３０８ａ:1,2,3,4,5,6,7,8,9,10,11,12
命令ポインタ保持部３０８ｂ:2,1,4,3,6,5,8,7,10,9,12,11
命令ポインタ保持部３０８ｃ:4,3,2,1,8,7,6,5,12,11,10,9
命令レジスタ部３０９は、図２３に示すように、マイク
ロ命令を保持する複数のレジスタと３つのセレクタと３つ
の出力ポートとからなる。３つのセレクタは、命令ポイ
ンタ部３０８ａ、３０８ｂ、３０８ｃから入力される変
換アドレス（レジスタ番号）に指定されるレジスタのマ
イクロ命令を選択する。３つの出力ポートは、セレクタ
に対応して設けられ、それぞれセレクタに選択されたマ
イクロ命令を分配部３１０を介して実行部５０１ａ〜５
０１ｃに出力する。３つのセレクタ及び出力ポートが設
けられているのは、３つの加減算器５０３（又は３つの
乗算器５０２）に同時に異なるマイクロ命令を供給する
ためである。本実施例では３つの出力ポートは、分配部
３１０を介して３つの加減算器５０３と３つの乗算器５
０２の何れかに選択的に供給するものとする。【００９０】例えば、命令レジスタ部３０９はレジスタ
Ｒ１〜Ｒ１６（レジスタ番号１〜１６）を備えている。
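The three example conversion tables listed for the instruction pointer holding units 308a to 308c can be reproduced compactly. The sketch below is an observation about that example only: the listed sequences happen to equal the zero-based input address XORed with 0, 1 and 3. The patent describes lookup tables and does not state this formula, so the XOR form and the names `convert` and `MASKS` are assumptions for illustration.

```python
# Hypothetical XOR masks that reproduce the example tables in the text.
MASKS = {"308a": 0, "308b": 1, "308c": 3}


def convert(addr, mask):
    """Map a 1-based input address to a 1-based register number."""
    return ((addr - 1) ^ mask) + 1


inputs = list(range(1, 13))  # the example input addresses 1..12
tables = {unit: [convert(a, m) for a in inputs] for unit, m in MASKS.items()}

# In every step the three execution units fetch from three different registers,
# which is one way to see how the reordering keeps them from colliding.
distinct_each_step = all(
    len({tables["308a"][i], tables["308b"][i], tables["308c"][i]}) == 3
    for i in range(len(inputs))
)
```

Each table is a permutation of the same register numbers, so all three units execute the same set of micro-instructions, only in a different order.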
レジスタＲ１〜Ｒ１６に格納されているマイクロプログ
ラムは、ＤＣＴ及びＩＤＣＴにおいて必要な行列演算処
理を表し、上記の３つのレジスタ番号順のいずれによっ
ても同一処理を行うように格納されている。つまり、上
記３つの実行順をもつマイクロプログラムは、実行順序
が可換な一部のマイクロ命令の順序が入れ換えられてい
る。これは、実行部５０１ａ〜５０１ｃが並列にマイク
ロプログラムを実行するので、実行部５０１ａ〜５０１
ｃ間でレジスタ（図外）アクセスの競合など資源干渉を
回避するためである。また、上記行列演算処理は、８×
８行列の乗算、転置、転送をその内容とする。【００９１】次に、命令レジスタ部３０９の各レジスタ
に格納されるマイクロ命令はニーモニック形式では、
「ｏｐＲｉ，Ｒｊ，ｄｅｓｔ，（モディファイフラ
グ）」と表記される。ただし命令レジスタ部３０９のマ
イクロ命令は、「ｏｐとＲｉ，Ｒｊと（モディファイフ
ラグ）」の部分だけである。「ｄｅｓｔ」の部分は命令
メモリ５０６、５０７から指定される。「（モディファ
イフラグ）」の部分は命令ポインタ保持部３０８ａ〜３０
８ｃから指定される。【００９２】ここで、”ｏｐ”は乗算命令、加減算命
令、転送命令などを示すオペレーションコード、”Ｒ
ｉ，Ｒｊ”はオペランドである。乗算命令は、３つの実
行部５０１ａ〜ｃ中の各乗算器５０２に実行される命令
であり、加算命令及び転送命令は、３つの実行部５０１
ａ〜ｃ中の各加減算器５０３に実行される命令である。”
ｄｅｓｔ”は演算結果の格納先を示す。この”ｄｅｓ
ｔ”は命令レジスタ部３０９のレジスタではなく、命令
メモリ５０６（乗算命令の場合）又は命令メモリ５０７
（加減算命令や転送命令の場合）から指定される。これ
は、命令レジスタ部３０９のマイクロプログラムを実行
部５０１ａ〜５０１ｃに共通化するためである。もし転
送先をレジスタにより指定すれば実行部５０１ａ〜５０
１ｃそれぞれに個別のマイクロプログラムを用意する必
要があり、マイクロプログラムの容量が数倍に膨らむこ
とになる。【００９３】”モディファイフラグ”は、加減算命令に
おいて、加算であるか減算であるかを示すフラグであ
る。この”モディファイフラグ”は、命令レジスタ部３
０９のレジスタからではなく、命令ポインタ保持部３０
８ａ〜ｃから別途指定される。これは、ＤＣＴ、ＩＤＣ
Ｔでの行列演算に用いられる定数行列中に全要素が”
１”の行（又は列）と全要素が”−１”行（又は列）と
が含まれるので、命令ポインタ３０８ａ〜ｃから”モデ
ィファイフラグ”を指定することにより、命令レジスタ
部３０９の同一マイクロプログラムを共用することを可
能にしている。【００９４】分配部３１０は、命令レジスタ部３０９か
ら入力される３つのマイクロ命令が加減算命令である場
合には、それらの「ｏｐとＲｉ，Ｒｊ」の部分と、命令
メモリ５０７から入力される「ｄｅｓｔ」の部分と、命
令ポインタ部３０８ａ〜ｃから入力される「モディファ
イフラグ」とを３つの加減算器５０３に分配し、同時に
命令メモリ５０６のマイクロ命令を３つの乗算器５０２
に分配する。また、分配部３１０は、命令レジスタ部３
０９から入力される３つのマイクロ命令が乗算命令であ
る場合には、それらの「ｏｐとＲｉ，Ｒｊ」の部分と、
命令メモリ５０６から入力される「ｄｅｓｔ」の部分と
を３つの乗算器５０２に分配し、命令メモリ５０７の
マイクロ命令を３つの加減算器５０３に分配する。言い
換えれば、分配部３１０により、３つの加減算器５０３
に供給されるマイクロ命令は、３つの加減算器５０３に
共通する命令については命令メモリ５０７から１つのマ
イクロ命令がそれぞれに供給され、３つの加減算器５０
３で異なる加減算命令については命令レジスタ部３０９
からの３つのマイクロ命令がそれぞれに供給される。同
様に、３つの乗算器５０２に供給されるマイクロ命令
は、３つの乗算器５０２に共通する命令については命令
メモリ５０６からマイクロ命令が供給され、３つの乗算
器５０２で異なる乗算命令については命令レジスタ部
３０９からのマイクロ命令がそれぞれに供給される。【００９５】画素演算部３０のこのような構成によれ
ば、命令メモリ５０６、命令メモリ５０７の記憶容量を
削減することができる。もし、画素演算部３０が命令ポ
インタ保持部３０８ａ〜ｃ、命令レジスタ部３０９、分
配部３１０を備えていないと仮定すると、命令メモリ５
０６、命令メモリ５０７はいずれも、３つの実行部５０
１ａ〜ｃに対して異なるマイクロ命令を供給するには、
３つのマイクロ命令を並列に記憶しなければならない。【００９６】図２２に命令ポインタ保持部３０８ａ〜
ｃ、命令レジスタ部３０９、分配部３１０を備えていな
い場合の命令メモリ５０６及び命令メモリ５０７の記憶
内容の一例を示す。同図では、１６ステップのマイクロ
プログラムが記憶され、１つのマイクロ命令は１６ビッ
ト長としている。この場合、命令メモリ５０６と命令メ
モリ５０７は、３つのマイクロ命令を並列に記録するこ
とから、合計１５３６ビット（１６ステップ×１６ビッ
ト×３×２）の記憶容量を必要とする。【００９７】これに対して、本実施例の画素演算部３０
における、命令ポインタ保持部３０８ａ〜ｃ、命令レジ
スタ部３０９の記憶内容の一例を図２３に示す。同図で
も１６ステップのマイクロプログラムが記憶され、１マ
イクロ命令は１６ビットとしている。同図において、命
令ポインタ保持部３０８ａ〜ｃは、それぞれ１６個のレ
ジスタ番号（４ビット長）を記憶し、命令レジスタ部３
０９は１６個のマイクロ命令を記憶する。この場合、命
令ポインタ保持部３０８ａ〜ｃと命令レジスタ部３０９
との記憶容量は４４８ビット（１６ステップ×（１２＋
１６））でよい。このように画素演算部３０では、マイ
クロプログラムの記憶容量を大幅に削減することができ
る。実際には、「ｄｅｓｔ」「モディファイフラグ」が
別途発行されるようにしているので、その分の記録容量
又は回路が必要である。また、命令メモリ５０６、５０
７はマイクロ命令中の「ｄｅｓｔ」を指定し、また、実
行部５０１ａ〜ｃに共通する乗算命令、加減算命令を発
行するようにしているので、命令メモリ５０６、５０７
を完全に削除することまではしていない。もし、命令レ
ジスタ部３０９に６つの出力ポートを設ければ、命令メ
モリ５０６と命令メモリ５０７とを削除することも可能
になる。【００９８】なお、図２３では、命令ポインタ保持部３
０８ａ〜３０８ｃは、第１プログラムカウンタの値が０
〜１５の場合に、変換アドレス（レジスタ番号）を出力
しているが、これに限らない。例えば第１プログラムカ
ウンタの値が３２〜４７の場合に変換アドレスを出力す
るようにしてもよい。この場合、第１プログラムカウン
タの値に適切なオフセット値を加える構成とすればよ
い。これにより、第１プログラムカウンタが示す任意の
アドレス列を変換アドレスに変換することができる。【００９９】以上の構成により、本実施形態では圧縮映
像データと圧縮音声データのデコード処理だけでなく、
映像および音声データのエンコード処理と、ポリゴンデ
ータに基づくグラフィックス処理とが可能となってい
る。また、複数の実行部の並列動作により処理効率が向
上している。しかも、命令ポインタ保持部３０８ａ〜３０８
ｃにおいて一部のマイクロ命令の順序を入れ換えたこと
により、複数の実行部間の資源干渉を回避することがで
きるので、さらに処理効率を向上させている。【０１００】なお、上記実施形態では３つの実行部を有
する構成を示しているのは、ＲＧＢカラーのそれぞれを
独立に演算できる点で有利だからである。さらに実行部
の数は、３つ以上あればいくつでもよい。また、上記実
施形態において映像音声処理装置１０００、２０００
は、それぞれ１チップＬＳＩ化することが望ましい。さ
らに外部メモリ３は、チップ外部であるものとして説明
したが、１チップ内に内蔵する構成としてもよい。【０１０１】また、上記実施形態では外部メモリに対し
てストリーム入出力部１（あるいはストリーム入出力部
２１）が、ＭＰＥＧストリーム（あるいは映像音声デー
タ）を格納していたが、ホストプロセッサが直接外部メ
モリ３に格納するように構成してもよい。さらに、上記
実施形態においてＩＯプロセッサ５は、４命令サイクル
毎にタスク切替えを行っているが、４命令サイクル以外
の複数命令サイクル毎であってもよい。また、タスク切
替えの命令サイクル数は、タスク毎に予め重み付けをし
て異なる命令サイクル数にしておいてもよい。また優先
度・緊急度に応じてタスク毎の命令サイクル数に重み付
けを行ってもよい。【０１０２】【発明の効果】本発明の映像音声処理装置は、圧縮音声
データと圧縮映像データとを含むデータストリームを外
部から入力、デコードし、デコードしたデータを出力装
置に出力する映像音声処理装置であって、外部要因によ
り非同期に発生する入出力処理を行う入出力処理手段
と、前記入出力処理と並行して、メモリに格納されたデ
ータストリームのデコードを主とするデコード処理を行
うデコード処理手段とを備え、前記デコード処理手段に
よりデコードされた映像データ、デコードされた音声デ
ータはメモリに格納され、前記入出力処理は、外部から
非同期に入力される前記データストリームを入力し、さ
らにメモリに格納することと、メモリに格納されたデー
タストリームをデコード処理手段に供給することと、外
部の表示装置、音声出力装置それぞれの出力レートに合
わせてメモリから読み出し、それらに出力することとを
入出力処理として行うように構成されている。【０１０３】この構成によれば、入出力処理手段とデコ
ード処理手段とがパイプライン的に並列動作することに
加えて、非同期処理とデコード処理とを入出力処理手段
とデコード処理手段とに分担させるので、デコード処理
手段は非同期に発生する処理から解放されてデコード処
理に専従することができる。その結果、本映像音声処理
装置は、ストリームデータ入力、デコード、出力という
一連の処理を効率良く実行するので、ストリームデータ
のフルデコード（フレーム落ちなし）を高速な動作クロ
ックを用いなくても可能にしている。【０１０４】また、前記デコード処理手段は、データス
トリームに対して、条件判断を主とする逐次処理であっ
て、圧縮音声データ及び圧縮映像データのヘッダ解析
と、圧縮音声データのデコードとを含む逐次処理を行な
う逐次処理手段と、前記逐次処理と並行して、定型処理
であって、圧縮映像データのヘッダ解析を除く圧縮映像
データのデコードである定型処理を行う定型処理手段とを備
える構成としてもよい。【０１０５】この構成によれば、処理特性の異なる逐次
処理と並列処理に適した定型処理とを１つのユニットに
併存させることを解消することにより、処理効率を大幅
に向上させることができる。特に、定型処理手段の処理
効率を向上させることができる。なぜなら本映像音声処
理装置において、定型処理手段は上記の非同期処理及び
逐次処理から解放されたことから、圧縮映像データのデ
コードに要求される定型的な種々演算のみに専従でき
るからである。その結果、高速な動作クロックを用いな
くても高い処理能力を得ることができる。【０１０６】さらに、前記入出力処理手段は、外部から
非同期データストリームを入力する入力手段と、外部の
表示装置にデコードされた映像データを出力する映像出
力手段と、外部の音声出力装置にデコードされた音声デ
ータを出力する音声出力手段と、命令メモリに格納され
た第１から第４のタスクを切替えながら実行するプロセ
ッサとを有し、前記第１タスクは入力部から前記メモリ
にデータストリームを転送するプログラムであり、前記
第２タスクは前記メモリからデコード処理手段にデータ
ストリームを供給するプログラムであり、前記第３タス
クは前記メモリから映像出力部にデコードされた映像デ
ータを出力するプログラムであり、前記第４タスクは前
記メモリから音声出力部にデコードされた音声データを
出力するプログラムであると構成してもよい。【０１０７】ここで、前記プロセッサは、前記第１から
第４タスクに対応する少なくとも４つのプログラムカウ
ンタを有するプログラムカウンタ部と、１つのプログラ
ムカウンタが指す命令アドレスを用いて、各タスクプロ
グラムを記憶する命令メモリから命令を取り出す命令フ
ェッチ部と、命令フェッチ部に取り出された命令を実行する命
令実行部と、所定数の命令サイクルが経過する毎に、命
令フェッチ部に対してプログラムカウンタを順次切替え
るように制御するタスク制御部とを有する構成としても
よい。【０１０８】この構成によれば、外部装置により定まる
ストリームデータの入力レート及び入力周期、外部表示
装置、外部音声出力装置により定まる映像データ、音声
データそれぞれの出力レート及び出力周期がどのような
範囲であっても、入出力要求に対する応答遅延が極めて
小さいという効果がある。また、本発明の映像音声処理
装置は、圧縮音声データと圧縮映像データとを含むデー
タストリームを入力する入力手段と、データストリーム
に対して、条件判断を主とする逐次処理であって、デー
タストリーム中の所定ブロック単位に付加されたヘッダ
情報の解析と、データストリーム中の圧縮音声データの
復号とを行なう逐次処理手段と、定型演算を主とする定
型処理であって、ヘッダ解析の結果を用いてデータスト
リーム中の圧縮映像データを、前記逐次処理と並行し
て、所定ブロック単位に復号する定型処理手段とを備
え、前記逐次処理手段は前記所定ブロックのヘッダ解析
が終了したとき、定型処理手段に当該所定ブロックのデ
コード開始を指示し、定型処理手段から所定ブロックの
デコード終了通知を受けたとき、次の所定ブロックのヘ
ッダ解析を開始するように構成してもよい。【０１０９】この構成によれば、逐次処理手段が圧縮映
像データに対しても圧縮音声データに対しても多岐にわ
たる条件判断を必要とするヘッダ解析を担当するととも
に音声圧縮データのデコードも担当する。一方、定型処
理手段は、圧縮映像データのブロックデータに対する、
定型的な大量の演算量を担当する。このような役割分担
により、また逐次処理手段は映像デコードに比較して演
算量が少ない音声デコード全般と、圧縮映像データのヘ
ッダ解析と、定型処理手段の制御とを行う。その制御の
下で、定型処理手段は、専ら定型的な演算を行うので、
無駄のない効率的な処理を実現できる。それゆえ高い周
波数で動作させなくても処理能力を得ることができ、製
造コストを低減させることができる。また、逐次処理手
段は、音声デコード全般と、圧縮映像データのヘッダ解
析と、定型処理手段の制御とを順次行うので、１プロセ
ッサにて構成できる。【０１１０】また、前記定型処理手段は、逐次処理手段
の指示に従ってデータストリーム中の圧縮映像データを
可変長復号するデータ変換手段と、可変長復号により得
られた映像ブロックに対して、所定の演算を施すことに
より逆量子化および逆離散余弦変換を行う演算手段と、
逆離散余弦変換後の映像ブロックと復号済みのブロック
を合成することにより動き補償処理を行って映像データ
を復元する合成手段とを有し、前記逐次処理手段は、デ
ータ変換手段により可変長復号されたヘッダ情報を取得
する取得手段と、取得されたヘッダ情報を解析する解析
手段と、解析結果として得られるパラメータを定型処理
手段に通知する通知手段と、入力手段により入力された
データストリーム中の圧縮音声データを復号する音声復
号手段と、前記定型処理手段から所定ブロックのデコー
ド完了を通知する割込み信号を受けたとき、音声復号手
段の動作を停止するとともに取得手段を起動し、前記通
知手段が前記通知をしたとき、前記データ変換手段に映
像ブロックの可変長復号の開始を指示する制御手段とを
有するように構成してもよい。【０１１１】この構成によれば、マクロブロックなど所
定ブロック単位に逐次処理手段は、ヘッダ解析を行った
後音声デコードを行い、定型処理手段により所定ブロッ
クのデコードが完了したとき次のブロックのヘッダ解析
を開始する。このように逐次処理手段は時分割でヘッダ
解析と音声デコードとを繰り返すので１個のプロセッサ
にて低コストで実現することができる。また、定型処理
手段は多岐にわたる条件判断処理をする必要がないの
で、低コストで専用ハードウェア（或はハードウェアと
ファームウェア）化することができる。【０１１２】ここで、前記演算手段は、さらに１ブロッ
クに相当する記憶領域を有する第１バッファを有し、前
記データ変換手段は、データストリーム中の圧縮映像デ
ータを可変長復号する可変長復号手段と、第１バッファ
の記憶領域のアドレスをジグザグスキャン順に並べた第
１アドレス列を記憶する第１アドレステーブル手段と、
第１バッファの記憶領域のアドレスをオルタネートスキ
ャン順に並べた第２アドレス列を記憶する第２アドレス
テーブル手段と、第１アドレス列と第２アドレス列の一
方に従って、可変長復号手段の可変長復号により得られ
るブロックデータを第１バッファに書き込む書き込み手
段とを有する構成としてもよい。【０１１３】この構成によれば、書込み手段は、ジグザ
グスキャンとオルタネートスキャンのどちらにも対応し
て、第１バッファの記憶領域にブロックデータを書き込
むことができる。従って演算手段は、第１バッファの記
憶領域からブロックデータを読み出すときに、読み出しア
ドレスの順番を変更しなくてもよく、スキャンタイプに
拘らず常に同じ読み出しアドレスの順番にて読み出す
ことができる。【０１１４】さらに、前記解析手段は、ヘッダ情報に基
づいて量子化スケールと動きベクトルとを算出し、前記
通知手段は、量子化スケールを演算手段に、動きベクト
ルを合成手段に通知するように構成してもよい。この構
成によれば、動きベクトルの算出を逐次処理手段に担当
させることができ、合成手段は算出された動きベクトル
を用いて定型的に動き補償処理を行うことができる。【０１１５】また、前記演算手段は、それぞれマイクロ
プログラムを記憶する第１、第２の制御記憶部と、第１
制御記憶部に第１読出アドレスを指定する第１プログラ
ムカウンタと、第２読出アドレスを指定する第２プログ
ラムカウンタと、第１読出アドレスと第２読出アドレス
との一方を選択して第２制御記憶部に出力するセレクタ
と、乗算器と加算器とを有し、第１、第２制御記憶部に
よるマイクロプログラム制御によりブロック単位の逆量
子化と逆離散余弦変換とを実行する実行部とを有する構
成としてもよい。【０１１６】この構成によれば、マイクロプログラム
（ファームウェア）は多岐にわたる条件判断処理を行う
必要がなく、定型的な処理を実現するだけなのでプログ
ラムサイズが小さくかつ作成が容易であり、低コスト化
に適している。しかも、２つのプログラムカウンタを使
用して乗算器と加算器とを独立して並列に動作させるこ
とができる。【０１１７】さらに、前記実行部は、セレクタにより第
２読出アドレスが選択されたとき、乗算器を用いた処理
と加算器を用いた処理とを独立並行して行い、セレクタ
により第１読出アドレスが選択されたとき、乗算器を用
いた処理と加算器を用いた処理とを連動させて行うよう
構成してもよい。この構成によれば、乗算器及び加算器
の遊び時間を減らして処理効率を向上させることができ
る。【０１１８】ここで、前記演算手段は、さらに、データ
変換手段からの映像ブロックを保持する第１バッファ
と、実行部により逆離散余弦変換されたブロックを保持
する第２バッファとを有し、前記第１制御記憶部は、逆
量子化処理するマイクロプログラムと、逆離散余弦変換
するマイクロプログラムとを記憶し、前記第２制御記憶
部は、逆離散余弦変換するマイクロプログラムと、逆離
散余弦変換された映像ブロックを第２バッファに転送す
るマイクロプログラムとを記憶し、前記実行手段は、逆
離散余弦変換された映像ブロックを第２バッファに転送
する処理と、次の映像ブロックを逆量子化する処理とを
並列に実行し、逆量子化された当該映像ブロックを逆離
散余弦変換する処理を乗算器と加算器とを連動させて実
行するように構成してもよい。【０１１９】この構成によれば、逆量子化処理と第２バ
ッファへの転送処理とを並列実行するので処理効率を向
上させることができる。また、前記入力手段は、さらに
ポリゴンデータを入力し、前記逐次処理手段は、さらに
ポリゴンデータを解析してポリゴンの頂点座標とエッジ
の傾きとを算出し、前記定型処理手段は、さらに算出さ
れた頂点座標と傾きとに従って、前記ポリゴンの画像デー
タを生成するように構成してもよい。【０１２０】この構成によれば、逐次処理手段はポリゴ
ンデータの解析を担当し、定型処理手段は定型的な画像
データ生成処理を担当する。本映像音声処理装置は、効
率よくポリゴンデータから画像データを生成するグラフ
ィックス処理を行うことができる。ここで、前記第１、
第２制御記憶部は、さらにＤＤＡアルゴリズムによる走
査変換を行うマイクロプログラムを記憶し、前記実行部
は、さらに逐次処理手段により算出された頂点座標と傾
きとに基づいてマイクロプログラム制御により走査変換
を行うように構成してもよい。【０１２１】この構成によれば、画像データの生成は前
記第１、第２制御記憶部に記憶された走査変換マイクロプログラム
により簡単に実現することができる。また、前記合成手
段はさらに圧縮すべき映像データから差分画像を表す差
分ブロックを生成し、前記第２バッファはさらに生成さ
れた差分画像を保持し、第１制御記憶部はさらに離散余
弦変換するマイクロプログラムと量子化処理するマイク
ロプログラムとを記憶し、第２制御記憶部はさらに離散
余弦変換するマイクロプログラムと離散余弦変換された
映像ブロックを第１バッファに転送するマイクロプログ
ラムとを記憶し、前記実行手段はさらに第２バッファに
保持された差分ブロックに対して離散余弦変換と量子化
を実行して第１バッファに転送し、前記データ変換手段
はさらに第１バッファのブロックに対して可変長符号化
を行い、前記逐次処理手段はさらにデータ変換手段によ
り可変長符号化された所定のブロックに対してヘッダ情
報を付加するように構成してもよい。【０１２２】この構成によれば、定型処理手段は定型的
な処理として量子化と離散余弦変換を担当し、逐次処理
手段は条件判断を要する処理（ヘッダ情報の付加）を担
当する。この場合、本映像音声処理装置は、高速クロッ
クを用いなくても画像データから圧縮映像データへのエ
ンコード処理を効率よく実行することができる。また、
前記演算手段は、それぞれマイクロプログラムを記憶す
る第１、第２の制御記憶部と、第１制御記憶部に第１読
出アドレスを指定する第１プログラムカウンタと、第２
読出アドレスを指定する第２プログラムカウンタと、第
１読出アドレスと第２読出アドレスとの一方を選択して
第２制御記憶部に出力するセレクタと、乗算器と加算器
とをそれぞれ有し、第１、第２制御記憶部によるマイク
ロプログラム制御によりブロック単位の逆量子化と逆離
散余弦変換とを実行する複数の実行部とを備え、各実行
部は、ブロックを分割した部分ブロックを分担して処理
するように構成してもよい。【０１２３】この構成によれば、複数の実行部が並列に
演算命令を実行するので、定型的な大量の演算を画素レ
ベルで並列化して効率よく実行することができる。ま
た、前記演算手段は、さらに、各実行部に対応して設け
られ、各変換テーブルは所定のアドレス列に対応して部
分的にアドレス順序を入れ換えた変換アドレスを保持する
複数のアドレス変換テーブルと、所定の演算を実現する
マイクロプログラムを構成する個々のマイクロ命令を変
換アドレスに対応させて記憶する複数レジスタからなる
命令レジスタ群と、第１及び第２制御記憶部と複数の実
行部との間に設けられ、第１制御記憶部又はセレクタか
ら各実行部に出力されるマイクロ命令を、命令レジスタ
のマイクロ命令に切り替えて複数の実行部に出力する切
り替え部とを備え、前記第１読出アドレス又は第２読出
アドレスが前記所定のアドレス列の中のアドレスである
場合、そのアドレスは前記各アドレス変換テーブルによ
って変換アドレスに変換される。前記命令レジスタ群
は、変換テーブルから出力された各変換アドレスに対応
するマイクロ命令を出力するように構成してもよい。【０１２４】この構成によれば、複数の実行部が並列に
マイクロプログラムを実行する間、実行部間でアクセス
の競合など資源干渉を回避して、さらに効率よく処理す
ることができる。ここで、前記各変換テーブルは、さら
に第１プログラムカウンタが前記所定のアドレス列中の
第１読出アドレスを出力する間、前記レジスタ中の加減
算を示すマイクロ命令出力に伴って、加算すべきか減算
すべきかを示すフラグを前記複数の実行部に出力し、前
記各実行部は、前記フラグに従って加減算を実行し、前
記フラグは、前記第２制御記憶部のマイクロ命令に従っ
て設定されるように構成してもよい。【０１２５】この構成によれば、マイクロ命令により加
算を行うか減算を行うかを変換テーブルが指定するの
で、同じマイクロプログラムを２通りに共用できるの
で、さらに、マイクロプログラムの全容量を低減させる
ことができ、ハードウェア規模の低減、ひいては低コス
ト化を実現できる。また、前記第２制御記憶部は、さら
に第１プログラムカウンタが前記所定のアドレス列中の
第１読出アドレスを出力する間、前記レジスタ中のマイ
クロ命令出力に伴って、マイクロ命令実行結果の格納先
を示す情報を前記複数の実行部に出力し、前記各実行部
は、格納先情報に従って実行結果を格納するように構成
してもよい。【０１２６】この構成によれば、格納先情報は、命令レ
ジスタ群中のマイクロプログラムと別個に指定できるの
で、当該マイクロプログラムを異なる処理例えば行列演算
の部分的な処理において共用することができる。その結
果、さらに、マイクロプログラムの全容量を低減させる
ことができ、ハードウェア規模の低減、ひいては低コス
ト化を実現できる。
Description: BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention belongs to the technical field of digital signal processing and relates to an image processing apparatus that decompresses compressed video and audio data, compresses video and audio data, and performs graphics processing and the like.
[0002] In recent years, the establishment of compression/decompression techniques for digital moving-picture data, combined with advances in LSI technology, has made various video and audio processing devices increasingly important: decoders that decompress compressed video and audio data, encoders that compress video and audio data, and graphics processors that perform graphics processing.
[0003] As a first conventional technique, there is a video/audio decoder (Japanese Patent Laid-Open No. 8-1116429) that decompresses compressed video and audio data of the MPEG (Moving Picture Experts Group) standard. This video/audio decoder performs both video decoding and audio decoding with a single signal processing unit. FIG. 1 is an explanatory diagram of the decoding process performed by this video/audio decoder. In the figure, the vertical axis represents time and the horizontal axis represents the amount of computation.
[0004] Viewed broadly along the vertical axis, video decoding and audio decoding are processed alternately. This is because both video and audio are decoded by common hardware. As shown in the figure, video decoding is divided into sequential processing and block processing. Sequential processing is the decoding of everything other than blocks, that is, processing such as header analysis of the MPEG stream that requires a wide variety of condition judgments, and its amount of computation is small. Block decoding decodes the variable-length codes of the MPEG stream and then performs inverse quantization and inverse DCT (discrete cosine transform) block by block, and its amount of computation is large. As shown in the figure, audio decoding is likewise divided into sequential processing, similar to the above, that requires a wide variety of condition judgments, and decoding of the audio data proper. Decoding of the audio data proper requires higher precision than image data and must finish within a limited time, so it must be processed accurately and at high speed, and its amount of computation is large.
[0005] In this way, the first conventional technique permits a one-chip implementation and achieves efficient audio/video decoding with as little hardware as a single chip. As a second conventional technique, there is a decoder with a two-chip configuration. One chip is used as the video decoder and the other as the audio decoder. FIG. 2 is an explanatory diagram of the decoding process performed by the two-chip decoder. Both the video decoder and the audio decoder perform sequential processing, which includes many condition judgments such as header analysis, and block decoding, which mainly decodes the data proper. Because the video decoder and the audio decoder operate independently, each chip may have lower capability than in the first conventional technique.
[0006] However, the above conventional techniques have the following problems. According to the first conventional technique, the signal processing unit must decode both video and audio, so a high processing capability is required.
That is, it must be operated with a high-speed clock of 100 MHz or more, which makes it expensive as a consumer semiconductor. To raise processing capability without a high-speed clock, one could conceivably use a VLIW (Very Long Instruction Word) processor or the like, but the VLIW processor itself is costly and, unless a separate processor for sequential processing is also used, the processing as a whole becomes inefficient.
[0007] According to the second conventional technique, the cost is high because two processors are used. That is, neither the video processor nor the audio processor can be a low-cost, general-purpose processor of modest performance. This is because the video processor needs the capability to process image data in real time. The audio processor does not require as much computation as the video processor, but audio data requires higher precision than image data. Consequently, an inexpensive processor with low processing capability cannot satisfy the capability required for either video or audio.
Furthermore, when a video/audio processing device is used as the AV decoder of a digital (satellite) broadcast tuner (STB: Set Top Box) or a DVD (Digital Versatile/Video Disc) player, it must input the MPEG stream received from the broadcast wave or read from the disc, decode the MPEG stream, and finally output video and audio signals to a display, speakers and the like; the amount of this series of processing is enormous. Recently, demand has been growing for video/audio processing devices that execute this enormous series of processing efficiently.
An object of the present invention is to provide a video/audio processing device that performs the series of processing consisting of input, decoding, and output of stream data representing compressed video and compressed audio, that has high processing capability without operating at a high frequency, and that reduces manufacturing cost. Another object is to provide a video/audio processing device that realizes decoding of compressed video data, encoding of video data, and graphics processing at low cost.
[0010] To solve the above problems, the video/audio processing device of the present invention inputs from outside a data stream containing compressed audio data and compressed video data, decodes it, and outputs the decoded data to output devices. It comprises input/output processing means for performing input/output processing that occurs asynchronously due to external factors, and decoding processing means for performing, in parallel with the input/output processing, decoding processing that consists mainly of decoding the data stream stored in a memory. The video data and audio data decoded by the decoding processing means are stored in the memory. The input/output processing consists of inputting the data stream that arrives asynchronously from outside and storing it in the memory, supplying the data stream stored in the memory to the decoding processing means, and reading the decoded data from the memory in accordance with the respective output rates of the external display device and audio output device and outputting it to them.
According to this configuration, the input/output processing means and the decoding processing means operate in parallel in a pipelined manner, and in addition asynchronous processing and decoding processing are divided between them, so the decoding processing means is freed from asynchronously occurring processing and can devote itself to decoding. As a result, the device efficiently executes the series of processing consisting of stream data input, decoding, and output, which makes full decoding of the stream data (without dropped frames) possible without a high-speed operating clock.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS The embodiments of the video/audio processing device according to the present invention are described under the following headings.
1 First embodiment
1.1 Schematic configuration of video/audio processing device
1.1.1 Input/output processing unit
1.1.2 Decoding processing unit
1.1.2.1 Sequential processing unit
1.1.2.2 Routine processing unit
1.2 Configuration of video/audio processing device
1.2.1 Configuration of input/output processing unit
1.2.2 Decoding processing unit
1.2.2.1 Sequential processing unit
1.2.2.2 Routine processing unit
1.3 Detailed configuration of each unit
1.3.1 Processor 7 (sequential processing unit)
1.3.2 Routine processing unit
1.3.2.1 Code conversion unit
1.3.2.2 Pixel operation unit
1.3.2.3 Pixel read/write unit
1.3.3 Input/output processing unit
1.3.3.1 IO processor
1.3.3.1.1 Instruction readout circuit
1.3.3.1.2 Task management unit
1.4 Operation description
2 Second embodiment
2.1 Configuration of video/audio processing device
2.1.1 Pixel operation unit
<1. First Embodiment> The video/audio processing device of this embodiment is provided in a satellite broadcast receiver (STB: Set Top Box), a DVD (Digital Versatile Disc) player, a DVD-RAM recorder/player or the like. It receives an MPEG stream as compressed video and audio data from the satellite broadcast or the DVD, decompresses it (hereinafter simply "decodes"), and outputs video and audio signals to external output devices.
<1.1 Schematic Configuration of Video/Audio Processing Device> The accompanying block diagram shows the schematic configuration of the video/audio processing device according to the first embodiment of the present invention.
The video/audio processing device 1000 comprises an input/output processing unit 1001, a decoding processing unit 1002, and a memory controller 6, and is configured so that input/output processing and decoding processing are separated and performed in parallel. The external memory 3 is used as a working memory that temporarily stores the MPEG stream and the decoded audio data, and as a frame memory that stores the decoded video data.
<1.1.1 Input/output processing unit> The input/output processing unit 1001 performs input/output processing that occurs asynchronously with the internal operation of the video/audio processing device 1000. This input/output processing consists of (a) inputting the MPEG stream that arrives asynchronously from outside and temporarily storing it in the external memory 3, (b) supplying the MPEG stream stored in the external memory 3 to the decoding processing unit 1002, and (c) reading the decoded video data and audio data from the external memory 3 and outputting them in accordance with the output rates of the external display device and audio output device (not shown).
<1.1.2 Decoding processing unit> The decoding processing unit 1002 operates independently of, and in parallel with, the input/output processing unit 1001: it decodes the MPEG stream supplied by the input/output processing unit 1001 and stores the decoded video data and audio data in the external memory 3. Because decoding an MPEG stream involves a large amount of computation and widely varying kinds of processing, the decoding processing unit 1002 comprises a sequential processing unit 1003 and a routine processing unit 1004. It is configured so that sequential processing, which mainly involves a wide variety of condition judgments, is separated from routine processing, which involves large amounts of routine computation and is suited to parallel operation, and the two are processed in parallel. Here, sequential processing refers to header analysis of the MPEG stream and the like, and includes many condition judgments such as header detection and judgment of header contents. Routine processing, in contrast, performs its computations in units of blocks of a predetermined number of pixels, so it is suited to pipelined parallel processing and also to parallel processing such as vector operations, in which the same operation is applied to different data (pixels).
<1.1.2.1 Sequential processing unit> As the sequential processing described above, the sequential processing unit 1003 performs header analysis of the compressed audio data and compressed video data supplied from the input/output processing unit 1001, control that activates the routine processing unit 1004 for each macroblock, and decoding of the compressed audio data. The header analysis includes analysis of the macroblock headers in the MPEG stream and decoding of motion vectors. Here, a block represents an image of 8 × 8 pixels. A macroblock consists of four luminance blocks and two chrominance blocks. A motion vector is a vector that points to a rectangular area of 8 × 8 pixels in a reference frame, and indicates the rectangular area in the reference frame against which the difference for the block was taken.
<1.1.2.2 Routine processing unit> On receiving an instruction from the sequential processing unit 1003 to start decoding a macroblock, the routine processing unit 1004 performs macroblock decoding as the routine processing, in parallel with the audio decoding performed by the sequential processing unit 1003. This decoding performs variable-length code decoding (VLD: Variable Length code Decoding), inverse quantization (IQ: Inverse Quantization), inverse discrete cosine transform (IDCT: Inverse Discrete Cosine Transform), and motion compensation (MC: Motion Compensation), in this order. In motion compensation, the routine processing unit 1004 stores the decoded block in the external memory 3, which serves as the frame memory, via the memory controller 6.
<1.2 Configuration of Video/Audio Processing Device> The accompanying block diagram shows a more detailed configuration of the video/audio processing device 1000.
<1.2.1 Configuration of input/output processing unit> The input/output processing unit 1001 comprises a stream input unit 1, a buffer memory 2, an input/output processor 5 (hereinafter abbreviated as IO processor 5), a DMAC (Direct Memory Access Controller) 5a, a video output unit 12, an audio output unit 13, and a host I/F unit 14.
The stream input unit 1 converts the MPEG data stream, which is input serially, into parallel data (hereinafter referred to as MPEG data). In doing so, the stream input unit 1 detects the start code of each GOP (Group Of Pictures: an MPEG data stream that contains one I picture and corresponds to about 0.5 seconds of video) in the MPEG data stream and notifies the IO processor 5. Following this notification, the converted MPEG data is transferred to the buffer memory 2 under the control of the IO processor 5.
The buffer memory 2 temporarily stores the MPEG data transferred from the stream input unit 1. The MPEG data held in the buffer memory 2 is further transferred to the external memory 3 via the memory controller 6 under the control of the IO processor 5.
The external memory 3 consists of SDRAM (Synchronous Dynamic Random Access Memory) chips and temporarily stores the MPEG data transferred from the buffer memory 2 via the memory controller 6. The external memory 3 also stores the decoded video data (hereinafter also referred to as frame data) and the decoded audio data.
The IO processor 5 controls data input and output among the stream input unit 1, the buffer memory 2, the external memory 3 (via the memory controller 6), and the FIFO memory 4. That is, it controls data transfer (DMA transfer) along the paths (1) to (4) below.
(1) Stream input unit 1 → buffer memory 2 → memory controller 6 → external memory 3
(2) External memory 3 → memory controller 6 → FIFO memory 4
(3) External memory 3 → memory controller 6 → buffer memory 2 → video output unit 12
(4) External memory 3 → memory controller 6 → buffer memory 2 → audio output unit 13
On these paths, the IO processor 5 controls the transfer of the video data and the audio data in the MPEG data separately. Paths (1) and (2) carry MPEG data before decoding; on them the IO processor 5 transfers the compressed video data and the compressed audio data separately. Paths (3) and (4) carry the decoded video data and the decoded audio data, respectively. The decoded video and audio data are transferred in accordance with the respective output rates of the external display device (not shown) and audio output device (not shown).
The DMAC 5a, under the control of the IO processor 5, performs DMA transfers between the buffer memory 2 and each of the stream input unit 1, the video output unit 12, and the audio output unit 13; DMA transfers between the buffer memory 2 and the external memory 3; and DMA transfers between the external memory 3 and the FIFO memory 4.
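The four transfer paths (1) to (4) listed above can be written down as a small routing table, which makes it easy to see which components each DMA path crosses. This is a minimal sketch for illustration only: the dictionary `TRANSFER_PATHS` and the helpers `hops` and `paths_through` are hypothetical names, not part of the described device.

```python
# Transfer paths (1)-(4) as listed in the text; component names follow the description.
TRANSFER_PATHS = {
    1: ["stream input unit 1", "buffer memory 2", "memory controller 6", "external memory 3"],
    2: ["external memory 3", "memory controller 6", "FIFO memory 4"],
    3: ["external memory 3", "memory controller 6", "buffer memory 2", "video output unit 12"],
    4: ["external memory 3", "memory controller 6", "buffer memory 2", "audio output unit 13"],
}


def hops(path_no):
    """Return the (source, destination) pairs that make up one transfer path."""
    stages = TRANSFER_PATHS[path_no]
    return list(zip(stages, stages[1:]))


def paths_through(component):
    """Return the numbers of all paths that pass through the given component."""
    return [n for n, stages in TRANSFER_PATHS.items() if component in stages]


if __name__ == "__main__":
    print(hops(2))
    print(paths_through("buffer memory 2"))
```

Every path crosses the memory controller 6, while the buffer memory 2 is bypassed only on path (2), which feeds the decoder through the FIFO memory 4.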
The video output unit 12 issues data requests to the IO processor 5 in accordance with the output rate of the external display device (a CRT or the like), for example the frequency of the horizontal synchronization signal Hsync, and outputs to the display device the video data that the IO processor 5 delivers over path (3).
The audio output unit 13 issues data requests to the IO processor 5 in accordance with the output rate of the external audio output device, and outputs the audio data that the IO processor 5 delivers over path (4) to the audio output device (a combination of a D/A converter, an audio amplifier, speakers and the like).
The host I/F unit 14 is an interface for communicating with an external host processor, for example the processor that controls the whole apparatus in the case of a DVD player. Through this communication the host processor sends instructions such as start and stop of MPEG stream decoding, fast-forward playback, and reverse playback.
<1.2.2 Decoding processing unit> In the figure, the decoding processing unit 1002 comprises a FIFO memory 4, the sequential processing unit 1003, and the routine processing unit 1004, and decodes the MPEG data supplied from the input/output processing unit 1001 via the FIFO memory 4. The sequential processing unit 1003 comprises a processor 7 and an internal memory 8. The routine processing unit 1004 comprises a code conversion unit 9, a pixel operation unit 10, a pixel read/write unit 11, a buffer 200, and a buffer 201.
The FIFO memory 4 consists of two FIFOs (hereinafter the video FIFO and the audio FIFO) and stores, in first-in first-out fashion, the compressed video data and compressed audio data transferred from the external memory 3 under the control of the IO processor 5.
<1.2.2.1 Sequential processing unit> The processor 7 controls reading of the compressed video data and compressed audio data from the FIFO memory 4, and performs part of the decoding of the compressed video data and all of the decoding of the compressed audio data. The part of the compressed video decoding that it performs consists of analysis of header information in the MPEG data, calculation of motion vectors, and control of the compressed video decoding process. This is because the processor 7 does not perform the whole of the compressed video decoding but shares it with the routine processing unit 1004. That is, the processor 7 takes charge of the sequential processing that requires a wide variety of condition judgments, and the routine processing unit 1004 takes charge of the large amount of routine arithmetic. Audio decoding, on the other hand, requires less computation than video decoding, so the processor 7 handles all of it.
The function of the processor 7 is described concretely with reference to FIG. 5. FIG. 5 shows the MPEG stream hierarchically, together with the operation timing of each part of the video/audio processing device. In the figure, the horizontal axis is the time axis. The first level shows the flow of the MPEG stream. As shown at the second level, one second of the MPEG stream contains a plurality of frames (I, P, and B pictures). As shown at the third level, one frame contains a picture header and a plurality of slices. As shown at the fourth level, one slice contains a slice header and a plurality of macroblocks. As shown at the fifth level, one macroblock contains a macroblock header and six blocks.
The data structure of the first to fifth levels shown in the figure is described in publicly known literature, for example the "Latest MPEG Textbook" published by ASCII. As shown at the fifth level and below in the figure, the processor 7 analyzes the headers in the MPEG stream down to the macroblock layer and decodes the compressed audio data. In doing so, according to the header analysis result for each macroblock, the processor 7 instructs the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 to start decoding the macroblock. While the macroblock is being decoded by these three units, the processor 7 reads compressed audio data from the FIFO memory 4 and decodes it. When it is notified by an interrupt signal that the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 have finished decoding the macroblock, the processor 7 suspends decoding of the compressed audio data and starts analyzing the header of the next macroblock.
The internal memory 8 is the working memory of the processor 7 and temporarily holds the decoded audio data.
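The alternation described above (header analysis, then audio decoding while the routine processing unit decodes the macroblock, then an interrupt that returns the processor 7 to header analysis) can be sketched as a tiny timeline generator. This is only an illustration under stated assumptions: the function name `schedule` and the tick costs `header_cost` and `block_cost` are hypothetical, and real timings depend on the stream contents.

```python
def schedule(macroblocks, header_cost=1, block_cost=3):
    """Build a timeline of (tick, processor 7 activity, routine unit activity).

    header_cost and block_cost are invented tick counts used only to
    visualise the hand-off; they are not taken from the description.
    """
    timeline = []
    tick = 0
    for mb in range(macroblocks):
        # Sequential step: processor 7 analyses the macroblock header.
        for _ in range(header_cost):
            timeline.append((tick, f"header MB{mb}", "idle"))
            tick += 1
        # Processor 7 starts the routine unit, then decodes audio in parallel.
        for _ in range(block_cost):
            timeline.append((tick, "audio decode", f"decode MB{mb}"))
            tick += 1
        # The completion interrupt suspends audio decoding; the loop then
        # continues with the header of the next macroblock.
    return timeline


if __name__ == "__main__":
    for entry in schedule(2):
        print(entry)
```

In the sketch the processor 7 column never idles: it is either analysing a header or decoding audio, which is the point of the division of labour described above.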
The held audio data is transferred to the external memory 3 by the IO processor 5 through the path (4).
<1.2.2.2 Routine processing unit> The code conversion unit 9 performs variable-length decoding (VLD) on the compressed video data read from the FIFO memory 4. As shown in the figure, the code conversion unit 9 passes the header information and the motion vector information in the decoded data to the processor 7, and transfers the macroblock data (six blocks consisting of the luminance blocks Y0 to Y3 and the chrominance blocks Cb and Cr; the solid-line sections in the figure) to the pixel operation unit 10 via the buffer 200. The macroblock data decoded by the code conversion unit 9 represents spatial frequency components.
Spatial frequency for one block (for 8 × 8 pixels) to be written
Holds data representing numerical components. The pixel operation unit 10
Block transferred from the buffer conversion unit 9 via the buffer 200.
Inverse quantization (IQ) and inverse separation for lock data
The cosine transform (IDCT) is performed for each block. Pixel performance
If the processing result by the arithmetic unit 10 is a luminance block,
Data representing the luminance value of the
Color difference of the pixel or data representing the difference.
To the pixel read / write unit 11 via the buffer 201
Is done. The buffer 201 is composed of one block (8 × 8 picture).
) Of pixel data. Pixel read / write unit 11
Is a block unit of the processing result of the pixel operation unit 10.
Motion compensation. That is, P picture, B picture
, The decoded reference frame in the external memory 3
From the memory controller to the rectangular area indicated by the motion vector
6 through which the processing results of the pixel operation unit 10 are extracted.
Decompose to the original block image by combining with the lock
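The combining step just described, in which the rectangular area of the reference frame indicated by the motion vector is added to the difference block from the pixel operation unit 10, can be sketched as follows. This is a simplified illustration, not the device's implementation: it assumes a whole-pixel motion vector, a single reference frame, and 8-bit samples clamped to 0..255, and the function name `motion_compensate` and its parameters are hypothetical.

```python
def motion_compensate(reference, residual, x, y, mv, size=8):
    """Rebuild one block: reference area shifted by mv plus the IDCT residual.

    reference: 2-D list of samples from an already decoded frame
    residual:  size x size 2-D list of differences from the IDCT stage
    (x, y):    top-left corner of the block being rebuilt
    mv:        (dx, dy) whole-pixel motion vector (an assumed simplification)
    """
    dx, dy = mv
    block = []
    for row in range(size):
        out_row = []
        for col in range(size):
            predicted = reference[y + dy + row][x + dx + col]
            value = predicted + residual[row][col]
            out_row.append(max(0, min(255, value)))  # clamp to the 8-bit range
        block.append(out_row)
    return block


if __name__ == "__main__":
    ref = [[(r * 16 + c) % 256 for c in range(16)] for r in range(16)]
    zero = [[0] * 8 for _ in range(8)]
    print(motion_compensate(ref, zero, 1, 1, (2, 1))[0])
```

With an all-zero residual the output is simply the shifted reference area, which corresponds to a block that is predicted perfectly from the reference frame.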
I do. The decoding result by the pixel read / write unit 11 is
The data is stored in the external memory 3 via the controller 6. Each content of the above-mentioned motion compensation, IQ and IDCT
Is a well-known technology, so a detailed description is omitted (see above).
Literature). <1.3 Detailed Configuration of Each Unit> Next, the video / audio processing device 10
The detailed configuration of each of the main components of 00 will be described. <1.3.1 Processor 7 (sequential processing unit)>
FIG. 6 is a diagram showing the macroblock header analysis performed by the processor 7 and its control of the other units. The individual data items in the macroblock header, shown in the figure by their abbreviations, are not described here.
As shown in the figure, the processor 7 issues commands to the code conversion unit 9 to acquire the variable-length-decoded header data one item at a time and, according to the acquired data, sets in the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 the data required to decode the macroblock.
Specifically, the processor 7 first issues a command to the code conversion unit 9 to acquire the MBAI (Macroblock Address Increment) (S101) and obtains the MBAI from the code conversion unit 9. If this MBAI shows that the macroblock is a skipped macroblock (that is, the macroblock to be decoded is the same as the previous one), the macroblock data is omitted, so processing proceeds to S117; if it is not a skipped macroblock, header analysis continues (S102, S103).
Next, the processor 7 issues a command to acquire the MBT (Macroblock Type) and obtains the MBT from the code conversion unit 9. From this MBT it determines whether the scan type is zigzag scan or alternate scan and designates the corresponding data order of the buffer 200 to the pixel operation unit 10 (S104).
Further, the processor 7 determines from the header data already obtained whether an STWC (Spatial Temporal Weight Code) is present (S105) and, if so, issues a command to acquire it (S106).
In the same way, the processor 7 acquires the FrMT (Frame Motion Type), FiMT (Field Motion Type), DT (DCT Type), QSC (Quantizer Scale Code), MV (Motion Vector), and CBP (Coded Block Pattern) (S107 to S116). In doing so, the processor 7 notifies the pixel read/write unit 11 of the analysis results for FrMT, FiMT, and DT, notifies the pixel operation unit 10 of the analysis result for QSC, and notifies the code conversion unit 9 of the analysis result for CBP.
In this way, the information needed for IQ, IDCT, and motion compensation is set in the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11. In a two-processor configuration, by contrast, each processor separately performs this kind of sequential processing with its many conditional judgments, so the configuration is redundant.
The processor 7 then issues a macroblock data decode start instruction to the code conversion unit 9 (S117). In response, the code conversion unit 9 starts VLD for each block in the macroblock and outputs the VLD results to the pixel operation unit 10 via the buffer 200. The processor 7 then calculates the motion vector from the MV data (S118) and notifies the pixel read/write unit 11 of the result (S119).
In the above processing, a motion vector requires a series of steps: acquisition of the motion vector data (MV) (S113), calculation of the motion vector (S118), and setting of the motion vector in the pixel read/write unit 11 (S119). Rather than calculating and setting the motion vector (S118, S119) immediately after acquiring the motion vector data (S113), the processor 7 first issues the decode start instruction to the routine processing unit 1004 and only then calculates and sets the motion vector.
As a result, the calculation and setting of the motion vector by the processor 7 and the decoding by the routine processing unit 1004 proceed in parallel; that is, the decoding start timing of the routine processing unit 1004 is advanced.
Once header analysis of the compressed video data for one macroblock has been completed as described above, the processor 7 obtains compressed audio data from the FIFO memory 4 and starts audio decoding (S120). Audio decoding continues until the code conversion unit 9 inputs an interrupt signal indicating that decoding of the macroblock is complete. On this interrupt signal, the processor 7 starts the header analysis described above for the next macroblock.
<1.3.2 Routine processing unit>
The routine processing unit 1004 decodes the six blocks in a macroblock with the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 operating in parallel (as a pipeline).
Here, the configurations of the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 are described in more detail, in that order.
<1.3.2.1 Code Conversion Unit 9>
The corresponding figure is a block diagram showing the configuration of the code conversion unit 9. The code conversion unit 9 shown in the figure includes a VLD unit 901, a counter 902, an incrementer 903, a selector 904, a scan table 905, a scan table 906, a flip-flop (hereinafter abbreviated as FF) 907, and a selector 908, and is configured to write the results of variable length decoding (VLD) into the buffer 200 block by block, arranged according to the zigzag scan or alternate scan order.
The VLD unit 901 performs variable length decoding (VLD) on the compressed video data read from the FIFO memory 4. Of the decoded data, it transfers the header information and motion vector information (the broken-line section in FIG. 5) to the processor 7 and outputs the macroblock data (six blocks consisting of the luminance blocks Y0 to Y3 and the color difference blocks Cb and Cr; the solid-line section in FIG. 5) to the buffer 200 in units of blocks (64 items of spatial frequency data).
The circuit formed by the counter 902, the incrementer 903, and the selector 904 counts repeatedly from 0 to 63 in synchronization with the output of spatial frequency data from the VLD unit 901.
The scan table 905 is a table that stores the addresses of the block storage area of the buffer 200 in zigzag scan order. The output values (0 to 63) of the counter 902 are input to it in sequence, and it outputs the corresponding addresses in sequence. FIG. 20 shows the storage area for the 8 × 8 items of spatial frequency data in the buffer 200 and the route of the zigzag scan. The scan table 905 outputs addresses in sequence along the route shown in FIG. 20.
The scan table 906 is a table that stores the addresses of the block storage area of the buffer 200 in alternate scan order. The output values (0 to 63) of the counter 902 are input to it in sequence, and it outputs the corresponding addresses in sequence. FIG. 21 shows the block storage area for the 8 × 8 items of spatial frequency data in the buffer 200 and the route of the alternate scan. The scan table 906 outputs addresses in sequence along the route shown in FIG. 21.
The FF 907 holds a flag indicating the scan type (zigzag scan or alternate scan). This flag is set by the processor 7.
The selector 908 selects, according to the flag of the FF 907, the address output by either the scan table 905 or the scan table 906 and outputs it to the buffer 200 as the write address.
<1.3.2.2 Pixel Operation Unit>
The corresponding figure is a block diagram showing the configuration of the pixel operation unit 10. As shown in the figure, the pixel operation unit 10 includes an execution unit 501 having a multiplier 502 and an adder/subtractor 503, a first program counter (hereinafter abbreviated as first PC) 504, a second program counter (hereinafter abbreviated as second PC) 505, a first instruction memory 506, a second instruction memory 507, and a selector 508, and is configured so that parts of IQ and IDCT can be executed in parallel, overlapping in time.
The execution unit 501 executes operations and accesses the buffers 200 and 201 in accordance with the microinstructions output in sequence from the first instruction memory 506 and the second instruction memory 507.
The first instruction memory 506 and the second instruction memory 507 are control memories that store the microprograms for performing IQ and IDCT on the block (frequency components) held in the buffer 200. The corresponding figure shows an example of the microprograms stored in the first instruction memory 506 and the second instruction memory 507.
As shown in the figure, the first instruction memory 506 stores the IDCT1A microprogram and the IQ microprogram, and its read address is specified by the first PC 504. The IQ microprogram consists of reads from the buffer 200 and multiplications, so it does not use the adder/subtractor 503.
The second instruction memory 507 stores the IDCT1B microprogram and the IDCT2 microprogram, and its read address is specified by the first PC 504 or the second PC 505 through the selector 508.
Here, IDCT1 denotes the first half of the IDCT processing, which involves multiplication and addition/subtraction; it is carried out using the whole of the execution unit 501 by reading the IDCT1A and IDCT1B microprograms simultaneously. IDCT2 denotes the latter half of the IDCT processing, which consists mainly of addition and subtraction, together with the writing of results to the buffer 201; it is carried out using the adder/subtractor 503 by reading the IDCT2 microprogram from the second instruction memory 507.
Because IQ is processed by the multiplier 502 and IDCT2 by the adder/subtractor 503, the two can be processed in parallel. FIG. 9 shows an operation timing chart of IQ, IDCT1, and IDCT2 in the pixel operation unit 10.
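The overlap between IQ and IDCT2 can be pictured with a small scheduling model. The Python sketch below is only an illustration under a simplifying assumption that is not stated in the specification: IQ, IDCT1, and IDCT2 are each treated as occupying one time slot of equal length. The function and variable names are hypothetical.

```python
BLOCKS = ["Y0", "Y1", "Y2", "Y3", "Cb", "Cr"]


def pipelined_schedule(blocks):
    """Return a list of time slots; each slot is the set of stages running in it.

    Assumption for this sketch: IQ, IDCT1 and IDCT2 each take one slot.
    IDCT1 needs the whole execution unit, so it runs alone, whereas IDCT2 of
    one block (adder/subtractor) shares a slot with IQ of the next (multiplier).
    """
    slots = []
    for i, name in enumerate(blocks):
        iq_slot = {f"IQ({name})"}
        if i > 0:
            iq_slot.add(f"IDCT2({blocks[i - 1]})")
        slots.append(iq_slot)
        slots.append({f"IDCT1({name})"})
    slots.append({f"IDCT2({blocks[-1]})"})
    return slots


schedule = pipelined_schedule(BLOCKS)
for t, stages in enumerate(schedule, start=1):
    print(f"t{t}: {' + '.join(sorted(stages))}")
print(len(schedule), "slots pipelined versus", 3 * len(BLOCKS), "sequential")
```

Under this equal-slot assumption, a six-block macroblock needs 13 slots instead of 18, because every IDCT2 except the last is hidden behind the IQ of the following block.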
In FIG. 9, when the code conversion unit 9 has written the data of the luminance block Y0 into the buffer 200 (timing t0), this is signaled to the pixel operation unit 10 by the control signal 102. The pixel operation unit 10 then performs IQ on the data in the buffer 200 by reading the IQ microprogram from the first instruction memory 506 at the addresses specified by the first PC 504, using the QS (Quantizer Scale) value set during header analysis by the processor 7. At this time, the selector 508 selects the first PC 504 (timing t1).
Next, the pixel operation unit 10 performs IDCT1 on the data in the buffer 200 by reading the IDCT1A and IDCT1B microprograms at the addresses specified by the first PC 504. Because the selector 508 selects the first PC 504 at this time, the address from the first PC 504 is specified to both the first instruction memory 506 and the second instruction memory 507 (timing t2).
Next, the pixel operation unit 10 performs IQ on the data of block Y1 in the buffer 200 by reading the IQ microprogram from the first instruction memory 506 at the addresses specified by the first PC 504, using the QS (Quantizer Scale) value; at the same time, it performs the latter half of the IDCT for block Y0 by reading the IDCT2 microprogram from the second instruction memory 507 at the addresses specified by the second PC 505. At this time, the selector 508 selects the second PC 505.
The first PC 504 and the second PC 505 thus specify addresses independently of each other (timing t3).
After this, the pixel operation unit 10 continues processing in the same way, block by block (timing t4 onward).
<1.3.2.3 Pixel Read/Write Unit>
The corresponding figure is a block diagram showing the detailed configuration of the pixel read/write unit 11. As shown in the figure, the pixel read/write unit 11 includes buffers 71 to 74 (hereinafter referred to as buffers A to D), a half-pel interpolation unit 75, a combining unit 76, selectors 77 and 78, and a read/write control unit 79.
The read/write control unit 79 performs motion compensation on the block data input through the buffer 201, using the buffers A to D, and transfers the final decoded image to the external memory 3 in block units. More specifically, it controls the memory controller 6 so as to read from the reference frame in the external memory 3, according to the motion vector set during header analysis by the processor 7, a rectangular area corresponding to two blocks. As a result, the data of the two-block rectangular area indicated by the motion vector is stored in buffer A or buffer B. Half-pel interpolation is then applied to this rectangular area according to the picture type (I picture, P picture, or B picture). Further, the combining unit 76 combines (adds) the block data input through the buffer 201 with the rectangular area after half-pel interpolation to calculate the pixel values of the block, which are stored in buffer B. The final decoded block thus stored in buffer B is transferred to the external memory 3 via the memory controller 6.
<1.3.3 Input/output processing unit>
The input/output processing unit 1001 is configured to switch, without overhead, among multiple tasks that share the various data transfers, so that it can perform the many kinds of data input/output (data transfer) described above without causing a response delay to data input/output requests.
The overhead here refers to the saving and restoring of context that occurs at task switching. In other words, the IO processor 5 is configured to eliminate the overhead of saving the instruction address of the program counter and the register data to memory (a stack area) and restoring them. Its detailed configuration is described below.
<1.3.3.1 IO Processor>
The corresponding figure is a block diagram showing the configuration of the IO processor 5. As shown in the figure, the IO processor 5 includes a state monitoring register 51, an instruction memory 52, an instruction reading circuit 53, an instruction register 54, a decoder 55, an operation execution unit 56, a general-purpose register set group 57, and a task management unit 58, and is configured to execute multiple tasks while switching among them at very short intervals (every four instruction cycles in this embodiment) so that it can respond to the various input/output requests that occur asynchronously.
The state monitoring register 51 consists of registers CR1 to CR3 and holds various status data (flags and the like) with which the IO processor 5 monitors the state of input and output. For example, the state monitoring register 51 holds status data indicating the state of the stream input unit 1 (a start code detection flag for the MPEG stream), the state of the video output unit 12 (a flag indicating the horizontal blanking period and a frame data transfer completion flag), the state of the audio output unit 13 (an audio frame data transfer completion flag), and the state of data transfers among the buffer memory 2, the external memory 3, and the FIFO memory 4 (the number of data transfers and the data request flags for the FIFO memory 4).
More specifically, the following flags are included.
- Start code detection flag (hereinafter also referred to as flag 1): This flag is set by the stream input unit 1 when a start code in the MPEG stream is detected.
- Horizontal blanking flag (flag 2): This flag indicates a horizontal blanking period and is set by the video output unit 12. It is set at intervals of about 60 microseconds.
- Video frame data transfer completion flag (flag 3): This flag is set by the DMAC 5a when one frame of decoded image data has been transferred from the external memory 3 to the video output unit 12.
- Audio frame data transfer completion flag (flag 4): This flag is set by the DMAC 5a when one frame of decoded audio data has been transferred from the external memory 3 to the audio output unit 13.
- Data transfer completion flag (flag 5): This flag is set when the DMAC 5a has finished DMA-transferring, from the stream input unit 1 to the buffer memory 2, the number of items of compressed image data specified by the IO processor 5 (that is, when the terminal count is reached).
- DMA request flag (flag 6): This flag indicates that the buffer memory 2 holds compressed video data or compressed audio data to be DMA-transferred to the external memory 3 (a request from task 1 to task 2, both described later).
- Data request flag for the video FIFO (flag 7): This flag requests data transfer from the external memory 3 to the video FIFO in the FIFO memory 4 and is set when the amount of compressed video data in the video FIFO falls below a predetermined amount. It is set at intervals of about 5 to 40 microseconds.
- Data request flag for the audio FIFO (flag 8): This flag requests data transfer from the external memory 3 to the audio FIFO in the FIFO memory 4 and is set when the amount of compressed audio data in the audio FIFO falls below a predetermined amount. It is set at intervals of about 15 to 60 microseconds.
- Decoder communication request flag (flag 9): This flag requests communication from the decode processing unit 1002 to the input/output processing unit 1001.
- Host communication request flag (flag 10): This flag requests communication from the host processor to the input/output processing unit 1001.
The IO processor 5 does not handle the above flags as interrupts; instead, they are monitored by the tasks that the IO processor 5 executes.
The instruction memory 52 stores the programs of the tasks that share control of the many kinds of data input/output (data transfer). In the present embodiment, it stores six task programs, task 0 to task 5.
- Task 0 (host I/F task): This task is a program that, when flag 10 is set, performs communication processing with the host computer via the host I/F unit 14. It includes, for example, accepting instructions from the host processor to start, stop, fast-forward, or reverse playback of an MPEG stream, and notifying the host of the decoding status (errors and so on). This processing is triggered by flag 10.
- Task 1 (parsing task): This task is a program that, when the stream input unit 1 detects a start code (flag 1 above), parses the MPEG data input from the stream input unit 1 to extract the individual elementary streams and DMA-transfers the extracted elementary streams to the buffer memory 2 (the first half of transfer path (1) above). The elementary streams extracted here are compressed video data (the video elementary stream), compressed audio data (the audio elementary stream), and private data. When an elementary stream has been stored in the buffer memory 2, flag 6 is set.
- Task 2 (stream transfer/audio task): This task is a program that controls the following transfers (a) to (c).
(a) DMA transfer of each elementary stream from the buffer memory 2 to the external memory 3 (the latter half of transfer path (1) above). This transfer is triggered by flags 1 and 3.
(b) DMA transfer of compressed audio data from the external memory 3 to the audio FIFO in the FIFO memory 4, according to the data size (remaining amount) of the compressed audio data held in the audio FIFO (the transfer to the audio FIFO in transfer path (2) above). This transfer is performed when the data size of the compressed audio data held in the audio FIFO falls below a certain amount. It is triggered by flag 8 above.
(c) DMA transfer of decoded audio data from the external memory 3 to the buffer memory 2 and then from the buffer memory 2 to the audio output unit 13 (transfer path (4) above). This transfer is triggered by flag 2 above.
- Task 3 (video supply task): This task is a program that DMA-transfers compressed video data from the external memory 3 to the video FIFO in the FIFO memory 4, according to the data size (remaining amount) of the compressed video data held in the video FIFO (the transfer to the video FIFO in transfer path (2) above). This transfer is performed when the data size of the compressed video data held in the video FIFO falls below a certain amount.
This transfer is triggered by flag 7 above.
- Task 4 (video output task): This task is a program that DMA-transfers decoded video data from the external memory 3 to the buffer memory 2 and then from the buffer memory 2 to the video output unit 12 (transfer path (4) above). This transfer is triggered by flag 2 above.
- Task 5 (decoder I/F task): This task is a program that processes commands sent from the decode processing unit 1002 to the IO processor 5. The commands include "getAPTS", "getVPTS", and "getSTC". getVPTS (Video Presentation Time Stamp) is a command with which the decode processing unit 1002 requests the IO processor 5 to obtain the VPTS assigned to compressed video data. getAPTS (Audio Presentation Time Stamp) is a command with which the decode processing unit 1002 requests the IO processor 5 to obtain the APTS assigned to compressed audio data. getSTC (System Time Clock) is a command with which the decode processing unit 1002 requests the IO processor 5 to obtain the STC. On receiving these commands, the IO processor 5 notifies the decode processing unit 1002 of the STC, VPTS, or APTS, respectively. The decode processing unit 1002 uses the STC, VPTS, and APTS to synchronize audio decoding with video decoding and to adjust the progress of decoding in frame units. This processing is triggered by flag 9.
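The relationship between the flags and the tasks that poll them can be summarized in a small model. The Python sketch below is a hedged illustration only: the bit positions, the enum names, and the helper function are assumptions made for the example, and the trigger table is limited to triggers that the task list above states unambiguously (the trigger of transfer (a) of task 2 is left out).

```python
from enum import IntFlag


class Status(IntFlag):
    """Flags 1 to 10 of the state monitoring register (bit positions assumed)."""
    START_CODE = 1 << 0        # flag 1
    HBLANK = 1 << 1            # flag 2
    VIDEO_FRAME_DONE = 1 << 2  # flag 3
    AUDIO_FRAME_DONE = 1 << 3  # flag 4
    TRANSFER_DONE = 1 << 4     # flag 5
    DMA_REQUEST = 1 << 5       # flag 6
    VIDEO_FIFO_REQ = 1 << 6    # flag 7
    AUDIO_FIFO_REQ = 1 << 7    # flag 8
    DECODER_COMM = 1 << 8      # flag 9
    HOST_COMM = 1 << 9         # flag 10


# Trigger flags per task, limited to the triggers stated in the task list.
TRIGGERS = {
    0: Status.HOST_COMM,
    1: Status.START_CODE,
    2: Status.AUDIO_FIFO_REQ | Status.HBLANK,  # transfers (b) and (c) only
    3: Status.VIDEO_FIFO_REQ,
    4: Status.HBLANK,
    5: Status.DECODER_COMM,
}


def tasks_with_work(status):
    """Return the ids of tasks that would find a trigger flag set when polling."""
    return [task for task, mask in TRIGGERS.items() if status & mask]


print(tasks_with_work(Status.HBLANK | Status.VIDEO_FIFO_REQ))
```

Because each task checks its own flags during its time slot, no interrupt controller or priority arbitration appears in this model; a task with no flag set simply has nothing to do in that slot.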
The instruction reading circuit 53 has multiple program counters (hereinafter abbreviated as PCs) that indicate instruction fetch addresses; using the PC specified by the task management unit 58, it reads an instruction from the instruction memory 52 and stores it in the instruction register 54. Specifically, the instruction reading circuit 53 has PC0 to PC5 corresponding to tasks 0 to 5 and is configured to switch PCs at high speed in hardware whenever the PC designated by the task management unit 58 changes.
With this configuration, the IO processor 5 is freed, at task switching, from the processing of saving the PC value of the current task to memory and restoring the PC value of the next task from memory.
The decoder 55 decodes the instruction that was read from the instruction memory 52 and stored in the instruction register 54 and controls the operation execution unit 56 so as to execute it. The decoder 55 also performs pipeline control of the IO processor 5 as a whole, with at least three stages: the instruction read stage of the instruction reading circuit 53, the decode stage of the decoder 55, and the execution stage of the operation execution unit 56.
The operation execution unit 56 includes an ALU (Arithmetic Logical Unit), a multiplier, and a BS (Barrel Shifter), and performs the operation specified by the instruction under the control of the decoder 55.
The general-purpose register set group 57 has six register sets corresponding to tasks 0 to 5 (each register set consists of four 32-bit registers and four 16-bit registers), that is, a total of twenty-four 32-bit registers and twenty-four 16-bit registers. The register set corresponding to the task being executed is used. This frees the IO processor 5, at task switching, from the processing of saving all current register data to memory and restoring the register data of the next task from memory.
The task management unit 58 switches tasks every predetermined number of instruction cycles by switching the PC of the instruction reading circuit 53 and the register set of the general-purpose register set group 57. In this embodiment, the predetermined number is four.
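The effect of keeping one PC and one register set per task can be sketched as follows. This Python model is illustrative only: the class and function names are hypothetical, and the fixed task order 0 to 5 is an assumption made for the example. It shows that a task switch reduces to selecting a different index, with nothing saved or restored.

```python
SLOT = 4        # instruction cycles per task slot (from the embodiment)
NUM_TASKS = 6   # tasks 0 to 5


class BankedContext:
    """One PC and one register set per task, so a switch copies nothing."""

    def __init__(self):
        self.pc = [0] * NUM_TASKS                        # PC0 to PC5
        self.regs = [[0] * 8 for _ in range(NUM_TASKS)]  # 4 + 4 registers each


def run(cycles):
    """Return (cycle, task, instruction index) for each instruction cycle."""
    ctx = BankedContext()
    trace = []
    for cycle in range(cycles):
        task = (cycle // SLOT) % NUM_TASKS   # switching is just an index change
        trace.append((cycle, task, ctx.pc[task]))
        ctx.pc[task] += 1                    # only the active task's PC advances
    return trace


trace = run(28)
print(trace[3], trace[4], trace[24])
print("full rotation:", SLOT * NUM_TASKS, "instruction cycles")
```

With six tasks and four-cycle slots, one full rotation takes 6 × 4 = 24 instruction cycles in this model, so every task regains the processor within that bound.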
Because the IO processor 5 executes one instruction per instruction cycle through pipeline processing, the task management unit 58 switches tasks every four instructions without any overhead. This keeps the response delay small for the various input/output requests that occur asynchronously; that is, the response delay to an input/output request is at most 24 instruction cycles.
<1.3.3.1.1 Instruction Read Circuit>
FIG. 12 is a block diagram showing a detailed configuration example of the instruction reading circuit 53. In the figure, the instruction reading circuit 53 includes a task-specific PC storage unit 53a, a current IFAR (Instruction Fetch Address Register) 53b, an incrementer 53c, a next IFAR 53d, a selector 53e, a selector 53f, and a DECAR (DECode Address Register) 53g, and is configured to switch the instruction read address at task switching without overhead.
The task-specific PC storage unit 53a has address registers corresponding to tasks 0 to 5 and holds a program count value for each task. Each program count value is the restart address of the corresponding task. At task switching, under the control of the task management unit 58 and the decoder 55, the program count value is read from the address register corresponding to the task to be executed next, and the program count value in the address register corresponding to the task currently being executed is updated to its new restart address.
At this time, the task to be executed next and the current task are designated by the task management unit 58 through the "nexttaskid (rd addr)" signal and the "taskid (wr addr)" signal, respectively (these are hereinafter also referred to as task IDs).
The program count values corresponding to tasks 0, 1, and 2 are shown as PC0, PC1, and PC2 in the accompanying timing chart. In the chart, (0-0) denotes instruction 0 of task 0, and (1-4) denotes instruction 4 of task 1. For example, PC0 is read when task 0 is restarted (instruction cycle t0) and is updated to the address of instruction (0-4) when execution switches to the next task (instruction cycle t4).
The loop circuit composed of the incrementer 53c, the next IFAR 53d, and the selector 53e updates the instruction read address selected by the selector 53e. The address output from the selector 53e is shown as IF1 in the timing chart. For example, when switching from task 0 to task 1, the selector 53e selects in instruction cycle t4 the address of instruction (1-0) read from the task-specific PC storage unit 53a and selects in cycles t5 to t7 the incremented instruction address from the next IFAR 53d.
The current IFAR 53b holds the selection output IF1 of the selector 53e with a delay of one cycle and outputs it to the instruction memory 52 as the instruction read address. In other words, it holds the instruction read address of the currently active task. The instruction read address of the current IFAR 53b is shown as IF2 in the timing chart. As the chart shows, it specifies the instruction address of a different task every four instruction cycles.
The DECAR 53g holds the address of the instruction held in the instruction register 54, that is, the address of the instruction being decoded. DEC in the timing chart indicates the address held in the DECAR 53g, and EX indicates the address of the instruction being executed.
The selector 53f selects the branch address when a branch instruction is executed or an interrupt occurs, and otherwise selects the address of the next IFAR 53d.
By providing such an instruction reading circuit 53, the IO processor 5 performs four-stage pipeline processing (IF1, IF2, DEC, EX), as shown in the timing chart. Of these, the IF1 stage selects and updates the multiple program count values, and the IF2 stage reads the instruction.
<1.3.3.1.2 Task Management Unit>
FIG. 14 is a block diagram showing the detailed configuration of the task management unit 58. As shown in the figure, the task management unit 58 is broadly divided into a slot manager, which manages the task switching timing, and a scheduler, which manages the order of tasks.
The slot manager includes a counter 58a, a latch 58b, a comparator 58c, and a latch unit 58d, and outputs to the instruction reading circuit 53 a task switching signal (chgtaskex) that instructs task switching every four instruction cycles.
Specifically, the latch 58b consists of two FFs (flip-flops) that hold the lower 2 bits of the output of the counter 58a. At each clock indicating an instruction cycle, the counter 58a increments the 2-bit output value of the latch 58b by 1 and outputs the incremented 3-bit value. As a result, the counter 58a repeatedly outputs 1, 2, 3, and 4. When the output of the counter 58a matches the constant 4, the comparator 58c outputs the task switching signal (chgtaskex) to the instruction reading circuit 53 and the scheduler.
The scheduler includes a task round management unit 58e, a priority encoder 58f, and a latch 58g. Each time the task switching signal (chgtaskex) is output, it updates the task id and outputs the id of the current task and the id of the task to be executed next to the instruction reading circuit 53.
In both cases, the current task id is encoded (3 video
). The encoded form has its value
Represents task id. The task round management unit 58 e
When the replacement signal (chgtaskex) is input, the latch unit
The task id to be executed next is referred to
Output in coded format (6 bits). Decoded
Format (6 bits), one bit corresponds to one task
And the bit position represents the task id. Priority
The encoder 58f outputs from the task round management unit 58e.
The task id to be input is encoded from the decoded
To the loaded format. The latch unit 58
d, latch 58g, together with encoded task i
d is held one cycle later. With this configuration, the task round management unit 5
8e receives a task switching signal (chgtaske) from the comparator 58c.
x) is output, the priority encoder 58
From f, change the id of the task to be executed next to "nexttaskid (rd
addr) "as the signal from the latch 58e to the current task id.
As a “taskid (wr addr)” signal. <1.4 Description of operation> First embodiment configured as described above
Video and audio processing apparatus 1000 in the form
The operation will be described. In the input / output processing unit 1001, the stream
MPEG stream asynchronously input from the system input unit 1
Is a buffer memo under the control of the input / output processor 5.
External memory 3 via memory controller 6
And further via the memory controller 6 to F
It is held in the IFO memory 4. At this time, FIFO memory
4, the IO processor 5 executes the task 2
(B) According to the remaining amount by executing task 3
To supply compressed moving image data and compressed audio data. this
A certain amount of compression can be stored in the FIFO memory 4 without excess or shortage.
Since video data and compressed audio data are supplied,
Processing unit 1002 is separated from asynchronous input / output.
Thus, it is possible to exclusively use the decoding process. So far
The processing is performed by the input / output processing unit 1001 for decoding.
The processing is performed in parallel with the processing unit 1002 independently. On the other hand, in the decoding processing unit 1002,
MPEG stream data held in the FIFO memory 4
Hereafter, the processor 7, the code conversion unit 9, the pixel operation unit
10, decoded by the pixel read / write unit 11. FIFO
FIG. 15 is an explanatory diagram showing the decoding operation after the memory 4.
In the figure, the horizontal axis is the time axis and approximately one macroblock
Header analysis and decoding for each block
Is shown. In the vertical direction, each of the decode processing units 1002
Decoding of each block in the
It shows how it is performed. As shown in the figure, the processor 7
Header analysis of video data and data for compressed audio data
The code processing is repeated in a time sharing manner. That is, the processor
The server 7 analyzes the header of one macroblock and analyzes the header.
Code conversion unit 9, pixel operation unit 10, pixel read / write
After the notification to the unit 11, the macro conversion
Instructs to start decoding the lock. Then processor 7
Until the interrupt signal from the code conversion unit 9 is notified.
Then, the decoding process of the compressed audio data is performed. After decoding
Is temporarily stored in the internal memory 8 and further stored in the memory.
DMA transfer to the external memory 3 by the memory controller 6
On receiving the macroblock decode start instruction from the processor 7, the code conversion unit 9 performs VLD and stores the result in the buffer 200 for each block in the macroblock. In doing so, the code conversion unit 9 changes the order of the write addresses to the buffer 200 according to the scan type of the block, which the processor 7 reported during header analysis. In other words, the order of the write addresses differs between zigzag scan and alternate scan. As a result, the pixel operation unit 10 does not need to change the order of its read addresses and can read the data in the same read address order regardless of the scan type.
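The idea of reordering on the write side can be illustrated as follows. This Python sketch generates the widely used 8 × 8 zigzag order and uses it as a write-address table, so that a reader can fetch coefficients with plain sequential addresses. The function names are hypothetical, and the table for the alternate scan is not reproduced here; it would be supplied to `write_block` in the same way.

```python
def zigzag_table(n=8):
    """Raster addresses of an n x n block listed in zigzag scan order."""
    table = []
    for s in range(2 * n - 1):                    # s = row + column
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)                 # even diagonals run upward
        table.extend(r * n + (s - r) for r in rows)
    return table


def write_block(coefficients, scan_table):
    """Scatter coefficients (in scan order) to raster addresses in a buffer."""
    buffer = [0] * len(scan_table)
    for count, value in enumerate(coefficients):  # count plays the counter's role
        buffer[scan_table[count]] = value
    return buffer


table = zigzag_table()
print(table[:10])
buffer = write_block(list(range(64)), table)
print(buffer[:8])   # first raster row, read with plain sequential addresses
```

Switching scan types then amounts to choosing a different table for the writer, which mirrors the role of the selector driven by the scan type flag; the reader's addressing stays unchanged.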
The code conversion unit 9 repeats the above operation, writing to the buffer 200, until VLD has been completed for the six blocks in the macroblock. When VLD of the six blocks is finished, it generates an interrupt to the processor 7. This interrupt signal is the macroblock decode end signal, End Of Macro Block (EOMB). The code conversion unit 9 generates the EOMB by detecting the block end signal End Of Block (EOB) for the sixth block.
In parallel with the code conversion unit 9, the pixel operation unit 10 applies IQ and IDCT, block by block, to the block data stored in the buffer 200, as shown in the figure, and stores the results in the buffer 201. In parallel with the pixel operation unit 10, the pixel read/write unit 11 extracts a rectangular area from the reference frame in the external memory 3 and combines blocks, as shown in FIG. 15, based on the block data in the buffer 201 and the motion vector reported by the processor 7 during header analysis. The block combining result is stored in the external memory 3 via the FIFO memory 4.
Will be delivered. The above is not a skip macroblock
Operation in case of skip macro block
Indicates that the code conversion unit 9 and the pixel operation unit 10 do not operate,
Only the read / write unit 11 operates. Skip macro block
If there is a block, the same image as the rectangular area in the reference frame
The image is decoded by the pixel read / write unit 11
The image is copied to the external memory 3 as an image. In this case, the code conversion unit 9
The interrupt signal to 7 is generated as follows. Sand
That is, the processor 7 performs motion compensation on the pixel read / write unit 11.
A signal indicating that a control signal for starting the compensation operation has been sent,
That the pixel read / write unit 11 can perform a motion compensation operation.
Signal indicating that it is a skipped macroblock
Take the logical product of the signal and the logical product and the above EOMB
An interrupt signal is input to the processor 7 as a logical sum with the signal
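The interrupt condition for skipped and non-skipped macroblocks can be written as a single Boolean expression. The sketch below is only an illustration of that logic; the function and parameter names are hypothetical, and the real signals are hardware lines rather than Python values.

```python
def interrupt_to_processor(mc_start_sent, mc_ready, is_skipped, eomb):
    """Return True when the processor should be interrupted.

    mc_start_sent: the processor has sent the motion-compensation start control signal
    mc_ready:      the pixel read/write unit is ready to perform motion compensation
    is_skipped:    the current macroblock is a skipped macroblock
    eomb:          End Of Macro Block was generated after the sixth block
    """
    skipped_path = mc_start_sent and mc_ready and is_skipped  # logical product
    return skipped_path or eomb                               # logical sum


if __name__ == "__main__":
    # Ordinary macroblock: the interrupt comes from EOMB only.
    print(interrupt_to_processor(True, True, False, False))
    print(interrupt_to_processor(True, True, False, True))
    # Skipped macroblock: no VLD runs, so the AND term raises the interrupt.
    print(interrupt_to_processor(True, True, True, False))
```

The AND term lets the processor move on to the next header without waiting for an EOMB that a skipped macroblock never produces.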
As described above, in the video/audio processing apparatus of the first embodiment of the present invention, the processing is divided so that the input/output processing unit 1001 handles the input of the MPEG stream from a storage medium or communication medium, the output of display image data and audio data to the display device and the audio output device, and the supply of the stream to the decode processing unit 1002, while the decode processing unit 1002 handles the decoding of the compressed video data and compressed audio data. The decode processing unit 1002 is thereby released from processing that occurs asynchronously and can be dedicated to decoding.
As a result, the series of processes consisting of MPEG stream input, decoding, and output is executed efficiently, and full decoding of the MPEG stream (without dropped frames) becomes possible without a high-speed operating clock.

Further, this video/audio processing apparatus is desirably implemented as a one-chip LSI. In this case, full decoding is possible with an operating clock of 100 MHz or less (54 MHz in practice). By comparison, a high-performance CPU whose operating clock exceeds 100 MHz or 200 MHz can perform full decoding if the image size is small, but its manufacturing cost is high. This video/audio processing apparatus, by contrast, is superior in both manufacturing cost and full decoding.

Further, the decode processing unit 1002 of this video/audio processing apparatus divides roles as follows. The processor 7 is in charge of the header analysis of the compressed video data and compressed audio data, which requires a wide variety of condition judgments, and is also in charge of decoding the compressed audio data. The dedicated hardware (firmware) consisting of the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 is in charge of decoding the block data of the compressed video data, which requires a large amount of routine computation.

As shown in the figure, the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 are pipelined. The pixel operation unit 10 allows IQ and IDCT to be processed in parallel. The pixel read/write unit 11 accesses the reference frame in units of two blocks.
With these measures, the compressed video decoding process is made more efficient, so the hardware dedicated to video decoding obtains high processing capability without using a high-speed clock. Specifically, equivalent or greater processing capability was obtained with a clock of about 50 to 60 MHz, without using a high-speed clock exceeding 100 MHz. It is therefore unnecessary to use high-speed devices, and the manufacturing cost can be reduced.

The basic unit of video decoding is a macroblock for the code conversion unit 9, a block between the code conversion unit 9 and the pixel operation unit 10, and two blocks between the pixel operation unit 10 and the pixel read/write unit 11, so the buffer capacity can be kept to a minimum.
<2 Second embodiment>
The video/audio processing apparatus of the present embodiment is configured to perform a compression function (hereinafter referred to as encoding) and a graphics function in addition to decoding compressed stream data.
<2.1 Configuration of Video / Audio Processing Apparatus>
FIG. 2 is a block diagram showing the configuration of the video/audio processing apparatus according to the second embodiment. This video/audio processing apparatus 2000 comprises a stream input/output unit 21, a buffer memory 22, a FIFO memory 24, an input/output processor 25, a memory controller 26, a processor 27, an internal memory 28, a code conversion unit 29, a pixel operation unit 30, a pixel read/write unit 31, a video output unit 12, an audio output unit 13, a buffer 200, and a buffer 201.

The video/audio processing apparatus 2000 adds the following functions to those of the video/audio processing apparatus 1000 of the first embodiment: a compression function for video data and audio data, and a graphics function for drawing polygon data. Accordingly, components of the video/audio processing apparatus 2000 that have the same names as those of the apparatus 1000 have the same functions, with functions for compression and graphics added. Description of the points in common is omitted, and the following description focuses on the differences.

The stream input/output unit 21 differs in that it is bidirectional. That is, when parallel data is transferred to it from the buffer memory 22 under the control of the input/output processor 25, it converts the parallel data into serial data and outputs it to the outside as an MPEG data stream. The buffer memory 22 and the FIFO memory 24 are likewise bidirectional.

The input/output processor 25 controls the transfers on routes (1) to (4), which were shown in the first embodiment, and additionally controls the transfers on routes (5) to (8).
(1) Stream input/output unit 21 → buffer memory 22 → memory controller 26 → external memory 3
(2) External memory 3 → memory controller 26 → FIFO memory 24
(3) External memory 3 → memory controller 26 → buffer memory 22 → video output unit 12
(4) External memory 3 → memory controller 26 → buffer memory 22 → audio output unit 13
(5) External memory 3 → memory controller 26 → internal memory 28
(6) External memory 3 → memory controller 26 → pixel read/write unit 31
(7) FIFO memory 24 → memory controller 26 → external memory 3
(8) External memory 3 → memory controller 26 → buffer memory 22 → stream input/output unit 21
Routes (5) and (6) are the paths of the original data when encoding is performed, and routes (7) and (8) are the paths of the MPEG stream after compression.

First, the encoding process will be described.
The data to be encoded is assumed to be stored in the external memory 3. The video data in the external memory 3 is transferred to the pixel read/write unit 31 by the pixel read/write unit 31 controlling the memory controller 26. The pixel read/write unit 31 writes the video data to the second buffer 201 and performs difference image generation processing. The difference image generation processing consists of motion detection (calculation of a motion vector) and generation of a difference image, both in block units. For this purpose, the pixel read/write unit 31 contains a motion detection circuit that detects a motion vector by searching the reference frame for a rectangular area similar to the block to be encoded. Instead of the motion detection circuit, a motion prediction circuit may be provided that predicts the motion vector of the block to be encoded from the already calculated motion vectors of blocks in an adjacent frame.

The pixel operation unit 30 receives the difference image data block by block and performs DCT, IDCT, quantization (hereinafter, Q processing), and IQ. The video data quantized in this way is stored in the buffer 200. The code conversion unit 29 receives the quantized data from the buffer 200 and performs variable-length coding (VLC). The variable-length-coded data is stored in the FIFO memory 24 and then in the external memory 3 via the memory controller 26; at the same time, header information is added to it by the processor 27.

The audio data in the external memory 3 is transferred to the internal memory 28 via the memory controller 26. The processor 27 generates the header information for each macroblock and compresses the audio data in the internal memory 28. As described above, encoding follows the path of the first embodiment in reverse.
Next, the graphics processing will be described. Graphics processing is three-dimensional image generation performed by combining rectangular figures called polygons. In this apparatus, the pixel data inside a polygon is generated from the pixel data at the vertex coordinates of the polygon.

First, the polygon vertex data is stored in the external memory 3. The vertex data is then stored in the internal memory 28 by control of the memory controller 26. The processor 27 reads the vertex data from the internal memory 28, performs DDA (Digital Difference Analyze) processing on it, and writes the result to the FIFO memory 24. The code conversion unit 29 reads the vertex data from the FIFO memory 24 in accordance with an instruction from the pixel operation unit 30 and transfers it to the pixel operation unit 30. The pixel operation unit 30 performs DDA processing and sends the result to the pixel read/write unit 31. The pixel read/write unit 31 performs Z-buffer processing or α blending processing in accordance with an instruction from the processor 27.
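The text does not spell out the DDA arithmetic, so the following is only a generic sketch of the technique: a value known at two end points (for example a color component at two polygon vertices) is interpolated across the pixels between them by computing one increment per span and then adding it once per pixel. The function name and the use of floating point are assumptions made for illustration.

```python
def dda_span(x0, x1, c0, c1):
    """Linearly interpolate a value from c0 at x0 to c1 at x1, one pixel at a time.

    Returns a list of (x, value) pairs. Only one division is done per span;
    every pixel after that costs a single addition.
    """
    n = x1 - x0
    if n <= 0:
        return [(x0, c0)]
    step = (c1 - c0) / n      # increment per pixel, computed once
    out = []
    c = float(c0)
    for x in range(x0, x1 + 1):
        out.append((x, round(c)))
        c += step
    return out


if __name__ == "__main__":
    # Interpolate one color component from 0 to 100 across five pixels.
    for x, value in dda_span(0, 4, 0, 100):
        print(x, value)
```

Running the same routine once per color component matches the idea of computing R, G, and B independently.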
The image data is then written to the external memory 3 via the memory controller 26.
<2.1.1 Pixel Operation Unit>
FIG. 13 is a block diagram showing the configuration of the pixel operation unit 30. In the figure, components that are the same as those of the pixel operation unit 10 of the first embodiment are given the same numbers and their description is omitted; the differences are mainly described.

The differences from the pixel operation unit 10 are, as shown in the figure, that the pixel operation unit 30 has three execution units (501a to 501c) and that the instruction pointer holding units 308a to 308c, the instruction register unit 309, and the distribution unit 310 have been added.

Three execution units 501a to 501c are provided in order to improve computing performance. Specifically, in graphics processing the RGB components of a color image are calculated independently and in parallel. In IQ and Q processing, the multipliers 502 are used to increase speed. In IDCT, the plural multipliers 502 and adder/subtracters 503 are used to shorten the processing time. IDCT includes an operation called a butterfly operation, in which all the data being operated on depend on one another, so a data line 103 is provided for communication among the execution units 501a to 501c.

The first instruction memory 506 and the second instruction memory 507 store microprograms for DCT, Q processing, and DDA in addition to those for IDCT and IQ. The figure shows an example of the stored contents of the first instruction memory 506 and the second instruction memory 507. A Q processing microprogram, a DCT microprogram, and a DDA microprogram have been added.

The instruction pointer holding units 308a to 308c are provided in correspondence with the execution units 501a to 501c. Each has a conversion table that converts the address input from the first program counter and outputs the converted address to the instruction register unit 309. The converted address is a register number in the instruction register unit 309. Each of the instruction pointer holding units 308a to 308c also holds a modify flag (described later) and outputs it to the corresponding execution unit 501a to 501c.

As for the conversion tables, when the input addresses are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, the instruction pointer holding units 308a, 308b, and 308c output, for example, the following converted addresses.
Instruction pointer holding unit 308a: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
Instruction pointer holding unit 308b: 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11
Instruction pointer holding unit 308c: 4, 3, 2, 1, 8, 7, 6, 5, 12, 11, 10, 9

The instruction register unit 309 comprises, as shown in the figure, a plurality of registers holding microinstructions, three selectors, and three output ports. The three selectors each select the microinstruction in the register specified by the converted address (register number) input from the instruction pointer holding units 308a, 308b, and 308c. The three output ports are provided in correspondence with the selectors and simultaneously output the microinstructions selected by the selectors to the execution units 501a to 501c through the distribution unit 310. Three selectors and three output ports are provided so that different microinstructions can be supplied simultaneously to the three adder/subtracters 503 (or the three multipliers 502). In this embodiment, the three output ports supply, through the distribution unit 310, either the three adder/subtracters 503 or the three multipliers 502 selectively. For example, the instruction register unit 309 has registers R1 to R16 (register numbers 1 to 16).
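The example conversion tables for the instruction pointer holding units 308a to 308c can be checked with a few lines of Python. The sketch below copies the three address sequences from the text and verifies two properties: each table is a reordering of the same twelve register numbers, and at every program-counter step the three execution units are pointed at three different registers. The XOR formulas are an observation about these particular example values, not something the text states.

```python
# Converted addresses (register numbers) for input addresses 1..12, as listed in the text.
TABLE_A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
TABLE_B = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11]
TABLE_C = [4, 3, 2, 1, 8, 7, 6, 5, 12, 11, 10, 9]


def translate(table, program_counter):
    """Convert a 1-based program-counter value into a register number."""
    return table[program_counter - 1]


def registers_per_step():
    """Register numbers fetched by the three execution units at each step."""
    return list(zip(TABLE_A, TABLE_B, TABLE_C))


if __name__ == "__main__":
    for table in (TABLE_A, TABLE_B, TABLE_C):
        assert sorted(table) == TABLE_A              # same instructions, different order
    for step in registers_per_step():
        assert len(set(step)) == 3                   # no two units share a register
    # Observation: the example tables are XOR permutations of the 0-based address.
    assert TABLE_B == [((i - 1) ^ 1) + 1 for i in TABLE_A]
    assert TABLE_C == [((i - 1) ^ 3) + 1 for i in TABLE_A]
    print(registers_per_step()[:4])
```

Since every unit eventually executes all twelve microinstructions, one stored program serves three units while their accesses stay apart at each step.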
The microprogram stored in the registers R1 to R16 implements the matrix operation processing required for DCT and IDCT, and is stored so that the same processing is performed whichever of the three register-number orders above is followed. In other words, the microprogram has three execution orders, in which the order of some microinstructions is interchanged. This is because the execution units 501a to 501c execute the microprogram in parallel, and resource interference among the execution units 501a to 501c, such as conflicting accesses to registers (not shown), must be avoided. The matrix operation processing includes multiplication, transposition, and transfer of 8 × 8 matrices.

Next, a microinstruction stored in each register of the instruction register unit 309 is written in mnemonic format as "op Ri, Rj, dest, (modify flag)". Of these, however, the registers of the instruction register unit 309 hold only the "op Ri, Rj" part: "dest" is specified from the instruction memories 506 and 507, and "(modify flag)" is specified from the instruction pointer holding units 308a to 308c.

Here, "op" is an operation code indicating a multiplication instruction, an addition/subtraction instruction, a transfer instruction, or the like, and "Ri, Rj" are operands. A multiplication instruction is executed by the multiplier 502 in each of the three execution units 501a to 501c, and an addition/subtraction instruction or a transfer instruction is executed by the adder/subtracter 503 in each of the three execution units 501a to 501c.

"dest" indicates where the operation result is stored. "dest" is specified not by a register of the instruction register unit 309 but by the instruction memory 506 (for a multiplication instruction) or the instruction memory 507 (for an addition/subtraction instruction or a transfer instruction). This allows the microprogram in the instruction register unit 309 to be shared by the execution units 501a to 501c. If the destination were specified in the registers, a separate microprogram would have to be prepared for each of the execution units 501a to 501c, and the microprogram capacity would have to be several times larger.
The "modify flag" indicates whether the operation is an addition or a subtraction. This "modify flag" is specified not from the registers of the instruction register unit 309 but from the instruction pointer holding units 308a to 308c. The constant matrix used in the matrix operations of DCT and IDCT contains rows (or columns) whose elements are all "1" and rows (or columns) whose elements are all "−1"; by specifying the modify flag from the instruction pointer holding units 308a to 308c, the same microprogram in the instruction register unit 309 can be shared.

When the three microinstructions input from the instruction register unit 309 are addition/subtraction instructions, the distribution unit 310 distributes their "op Ri, Rj" parts, the "dest" part input from the instruction memory 507, and the "modify flag" input from the instruction pointer holding units 308a to 308c to the three adder/subtracters 503, and at the same time distributes the microinstruction of the instruction memory 506 to the three multipliers 502. When the three microinstructions input from the instruction register unit 309 are multiplication instructions, the distribution unit 310 distributes their "op Ri, Rj" parts and the "dest" part input from the instruction memory 506 to the three multipliers 502, and distributes the microinstruction of the instruction memory 507 to the three adder/subtracters 503.

In other words, through the distribution unit 310, the three adder/subtracters 503 are each supplied with one microinstruction from the instruction memory 507 when the instruction is common to all three, and are each supplied with a microinstruction from the instruction register unit 309 when the addition/subtraction instructions differ among the three. Likewise, the three multipliers 502 are each supplied with a microinstruction from the instruction memory 506 when the instruction is common to all three, and with a microinstruction from the instruction register unit 309 when the multiplication instructions differ among the three.

With this configuration of the pixel operation unit 30, the storage capacity of the instruction memory 506 and the instruction memory 507 can be reduced. If the pixel operation unit 30 did not have the instruction pointer holding units 308a to 308c, the instruction register unit 309, and the distribution unit 310, the instruction memory 506 and the instruction memory 507 would each have to store three microinstructions in parallel in order to supply different microinstructions to the three execution units 501a to 501c.

FIG. 22 shows an example of the stored contents of the instruction memory 506 and the instruction memory 507 in the case where the instruction pointer holding units 308a to 308c, the instruction register unit 309, and the distribution unit 310 are not provided. In the figure, a 16-step microprogram is stored, and one microinstruction is 16 bits. In this case, because the instruction memory 506 and the instruction memory 507 each store three microinstructions in parallel, a total of 1536 bits (16 steps × 16 bits × 3 × 2) is required.

By contrast, FIG. 23 shows an example of the stored contents of the instruction pointer holding units 308a to 308c and the instruction register unit 309 in the pixel operation unit 30 of this embodiment. Here too a 16-step microprogram is stored, and one microinstruction is 16 bits. The instruction pointer holding units 308a to 308c each store 16 register numbers (4 bits each), and the instruction register unit 309 stores 16 microinstructions. In this case, the storage capacity of the instruction pointer holding units 308a to 308c and the instruction register unit 309 is 448 bits (16 steps × (12 + 16) bits). The pixel operation unit 30 can thus significantly reduce the storage capacity needed for the microprogram.
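The two capacity figures can be reproduced with the short calculation below. It uses only the numbers given in the text (16 steps, 16-bit microinstructions, three execution units, two instruction memories, 4-bit register numbers); the function and parameter names are illustrative.

```python
def bits_without_sharing(steps=16, instr_bits=16, units=3, memories=2):
    """Each of the two instruction memories stores one microinstruction per unit per step."""
    return steps * instr_bits * units * memories


def bits_with_sharing(steps=16, instr_bits=16, units=3, reg_number_bits=4):
    """One shared microinstruction per step plus one register number per unit per step."""
    return steps * (units * reg_number_bits + instr_bits)


if __name__ == "__main__":
    before = bits_without_sharing()
    after = bits_with_sharing()
    print(before)           # 1536
    print(after)            # 448
    print(before - after)   # 1088 bits saved
```

The "12" in 16 × (12 + 16) is the three 4-bit register numbers held per step. As the text notes, the storage for "dest" and the modify flag is not included in this count.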
In practice, because "dest" and the "modify flag" are issued separately, storage capacity or circuitry for them is also required. In addition, the instruction memories 506 and 507 specify "dest" in the microinstructions and issue the multiplication and addition/subtraction instructions common to the execution units 501a to 501c, so the instruction memories 506 and 507 are not removed completely. If six output ports were provided in the instruction register unit 309, the instruction memory 506 and the instruction memory 507 could be eliminated.

In FIG. 23, the instruction pointer holding units 308a to 308c output converted addresses (register numbers) when the value of the first program counter is 0 to 15, but this is not a limitation. For example, converted addresses may be output when the value of the first program counter is 32 to 47. In that case, the configuration may add an appropriate offset value to the value of the first program counter. Any address sequence indicated by the first program counter can thereby be converted into converted addresses.

With the above configuration, this embodiment can perform, in addition to the decoding of compressed video data and compressed audio data, the encoding of video data and audio data and graphics processing based on polygon data. Processing efficiency is improved by the parallel processing of the plural execution units. Moreover, because the instruction pointer holding units 308a to 308c interchange the order of some microinstructions, resource interference among the plural execution units is avoided, which improves processing efficiency further. The above embodiment provides three execution units.
This configuration is used because each of the RGB colors can then be calculated independently. The number of execution units may be any number of three or more.

In the above embodiments, the video/audio processing apparatuses 1000 and 2000 are each desirably implemented as a one-chip LSI. Further, although the external memory 3 has been described as being outside the chip, it may be built into the one chip.

In the above embodiments, the stream input unit 1 (or the stream input/output unit 21) stores the MPEG stream (or the video and audio data) in the external memory 3, but the host processor may instead store it in the external memory 3 directly.

In the above embodiment, the input/output processor 5 switches tasks every four instruction cycles, but tasks may be switched every plural instruction cycles other than four. The number of instruction cycles between task switches may also be weighted in advance for each task, so that it differs from task to task. The number of instruction cycles may also be weighted for each task according to priority or urgency.

According to the video/audio processing apparatus of the present invention, a video/audio processing apparatus that receives a data stream containing compressed audio data and compressed video data from outside, decodes it, and outputs the result to an external display device and an external audio output device comprises: input/output processing means for performing input/output processing that occurs asynchronously; and decode processing means for performing, in parallel with the input/output processing, decode processing consisting mainly of decoding the data stream stored in a memory. The decode processing means stores the decoded video data and decoded audio data in the memory. The input/output processing consists of inputting the data stream, which arrives asynchronously from outside, and storing it in the memory; supplying the data stream stored in the memory to the decode processing means; and reading the decoded data from the memory and outputting it to the display device and the audio output device in accordance with their respective output rates.

According to this configuration, the input/output processing means and the decode processing means operate in parallel in a pipelined manner, and the asynchronous processing and the decode processing are divided between the input/output processing means and the decode processing means, so the decode processing means is released from processing that occurs asynchronously and can be dedicated to decoding. As a result, this video/audio processing apparatus efficiently executes the series of processes consisting of stream data input, decoding, and output, so full decoding of the stream data (without dropped frames) is possible without using a high-speed operating clock.

Further, the decode processing means may comprise: sequential processing means for performing sequential processing on the data stream that consists mainly of condition judgments, namely header analysis of the compressed audio data and compressed video data and decoding of the compressed audio data; and routine processing means for performing routine processing in parallel with the sequential processing, namely decoding of the compressed video data excluding its header analysis.

According to this configuration, sequential processing and routine processing, which differ in their processing characteristics and of which the latter is suited to parallel processing, no longer coexist in a single unit, so processing efficiency can be improved considerably. The processing efficiency of the routine processing means in particular can be improved, because in this video/audio processing apparatus the routine processing means is released from the above asynchronous processing and sequential processing and can be devoted to the various routine operations required for decoding the compressed video data. As a result, high processing capability is obtained without using a high-speed operating clock.

Further, the input/output processing means may comprise: input means for inputting the asynchronous data stream from outside; video output means for outputting decoded video data to an external display device; audio output means for outputting decoded audio data to an external audio output device; and a processor that executes first to fourth tasks while switching among them. The first task is a program that transfers the data stream from the input means to the memory; the second task is a program that supplies the data stream from the memory to the decode processing means; the third task is a program that outputs the decoded video data from the memory to the video output means; and the fourth task is a program that outputs the decoded audio data from the memory to the audio output means.

Here, the processor may comprise: a program counter unit having at least four program counters corresponding to the first to fourth tasks; an instruction fetch unit that fetches an instruction from an instruction memory storing the task programs, using the instruction address indicated by one of the program counters; an instruction execution unit that executes the instruction fetched by the instruction fetch unit; and a task control unit that controls the instruction fetch unit so that the program counter is switched in sequence each time a predetermined number of instruction cycles elapses.
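A small simulation may make this task switching concrete. The sketch below models a processor with one program counter per task and a task control step that moves to the next program counter after a fixed number of instruction cycles. A per-task slice length also covers the weighted variant mentioned for the embodiment. All names and the instruction labels are illustrative, and the model ignores pipelining and memory timing.

```python
def run_tasks(programs, slice_lengths, total_cycles):
    """Execute `programs` in round-robin order, one instruction per cycle.

    programs:      list of instruction lists, one per task
    slice_lengths: number of consecutive instruction cycles given to each task
    Returns (trace, program_counters); trace holds (task, instruction) pairs.
    """
    pcs = [0] * len(programs)        # one program counter per task
    trace = []
    task, used = 0, 0
    for _ in range(total_cycles):
        program = programs[task]
        trace.append((task, program[pcs[task] % len(program)]))
        pcs[task] += 1               # only the active task's counter advances
        used += 1
        if used == slice_lengths[task]:
            task, used = (task + 1) % len(programs), 0   # switch program counter
    return trace, pcs


if __name__ == "__main__":
    programs = [[f"t{t}_i{i}" for i in range(8)] for t in range(4)]
    trace, pcs = run_tasks(programs, [4, 4, 4, 4], 32)
    print(trace[:5])
    print(pcs)
```

Because each task keeps its own program counter, a switch costs nothing beyond selecting a different counter, and with equal slices every task is served again within a bounded number of cycles.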
According to this configuration, whatever the input rate and input cycle of the stream data, which are determined by the external device, and whatever the output rate and output cycle of the video data and audio data, which are determined by the external display device and the external audio output device, the response delay to input/output requests is small.

In addition, the video/audio processing apparatus of the present invention may comprise: input means for inputting a data stream containing compressed audio data and compressed video data; sequential processing means for performing sequential processing on the data stream that consists mainly of condition judgments, namely analysis of the header information added to each predetermined block in the data stream and decoding of the compressed audio data in the data stream; and routine processing means for decoding, in parallel with the sequential processing and using the result of the header analysis, the compressed video data in the data stream in units of the predetermined block.
The sequential processing means, when it finishes the header analysis of a predetermined block, instructs the routine processing means to start decoding that block and, when it receives a decoding end notification from the routine processing means, starts the header analysis of the next block.

According to this configuration, the sequential processing means is in charge of the header analysis of the compressed video data and compressed audio data, which requires a wide variety of condition judgments, and is also in charge of decoding the compressed audio data. The routine processing means, for its part, is in charge of the large amount of routine computation on the block data of the compressed video data. With this division of roles, the sequential processing means performs audio decoding, which involves less computation than video decoding, together with the header analysis of the compressed video data and the control of the routine processing means. Under that control, the routine processing means performs routine computation exclusively, so lean and efficient processing can be realized. High processing capability is therefore obtained without operating at a high frequency, and the manufacturing cost can be reduced. In addition, because the sequential processing means performs the audio decoding, the header analysis of the compressed video data, and the control of the routine processing means sequentially, it can be implemented with a single processor.

The routine processing means may comprise: data conversion means for variable-length decoding the compressed video data in the data stream in accordance with an instruction from the sequential processing means; calculation means for performing inverse quantization and inverse discrete cosine transform by applying predetermined operations to the variable-length-decoded video block; and synthesis means for restoring the video data by performing motion compensation, in which the video block after the inverse discrete cosine transform is combined with an already decoded block. The sequential processing means may comprise: acquisition means for acquiring the header information variable-length decoded by the data conversion means; analysis means for analyzing the acquired header information; notification means for notifying the routine processing means of the parameters obtained as the analysis result; audio decoding means for decoding the compressed audio data in the data stream input by the input means; and control means which, on receiving from the routine processing means an interrupt signal notifying that decoding of a predetermined block has ended, stops the operation of the audio decoding means and starts the acquisition means, and which, once the notification means has given its notification, instructs the data conversion means to start variable-length decoding of the video block.
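The control flow of the sequential processing means can be pictured as a simple loop. The sketch below is a toy model: it assumes, purely for illustration, that the routine processing means needs a fixed number of time slots to decode one block and that the processor decodes one unit of audio per slot while it waits for the interrupt. The function name, the string labels, and the fixed slot count are all hypothetical.

```python
def sequential_processor_trace(num_macroblocks, block_decode_slots):
    """List the actions of the sequential processor for a run of macroblocks."""
    trace = []
    for mb in range(num_macroblocks):
        # Acquire and analyze the header, then notify the routine processing side.
        trace.append(f"analyze header of MB{mb}")
        trace.append(f"notify parameters, start VLD of MB{mb}")
        # Audio decoding fills the time until the end-of-decoding interrupt arrives.
        for _ in range(block_decode_slots):
            trace.append("decode audio")
        trace.append(f"interrupt: MB{mb} done")
    return trace


if __name__ == "__main__":
    for action in sequential_processor_trace(2, 3):
        print(action)
```

In this model the processor is never idle: every slot not spent on header analysis goes to audio decoding, which is how one processor can cover both jobs.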
You may comprise so that it may have. According to this configuration, a location such as a macro block
The sequential processing means performed header analysis on a fixed block basis.
After that, perform audio decoding and use the
When the decoding of the block is completed, the header analysis of the next block
To start. In this way, the sequential processing means uses the time division header
One processor to repeat analysis and audio decoding
At low cost. Also, routine processing
The means does not need to perform a wide variety of condition judgment processing
Low cost, dedicated hardware (or hardware
Firmware). Here, the calculating means further comprises one block.
a first buffer having a storage area corresponding to one block, and the data conversion means may comprise: variable-length decoding means for variable-length decoding the compressed video data in the data stream; first address table means for storing a first address sequence in which the addresses of the storage area of the first buffer are arranged in zigzag scan order; second address table means for storing a second address sequence in which the addresses of the storage area of the first buffer are arranged in alternate scan order; and writing means for writing the block data obtained by the variable-length decoding means into the first buffer in accordance with one of the first address sequence and the second address sequence.
According to this configuration, the writing means can write block data into the storage area of the first buffer for both the zigzag scan and the alternate scan. Therefore, when reading block data from the storage area of the first buffer, the arithmetic means does not need to change the order of the read addresses and can read in the same order regardless of the scan type.
[0114] Further, the analysis means may be configured to
calculate the quantization scale and the motion vector based on the header information, and the notification means may be configured to notify the arithmetic means of the quantization scale and the combining means of the motion vector.
According to this configuration, the sequential processing means is in charge of calculating the motion vector, and the combining means can perform the motion compensation processing as a routine operation using the calculated motion vector.
Further, the arithmetic means may comprise: first and second control storage units each storing a microprogram; a first program counter for designating a first read address to the first control storage unit; a second program counter for designating a second read address; a selector for selecting one of the first read address and the second read address and outputting it to the second control storage unit; and an execution unit that has a multiplier and an adder and performs inverse quantization and inverse discrete cosine transform on a block basis under microprogram control by the first and second control storage units.
According to this configuration, the microprogram (firmware) does not need to perform a wide variety of condition judgments and only has to implement routine processing, so the program size is small, the program is easy to create, and the configuration is suitable for cost reduction. Moreover, by using the two program counters, the multiplier and the adder can be operated independently and in parallel.
Further, the execution unit may be configured so that, when the second read address is selected by the selector, processing using the multiplier and processing using the adder are performed independently and in parallel, and when the first read address is selected by the selector, processing using the multiplier and processing using the adder are performed in a linked manner.
According to this configuration, the idle time of the multiplier and the adder can be reduced and the processing efficiency can be improved.
Here, the arithmetic means may further comprise
a first buffer that holds a video block from the data conversion means and a second buffer that holds the block that has been subjected to the inverse discrete cosine transform by the execution unit; the first control storage unit may store a microprogram for inverse quantization and a microprogram for the inverse discrete cosine transform; the second control storage unit may store a microprogram for the inverse discrete cosine transform and a microprogram for transferring the inverse discrete cosine transformed video block to the second buffer; and the execution unit may execute, in parallel, the process of transferring the inverse discrete cosine transformed video block to the second buffer and the process of inversely quantizing the next video block, and may execute the inverse discrete cosine transform of the inversely quantized video block by linking the multiplier and the adder.
According to this configuration, the inverse quantization processing and the transfer processing to the second buffer are performed in parallel, so the processing efficiency can be improved.
Further, the input means may further
input polygon data; the sequential processing means may further analyze the polygon data to calculate the vertex coordinates of the polygon and the slopes of its edges; and the routine processing means may further generate image data of the polygon in accordance with the calculated vertex coordinates and slopes.
According to this configuration, the sequential processing means is in charge of analyzing the polygon data, and the routine processing means is in charge of the routine image data generation processing. The video and audio processing device can thus efficiently perform graphics processing that generates image data from polygon data.
Here, the first and second control storage units may further store a microprogram for performing scan conversion by the DDA algorithm, and the execution unit may further perform the scan conversion under microprogram control based on the vertex coordinates and slopes calculated by the sequential processing means.
According to this configuration, the generation of the image data can be easily realized by storing the scan conversion microprogram in the first and second control storage units.
In addition, the combining means may further generate a difference block representing a difference image from the video data to be compressed; the second buffer may further hold the generated difference block; the first control storage unit may further store a microprogram for the discrete cosine transform and a microprogram for quantization; the second control storage unit may further store a microprogram for the discrete cosine transform and a microprogram for transferring the discrete cosine transformed video block to the first buffer; the execution unit may further apply the discrete cosine transform and quantization to the difference block held in the second buffer and transfer the result to the first buffer; the data conversion means may further variable-length code the block in the first buffer; and the sequential processing means may further add header information to a predetermined block that has been variable-length coded by the data conversion means.
According to this configuration, the routine processing means is in charge of the routine quantization and discrete cosine transform, and the sequential processing means is in charge of the processing that requires condition judgment (the addition of header information). In this case, the video and audio processing device can efficiently perform the encoding processing that converts image data into compressed image data.
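As an illustration of the routine part of the encoding path described above, the following Python sketch applies a discrete cosine transform and a quantization step to one difference block. It is only a minimal sketch under stated assumptions: the 8 x 8 block size, the uniform quantizer with a single step size, and the names dct_2d, quantize and encode_difference_block are hypothetical and are not taken from this specification. Variable-length coding and the addition of header information are omitted.

```python
import math

N = 8  # assumed block size; the text above only speaks of a "block"


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct, unoptimized form)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def quantize(coeffs, step):
    """Uniform quantizer (an assumption, not the MPEG quantization rule)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


def encode_difference_block(diff_block, step):
    """Routine processing only: DCT followed by quantization."""
    return quantize(dct_2d(diff_block), step)
```

For a flat difference block, only the DC coefficient survives quantization, which is why such blocks compress well in the subsequent variable-length coding.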
The arithmetic means may comprise: first and second control storage units each storing a microprogram; a first program counter for designating a first read address to the first control storage unit; a second program counter for designating a second read address; a selector for selecting one of the first read address and the second read address and outputting it to the second control storage unit; and a plurality of execution units, each having a multiplier and an adder, that perform inverse quantization and inverse discrete cosine transform on a block basis under microprogram control by the first and second control storage units, wherein the execution units share the processing of partial blocks obtained by dividing a block.
According to this configuration, the plurality of execution units execute operation instructions in parallel, so a large amount of routine operations can be executed efficiently by parallelizing them at the pixel level.
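The sharing of partial blocks among several execution units can be pictured with the following Python sketch. It is a simplified model under stated assumptions: the column-wise division, the default of four units, the reduction of inverse quantization to a single multiplication by the quantization scale, and all function names are hypothetical. The loop runs the units one after another, whereas the execution units described above operate at the same time.

```python
def split_into_partial_blocks(block, num_units):
    """Divide a block column-wise into num_units partial blocks."""
    width = len(block[0])
    if width % num_units != 0:
        raise ValueError("block width must be divisible by num_units")
    part = width // num_units
    return [[row[i * part:(i + 1) * part] for row in block]
            for i in range(num_units)]


def inverse_quantize(partial_block, quant_scale):
    """Routine per-coefficient operation performed by one execution unit."""
    return [[coef * quant_scale for coef in row] for row in partial_block]


def run_execution_units(block, quant_scale, num_units=4):
    """Apply the same routine operation to every partial block and merge."""
    partials = split_into_partial_blocks(block, num_units)
    # Each list element models the result of one execution unit.
    results = [inverse_quantize(p, quant_scale) for p in partials]
    # Reassemble each row from the partial results, unit by unit.
    return [sum((res[r] for res in results), []) for r in range(len(block))]
```

Because every unit runs the same operation on its own part of the block, the merged result is identical to processing the whole block in a single unit.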
Further, the arithmetic means may further comprise: a plurality of address conversion tables, provided in correspondence with the respective execution units, each of which holds, in association with a predetermined address sequence, converted addresses obtained by rearranging the order of those addresses; an instruction register group consisting of a plurality of registers that store the individual microinstructions making up a microprogram for a predetermined operation, in correspondence with the converted addresses; and a switching unit, provided between the first and second control storage units and the plurality of execution units, that replaces the microinstruction output to each execution unit from the first control storage unit or the selector with the microinstruction from the instruction register group and outputs it to the plurality of execution units. When the first read address or the second read address is an address in the predetermined address sequence, it is converted into a converted address by each address conversion table, and the instruction register group may be configured to output the microinstruction corresponding to each converted address output from the conversion tables.
According to this configuration, resource interference, such as conflicting accesses between execution units, can be avoided while the plurality of execution units execute the microprogram in parallel.
Here, each of the conversion tables may further
output to the plurality of execution units, while the first program counter is outputting a first read address in the predetermined address sequence and together with the output of a microinstruction in the register that indicates addition or subtraction, a flag indicating whether addition or subtraction should be performed; each execution unit may perform the addition or subtraction in accordance with the flag; and the flag may be set in accordance with a microinstruction in the second control storage unit.
According to this configuration, the conversion table specifies whether a microinstruction performs addition or subtraction, so the same microprogram can be shared in two ways. This further reduces the total capacity of the microprogram and the hardware scale, and thus lower cost can be realized.
Further, the second control storage unit may further
output to the plurality of execution units, while the first program counter is outputting a first read address in the predetermined address sequence and together with the output of the microinstruction, information indicating the storage destination of the microinstruction execution result; and each execution unit may store the execution result in accordance with the storage destination information.
According to this configuration, the storage destination information can be specified separately from the microprogram in the instruction registers, so the microprogram can be shared among different processes, for example the partial processes of a matrix operation. As a result, the total capacity of the microprogram and the hardware scale can be further reduced, and thus lower cost can be realized.
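The idea of specifying the storage destination separately from the shared microprogram can be illustrated in software as follows. This Python sketch is an analogy only, under stated assumptions: the multiply-accumulate routine stands in for the shared microprogram, a dictionary stands in for the storage destination information, square matrices are assumed, and every name in it is hypothetical.

```python
def multiply_accumulate(row, column):
    """Shared routine: the same code is reused for every partial product."""
    acc = 0
    for a, b in zip(row, column):
        acc += a * b
    return acc


def matrix_product(a, b, destinations):
    """Multiply square matrices; destinations says where each result goes."""
    n = len(a)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            column = [b[k][j] for k in range(n)]
            dest_row, dest_col = destinations[(i, j)]
            out[dest_row][dest_col] = multiply_accumulate(a[i], column)
    return out


def identity_destinations(n):
    """Store result (i, j) at position (i, j)."""
    return {(i, j): (i, j) for i in range(n) for j in range(n)}


def transposed_destinations(n):
    """Store result (i, j) at position (j, i), giving a transposed output."""
    return {(i, j): (j, i) for i in range(n) for j in range(n)}
```

Swapping the destination table changes where the results are stored without touching the shared routine, which mirrors how one microprogram can serve several partial processes of a matrix operation.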

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an explanatory diagram of the decoding process performed by the video/audio decoder of the first conventional technique.
FIG. 2 is an explanatory diagram of the decoding process performed by the two-chip decoder of the second conventional technique.
FIG. 3 is a block diagram showing the schematic configuration of the image processing apparatus according to the first embodiment of the present invention.
FIG. 4 is a block diagram showing the configuration of the image processing apparatus according to the first embodiment of the present invention.
FIG. 5 is a diagram showing an MPEG stream hierarchically, together with the operation timing of each part of the image processing apparatus.
FIG. 6 is a diagram showing the analysis of a macroblock header by the processor 7 and the control applied to the other units.
FIG. 7 is a block diagram showing the configuration of the pixel operation unit 10.
FIG. 8 shows an example of the microprograms stored in the first instruction memory 506 and the second instruction memory 507.
FIG. 9 is a diagram showing the operation timing of the pixel operation unit 10.
FIG. 10 is a block diagram showing the detailed configuration of the pixel read/write unit 11.
FIG. 11 is a block diagram showing the configuration of the IO processor 5.
FIG. 12 is a block diagram showing a detailed configuration example of the instruction reading circuit 53.
FIG. 13 is a time chart showing the operation timing of the IO processor 5.
FIG. 14 is a block diagram showing the configuration of the task management unit.
FIG. 15 is an explanatory diagram showing the decoding operation from the FIFO memory 4 onward.
FIG. 16 is a block diagram showing the configuration of the image processing apparatus according to the second embodiment of the present invention.
FIG. 17 is a block diagram showing the configuration of the pixel operation unit 30.
FIG. 18 shows an example of the stored contents of the first instruction memory 506 and the second instruction memory 507.
FIG. 19 is a block diagram showing the configuration of the code conversion unit 9.
FIG. 20 shows a block storage area for storing 8 × 8 spatial frequency data and the path of the zigzag scan.
FIG. 21 shows a block storage area for storing 8 × 8 spatial frequency data and the path of the alternate scan.
FIG. 22 shows an example of the stored contents of the instruction memory 506 and the instruction memory 507 in the case where the instruction pointer holding units 308a to 308c, the instruction register unit 309, and the distribution unit 310 are not provided.
FIG. 23 shows an example of the stored contents of the instruction pointer holding units 308a to 308c and the instruction register unit 309.
[Description of Reference Numerals]
1 Stream input unit
2 Buffer memory
3 External memory
4 FIFO memory
5 Input/output processor
5a DMAC
6 Memory controller
7 Processor
8 Internal memory
9 Code conversion unit
10 Pixel operation unit
12 Video output unit
13 Audio output unit
14 Host I/F unit
1000 Video and audio processing device
1001 Input/output processing unit
1002 Decoding processing unit
1003 Sequential processing unit
1004 Routine processing unit

Continuation of the front page
(72) Inventor: Kozo Kimura, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(56) References: JP-A-8-111642 (JP, A); JP-A-6-326615 (JP, A); JP-A-9-37249 (JP, A); Ken Ohto, Toshinori Otaka, Takayasu Sakurai, "Realizing real-time decoding that also supports HDTV," Nikkei Electronics, Nikkei BP, March 14, 1994 issue, pp. 93-100
(58) Fields searched (Int. Cl.⁷, DB name): H04N 7/24 - 7/68

Claims

(57) [Claims]
1. A video and audio processing apparatus comprising: input means for inputting a data stream including compressed audio data and compressed video data; routine processing means for performing routine processing that mainly consists of routine calculations, namely variable-length decoding of the header information added to each block unit in the data stream and of the compressed video data in the data stream, and block-by-block decoding of the variable-length decoded compressed video data; and sequential processing means for performing, in a time-division manner, analysis of the header information that has been variable-length decoded by the routine processing means and decoding of the compressed audio data in the data stream, wherein the routine processing means performs the variable-length decoding of the header information and the compressed video data and, after completing the variable-length decoding of a block of compressed video data, instructs the sequential processing means to start analyzing the header information of the next block, and wherein the sequential processing means, when the header information to be analyzed is variable-length coded, instructs the routine processing means to variable-length decode that header information, analyzes the variable-length decoded header information once it is obtained, and, after acquisition of the header information of the block is complete, instructs the routine processing means to start variable-length decoding of the compressed video data of the block and to decode the variable-length decoded compressed video data using the analyzed header information.