JP3589565B2

JP3589565B2 - Video and audio processing device

Info

Publication number: JP3589565B2
Application number: JP09344798A
Authority: JP
Inventors: Kosuke Yoshioka; Makoto Hirai; Tokuzo Kiyohara; Kozo Kimura
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1997-04-07
Filing date: 1998-04-06
Publication date: 2004-11-17
Anticipated expiration: 2018-04-06
Also published as: JPH10341422A

Description

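【0001】
【Technical Field of the Invention】
The present invention belongs to the technical field of digital signal processing and relates to an image processing device that decompresses compressed video and audio data, compresses video and audio data, performs graphics processing, and the like.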
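【0002】
【Prior Art】
In recent years, the establishment of compression and decompression techniques for digital moving-picture data, together with advances in LSI technology, has brought attention to a variety of video and audio processing devices: decoders that decompress compressed video and audio data, encoders that compress video and audio data, graphics processors that perform graphics processing, and so on.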
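【0003】
As a first prior art, there is a video and audio decoder (Japanese Laid-Open Patent Publication No. H8-1116429) that decompresses compressed video and audio data conforming to the MPEG (Moving Picture Experts Group) standard. This decoder uses a single signal processing unit to perform both video decoding and audio decoding.
FIG. 1 illustrates the decoding process of this video and audio decoder. The vertical axis represents time and the horizontal axis represents the amount of computation.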
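【0004】
Viewed broadly along the vertical axis, video decoding and audio decoding are processed alternately, because common hardware decodes both video and audio. As the figure shows, video decoding is divided into sequential processing and block processing. Sequential processing covers everything other than block decoding, that is, processing that requires a wide variety of conditional judgments, such as header analysis of the MPEG stream; its amount of computation is small. Block decoding decodes the variable-length codes of the MPEG stream and then performs inverse quantization and the inverse DCT (discrete cosine transform) block by block; its amount of computation is large. Audio decoding is likewise divided into sequential processing of the same kind, requiring a wide variety of conditional judgments, and decoding of the audio data proper. Decoding the audio data proper demands higher precision than image data and must finish within a limited time, so it has to be performed both precisely and quickly, and its amount of computation is large.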
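【0005】
In this way, the first prior art makes a single-chip implementation possible and achieves efficient audio and video decoding with the small amount of hardware of one chip.
As a second prior art, there is a two-chip decoder, in which one chip serves as the video decoder and the other as the audio decoder. FIG. 2 illustrates the decoding process of the two-chip decoder. Both the video decoder and the audio decoder perform sequential processing, which includes many conditional judgments such as header analysis, and block decoding, which consists mainly of decoding the data proper. Because the video decoder and the audio decoder operate independently, each chip may have lower performance than in the first prior art.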
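【0006】
【Problems to Be Solved by the Invention】
However, the above prior art has the following problems.
In the first prior art, the signal processing unit must decode both video and audio, so high processing performance is required. That is, it must run on a high-speed clock of 100 MHz or more, which makes it costly as a consumer semiconductor. A VLIW (Very Long Instruction Word) processor or the like could conceivably be used to raise performance without a high-speed clock, but a VLIW processor is itself expensive, and unless a separate processor is provided for the sequential processing, the processing as a whole becomes inefficient.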
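【0007】
The second prior art uses two processors and is therefore costly. Neither the video processor nor the audio processor can be an inexpensive, low-performance general-purpose processor used as is. The video processor must be able to process a large amount of image data in real time. The audio processor does not need as much computation as the video processor, but audio data requires higher precision than image data. Consequently, an inexpensive or low-performance processor cannot meet the required performance for either video or audio.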
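【0008】
Furthermore, when such a video and audio processing device is used in an AV decoder for a digital (satellite) broadcast tuner (called an STB, Set Top Box), a DVD (Digital Versatile/Video Disc) player, or the like, the series of processes from inputting an MPEG stream received from a broadcast wave or read from a disc, through decoding it, to finally outputting video and audio signals to a display, speakers, and so on amounts to an enormous processing load. Demand has recently grown for video and audio processing devices that execute this series of processes efficiently.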
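【0009】
An object of the present invention is to provide a video and audio processing device that performs the series of processes of inputting, decoding, and outputting stream data representing compressed images and compressed audio, that has high processing performance without operating at a high frequency, and that can be manufactured at reduced cost.
Another object of the present invention is to provide a video and audio processing device that decodes compressed video data, encodes video data, and performs graphics processing at low cost.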
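【0010】
【Means for Solving the Problems】
To solve the above problems, the video and audio processing device of the present invention inputs from outside a data stream containing compressed audio data and compressed video data, decodes it, and outputs the decoded data to output devices. The device comprises I/O processing means that performs I/O processing arising asynchronously from external factors, and decode processing means that, in parallel with the I/O processing, performs decode processing consisting mainly of decoding the data stream stored in a memory. The video data and audio data decoded by the decode processing means are stored in the memory. The I/O processing consists of inputting the data stream, which arrives asynchronously from outside, and storing it in the memory; supplying the data stream stored in the memory to the decode processing means; and reading the decoded data from the memory in step with the output rates of the external display device and audio output device and outputting it to them.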
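【0011】
With this configuration, the I/O processing means and the decode processing means operate in parallel in a pipelined manner, and asynchronous processing and decode processing are divided between them, so the decode processing means is freed from asynchronously arising processing and can devote itself to decoding. As a result, the device executes the series of stream input, decoding, and output processes efficiently and achieves full decoding of the stream data (no dropped frames) without a high-speed operating clock.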
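【0012】
【Embodiments of the Invention】
The embodiments of the video and audio processing device of the present invention are described under the following headings.
1 First embodiment
1.1 Overview of the video and audio processing device
1.1.1 I/O processing unit
1.1.2 Decode processing unit
1.1.2.1 Sequential processing unit
1.1.2.2 Routine processing unit
1.2 Configuration of the video and audio processing device
1.2.1 Configuration of the I/O processing unit
1.2.2 Decode processing unit
1.2.2.1 Sequential processing unit
1.2.2.2 Routine processing unit
1.3 Detailed configuration of each unit
1.3.1 Processor 7 (sequential processing unit)
1.3.2 Routine processing unit
1.3.2.1 Code conversion unit
1.3.2.2 Pixel operation unit
1.3.2.3 Pixel read/write unit
1.3.3 I/O processing unit
1.3.3.1 IO processor
1.3.3.1.1 Instruction read circuit
1.3.3.1.2 Task management unit
1.4 Description of operation
2 Second embodiment
2.1 Configuration of the video and audio processing device
2.1.1 Pixel operation unit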
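＜1. First Embodiment＞
The video and audio processing device of this embodiment is provided in a satellite broadcast receiver (called an STB, Set Top Box), a DVD (Digital Versatile Disc) player, a DVD-RAM recorder/player, or the like. It receives an MPEG stream from a satellite broadcast or a DVD as compressed video/audio data, decompresses it (hereinafter simply "decodes"), and outputs video and audio signals to external output devices.
＜1.1 Overview of the Video and Audio Processing Device＞
FIG. 3 is a block diagram showing the schematic configuration of the video and audio processing device in the first embodiment of the present invention.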
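【0013】
The video and audio processing device 1000 comprises an I/O processing unit 1001, a decode processing unit 1002, and a memory controller 6, and is configured to perform I/O processing and decode processing separately and in parallel. The external memory 3 serves as a working memory that temporarily stores the MPEG stream and decoded audio data, and as a frame memory that stores decoded video data.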
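＜1.1.1 I/O Processing Unit＞
The I/O processing unit 1001 performs I/O processing that arises asynchronously with the internal operation of the video and audio processing device 1000. This I/O processing consists of (a) inputting the MPEG stream, which arrives asynchronously from outside, and temporarily storing it in the external memory 3, (b) supplying the MPEG stream stored in the external memory 3 to the decode processing unit 1002, and (c) reading decoded video data and audio data from the external memory 3 and outputting them in step with the output rates of the external display device and audio output device (not shown).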
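＜1.1.2 Decode Processing Unit＞
The decode processing unit 1002 operates independently of and in parallel with the I/O processing unit 1001, decodes the MPEG stream supplied by the I/O processing unit 1001, and stores the decoded video and audio data in the external memory 3. Because decoding an MPEG stream involves a large amount of computation and widely varied processing, the decode processing unit 1002 comprises a sequential processing unit 1003 and a routine processing unit 1004, and is configured to execute separately and in parallel the sequential processing, which consists mainly of varied conditional judgments, and the routine processing, which consists mainly of large amounts of regular computation and is suited to parallel operation. Here, sequential processing means header analysis of the MPEG stream and the like, and includes many conditional judgments such as detecting headers and evaluating their contents. Routine processing must apply various operations to blocks of a fixed number of pixels, so it is suited to pipelined parallel processing and also to vector-like parallel processing in which exactly the same operation is applied to different data (pixels).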
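＜1.1.2.1 Sequential Processing Unit＞
The sequential processing unit 1003 performs, as the sequential processing, header analysis of the compressed audio data and compressed video data supplied from the I/O processing unit 1001, control that activates the routine processing unit 1004 for each macroblock, and decoding of the compressed audio data. Header analysis includes analysis of macroblock headers in the MPEG stream and decoding of motion vectors. Here, a block is an image of 8×8 pixels, and a macroblock consists of four luminance blocks and two chrominance blocks. A motion vector is a vector pointing to a rectangular area of 8×8 pixels in a reference frame; it indicates the rectangular area in the reference frame against which the difference of the block was taken.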
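＜1.1.2.2 Routine Processing Unit＞
On receiving a decode start instruction from the sequential processing unit 1003 for each macroblock, the routine processing unit 1004 decodes the macroblock as the routine processing, in parallel with the audio decoding of the sequential processing unit 1003. This decoding applies, in this order, variable-length decoding (VLD: Variable Length code Decoding), inverse quantization (IQ: Inverse Quantization), the inverse discrete cosine transform (IDCT: Inverse Discrete Cosine Transform), and motion compensation (MC: Motion Compensation). In motion compensation, the routine processing unit 1004 stores the decoded block in the external memory 3, which serves as the frame memory, via the memory controller 6.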
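＜1.2 Configuration of the Video and Audio Processing Device＞
FIG. 4 is a block diagram showing the configuration of the video and audio processing device 1000 in more detail.
＜1.2.1 Configuration of the I/O Processing Unit＞
In the figure, the I/O processing unit 1001 comprises a stream input unit 1, a buffer memory 2, an I/O processor 5 (hereinafter also "IO processor 5"), a DMAC (Direct Memory Access Controller) 5a, a video output unit 12, an audio output unit 13, and a host I/F unit 14.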
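【0014】
The stream input unit 1 converts the MPEG data stream, input serially from outside, into parallel data (hereinafter "MPEG data"). In doing so, it detects the start code of a GOP (Group Of Pictures: a portion of the MPEG data stream that contains one I picture and corresponds to about 0.5 seconds of video) in the MPEG data stream and notifies the IO processor 5. Following this notification, the converted MPEG data is transferred to the buffer memory 2 under the control of the IO processor 5.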
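【0015】
The buffer memory 2 is a buffer that temporarily holds the MPEG data transferred from the stream input unit 1. The MPEG data held in the buffer memory 2 is further transferred to the external memory 3 via the memory controller 6 under the control of the I/O processor 5.
The external memory 3 consists of SDRAM (Synchronous Dynamic Random Access Memory) chips and temporarily holds the MPEG data transferred from the buffer memory 2 via the memory controller 6. The external memory 3 also holds decoded video data (hereinafter also "frame data") and decoded audio data.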
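【0016】
The I/O processor 5 controls data input and output among the stream input unit 1, the buffer memory 2, the external memory 3 (through the memory controller 6), and the FIFO memory 4. That is, it controls data transfers (DMA transfers) along paths (1) to (4) below.
(1) Stream input unit 1 → buffer memory 2 → memory controller 6 → external memory 3
(2) External memory 3 → memory controller 6 → FIFO memory 4
(3) External memory 3 → memory controller 6 → buffer memory 2 → video output unit 12
(4) External memory 3 → memory controller 6 → buffer memory 2 → audio output unit 13
On these paths the I/O processor 5 controls the transfer of the video data and the audio data in the MPEG data independently of each other. Paths (1) and (2) carry MPEG data before decoding, and on them the I/O processor 5 transfers compressed video data and compressed audio data separately. Paths (3) and (4) carry decoded video data and decoded audio data, respectively. Decoded video and audio data are transferred in step with the output rates of the external display device and audio output device (neither shown).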
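【0017】
The DMAC 5a, under the control of the IO processor 5, executes DMA transfers between the buffer memory 2 and each of the stream input unit 1, the video output unit 12, and the audio output unit 13; between the buffer memory 2 and the external memory 3; and between the external memory 3 and the FIFO memory 4.
The video output unit 12 issues data requests to the I/O processor 5 in step with the output rate of the external display device (a CRT or the like), for example the period of the horizontal synchronization signal Hsync, and outputs to the display device the video data that the I/O processor 5 delivers over path (3).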
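【0018】
The audio output unit 13 issues data requests to the I/O processor 5 in step with the output rate of the external audio output device, and outputs the audio data that the I/O processor 5 delivers over path (4) to the audio output device (for example a combination of a D/A converter, an audio amplifier, and speakers).
The host I/F unit 14 is an interface for communicating with an external host processor, for example the processor that performs overall control in a DVD player. Through this communication the host processor sends instructions such as starting or stopping decoding of the MPEG stream, fast-forward playback, and reverse playback.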
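＜1.2.2 Decode Processing Unit＞
The decode processing unit 1002 in FIG. 4 comprises the FIFO memory 4, the sequential processing unit 1003, and the routine processing unit 1004, and decodes the MPEG data supplied from the I/O processing unit 1001 via the FIFO memory 4. The sequential processing unit 1003 comprises a processor 7 and an internal memory 8. The routine processing unit 1004 comprises a code conversion unit 9, a pixel operation unit 10, a pixel read/write unit 11, a buffer 200, and a buffer 201.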
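【0019】
The FIFO memory 4 consists of two FIFOs (hereinafter the video FIFO and the audio FIFO), which store, on a first-in first-out basis, the compressed video data and the compressed audio data transferred from the external memory 3 under the control of the I/O processor 5.
＜1.2.2.1 Sequential Processing Unit＞
The processor 7 controls reading of compressed video data and compressed audio data from the FIFO memory 4, and performs part of the decoding of the compressed video data and all of the decoding of the compressed audio data. The part of video decoding it performs comprises analysis of header information in the MPEG data, calculation of motion vectors, and control of the compressed video decoding. This is because the full decoding of compressed video data is shared between the processor 7 and the routine processing unit 1004: the processor 7 takes the sequential processing, which requires varied conditional judgments, and the routine processing unit 1004 takes the large volume of regular arithmetic. Audio decoding, by contrast, involves less computation than video decoding, so the processor 7 handles all of it.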
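【0020】
The functions of the processor 7 are described concretely with reference to FIG. 5. FIG. 5 shows the MPEG stream hierarchically together with the operation timing of each part of the video and audio processing device. The horizontal axis is the time axis. The first layer shows the flow of the MPEG stream. As the second layer shows, one second of the MPEG stream contains multiple frames (I, P, and B pictures). As the third layer shows, one frame contains a picture header and multiple slices. As the fourth layer shows, one slice contains a slice header and multiple macroblocks. As the fifth layer shows, one macroblock contains a macroblock header and six blocks.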
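【0021】
The data structure of the first to fifth layers in the figure is explained in detail in published literature, for example 「ポイント図解式最新ＭＰＥＧ教科書」 (ASCII Corporation).
As shown below the fifth layer in the figure, the processor 7 performs header analysis down to the macroblock layer of the MPEG stream and decodes the compressed audio data. According to the result of header analysis for each macroblock, the processor 7 instructs the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 to start decoding the macroblock, and while they are decoding it, the processor 7 reads compressed audio data from the FIFO memory 4 and decodes it. When the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 finish decoding the macroblock, the processor 7 is notified by an interrupt signal, suspends decoding of the compressed audio data, and starts header analysis of the next macroblock.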
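【0022】
The internal memory 8 is the work memory of the processor 7 and temporarily holds decoded audio data. The held audio data is transferred to the external memory 3 by the I/O processor 5 over path (4).
＜1.2.2.2 Routine Processing Unit＞
The code conversion unit 9 performs variable-length decoding (VLD) on the compressed video data read from the FIFO memory 4. As shown in FIG. 5, of the decoded data, the code conversion unit 9 transfers header information and motion vector information (the broken-line intervals in the figure) to the processor 7, and transfers the macroblock data (six blocks: luminance blocks Y0 to Y3 and chrominance blocks Cb and Cr; the solid-line intervals in the figure) to the pixel operation unit 10 via the buffer 200. The macroblock data decoded by the code conversion unit 9 represents spatial frequency components.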
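【0023】
The buffer 200 holds one block (8×8 pixels) of data representing spatial frequency components, written by the code conversion unit 9.
The pixel operation unit 10 performs inverse quantization (IQ) and the inverse discrete cosine transform (IDCT), block by block, on the block data transferred from the code conversion unit 9 via the buffer 200. For a luminance block, the result represents pixel luminance values or their differences; for a chrominance block, it represents pixel chrominance values or their differences. The result is transferred to the pixel read/write unit 11 via the buffer 201.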
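【0024】
The buffer 201 holds one block (8×8 pixels) of pixel data.
The pixel read/write unit 11 performs motion compensation, block by block, on the results from the pixel operation unit 10. That is, for P pictures and B pictures, it extracts, via the memory controller 6, the rectangular area indicated by the motion vector from a decoded reference frame in the external memory 3, and combines it with the block produced by the pixel operation unit 10 to reconstruct the original block image. The decoded result from the pixel read/write unit 11 is stored in the external memory 3 via the memory controller 6.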
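【0025】
Motion compensation, IQ, and IDCT are known techniques, so detailed descriptions are omitted (see the literature cited above).
＜1.3 Detailed Configuration of Each Unit＞
Next, the detailed configuration of the main units of the video and audio processing device 1000 is described.
＜1.3.1 Processor 7 (Sequential Processing Unit)＞
FIG. 6 shows the macroblock header analysis performed by the processor 7 and the control it exerts on the other units. The data items in the macroblock header, shown by abbreviations in the figure, are explained in the literature cited above and are not described here.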
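【0026】
As the figure shows, the processor 7 issues commands to the code conversion unit 9 to obtain the variable-length-decoded header data item by item, and according to their contents sets in the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 the data needed to decode the macroblock.
Specifically, the processor 7 first issues a command to the code conversion unit 9 to obtain the MBAI (Macro Block Address Increment) (S101) and obtains the MBAI from the code conversion unit 9. If the MBAI shows that the macroblock is a skipped macroblock (that is, the macroblock about to be decoded is the same as the previous one), the macroblock data is omitted and processing proceeds to S117; otherwise header analysis continues (S102, S103).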
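【0027】
Next, the processor 7 issues a command to obtain the MBT (Macro Block Type) and obtains it from the code conversion unit 9. From the MBT it determines whether the scan type of the block is zigzag scan or alternate scan, and instructs the pixel operation unit 10 on the order in which to read the buffer 200 (S104).
The processor 7 then determines from the header data already obtained whether an STWC (Spatial Temporal Weight Code) is present (S105), and if so issues a command to obtain it (S106).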
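【0028】
In the same way, the processor 7 obtains FrMT (Frame Motion Type), FiMT (Field Motion Type), DT (DCT type), QSC (Quantizer Scale Code), MV (Motion Vector), and CBP (Coded Block Pattern) (S107 to S116). In doing so, it notifies the pixel read/write unit 11 of the analysis results for FrMT, FiMT, and DT, the pixel operation unit 10 of the result for QSC, and the code conversion unit 9 of the result for CBP. The information needed for IQ, IDCT, and motion compensation is thereby set in the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11.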
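【0029】
In a two-processor configuration, by contrast, each processor performs the above sequential processing, with its varied conditional judgments, on its own, which makes the configuration redundant.
Next, the processor 7 issues a macroblock decode start instruction to the code conversion unit 9 (S117). The code conversion unit 9 then starts VLD for each block in the macroblock and outputs the VLD results to the pixel operation unit 10 via the buffer 200. The processor 7 further calculates the motion vector from the MV data (S118) and notifies the pixel read/write unit 11 of the result (S119).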
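【0030】
For motion vectors, the above processing requires a sequence of steps: obtaining the motion vector data (MV) (S113), calculating the motion vector (S118), and setting the motion vector in the pixel read/write unit 11 (S119). The processor 7 does not calculate and set the motion vector (S118, S119) immediately after obtaining the motion vector data (S113); instead it first issues the decode start instruction to the routine processing unit 1004 and only then calculates and sets the motion vector. As a result, the motion vector calculation and setting by the processor 7 proceed in parallel with the decoding by the routine processing unit 1004. In other words, the routine processing unit 1004 starts decoding earlier.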
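The effect of issuing the decode start instruction before the motion vector calculation can be illustrated with a small timing model. This is only a sketch: the function names and cycle counts are hypothetical and do not come from the embodiment, and it assumes that motion compensation can begin once the motion vector has been set and the first block has passed through VLD, IQ, and IDCT.

```python
def mc_start_serial(header, mv_calc, first_block):
    """Motion vector is computed (S118, S119) before the decode start instruction (S117)."""
    return header + mv_calc + first_block


def mc_start_overlapped(header, mv_calc, first_block):
    """Decode start is issued first; S118 and S119 then run alongside the block pipeline."""
    return header + max(mv_calc, first_block)


# Hypothetical cycle counts, chosen only for illustration.
HEADER, MV_CALC, FIRST_BLOCK = 40, 30, 100
saved = (mc_start_serial(HEADER, MV_CALC, FIRST_BLOCK)
         - mc_start_overlapped(HEADER, MV_CALC, FIRST_BLOCK))
print(saved)
```

Under these assumed numbers the overlapped ordering hides the whole motion vector calculation behind the first block's decoding.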
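【0031】
Header analysis for one macroblock of compressed video data is completed in this way, so the processor 7 obtains compressed audio data from the FIFO memory 4 and starts audio decoding (S120). Audio decoding continues until an interrupt signal indicating completion of macroblock decoding is input from the code conversion unit 9. On this interrupt signal, the processor 7 starts the above header analysis for the next macroblock.
＜1.3.2 Routine Processing Unit＞
The routine processing unit 1004 decodes the six blocks in a macroblock by operating the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 in parallel (in a pipelined manner). Their configurations are described in more detail below, in the order code conversion unit 9, pixel operation unit 10, pixel read/write unit 11.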
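＜1.3.2.1 Code Conversion Unit 9＞
FIG. 19 is a block diagram showing the configuration of the code conversion unit 9.
【0032】
The code conversion unit 9 in the figure comprises a VLD unit 901, a counter 902, an incrementer 903, a selector 904, a scan table 905, a scan table 906, a flip-flop (hereinafter "FF") 907, and a selector 908. It is configured to write the results of variable-length decoding (VLD) into the buffer 200 block by block, arranged in zigzag scan or alternate scan order.
【0033】
The VLD unit 901 performs variable-length decoding (VLD) on the compressed video data read from the FIFO memory 4. Of the decoded data, it transfers header information and motion vector information (the broken-line intervals in FIG. 5) to the processor 7, and outputs the macroblock data (six blocks: luminance blocks Y0 to Y3 and chrominance blocks Cb and Cr; the solid-line intervals in FIG. 5) to the buffer 200 in units of blocks (64 spatial frequency data items).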
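【0034】
The circuit formed by the counter 902, the incrementer 903, and the selector 904 counts repeatedly from 0 to 63 in synchronization with the output of spatial frequency data from the VLD unit 901.
The scan table 905 stores the addresses of the block storage area of the buffer 200 in zigzag scan order; it receives the output values of the counter 902 (0 to 63) in sequence and outputs the corresponding addresses one after another. FIG. 20 shows the block storage area in the buffer 200, which stores 8×8 spatial frequency data items, and the path of the zigzag scan. The scan table 905 outputs the pixel addresses along that path in sequence.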
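【0035】
The scan table 906 stores the addresses of the block storage area of the buffer 200 in alternate scan order; it receives the output values of the counter 902 (0 to 63) in sequence and outputs the corresponding addresses one after another. FIG. 21 shows the block storage area in the buffer 200, which stores 8×8 spatial frequency data items, and the path of the alternate scan. The scan table 906 outputs the pixel addresses along that path in sequence.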
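【0036】
The FF 907 holds a flag indicating the scan type (zigzag scan or alternate scan). This flag is set by the processor 7.
The selector 908 selects, according to the flag in the FF 907, between the addresses output by the scan table 905 and those output by the scan table 906, and outputs the selected address to the buffer 200 as the write address.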
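The role of the counter and scan table can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the circuit itself: the function names are hypothetical, buffer addresses are assumed to be `row * 8 + column`, and only the zigzag table is generated (an alternate scan table would be a second fixed table passed to the same write routine).

```python
def zigzag_table(n=8):
    """Return buffer addresses (row * n + col) in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col indexes one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                         # even diagonals run from bottom-left to top-right
            rows = reversed(rows)
        order.extend(r * n + (s - r) for r in rows)
    return order


def write_block(coefficients, scan_table):
    """Store coefficients arriving in scan order at the addresses given by the table."""
    buffer = [0] * len(scan_table)
    for count, value in enumerate(coefficients):   # count plays the role of counter 902
        buffer[scan_table[count]] = value
    return buffer


ZIGZAG = zigzag_table()
block = write_block(list(range(64)), ZIGZAG)
print(ZIGZAG[:10])   # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```

Because the reordering happens on the write side, a reader of the buffer can always step through addresses 0 to 63 in the same order, whichever table was selected.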
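＜1.3.2.2 Pixel Operation Unit＞
FIG. 7 is a block diagram showing the configuration of the pixel operation unit 10.
【0037】
As the figure shows, the pixel operation unit 10 has an execution unit 501 consisting of a multiplier 502 and an adder/subtractor 503, a first program counter (hereinafter "first PC") 504, a second program counter (hereinafter "second PC") 505, a first instruction memory 506, a second instruction memory 507, and a selector 508, and is configured so that IQ and part of the IDCT can overlap and run in parallel.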
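【0038】
The execution unit 501 accesses the buffers 200 and 201 and performs operations according to the microinstructions output in sequence from the first instruction memory 506 and the second instruction memory 507.
The first instruction memory 506 and the second instruction memory 507 are control stores holding the microprograms that implement IQ and IDCT on the block (frequency components) held in the buffer 200. FIG. 8 shows an example of the microprograms stored in the first instruction memory 506 and the second instruction memory 507.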
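【0039】
In the figure, the first instruction memory 506 stores the IDCT1A microprogram and the IQ microprogram, and its read address is specified by the first PC 504. The IQ microprogram consists mainly of reads from the buffer 200 and multiplications, and does not use the adder/subtractor 503.
The second instruction memory 507 stores the IDCT1B microprogram and the IDCT2 microprogram, and its read address is specified, via the selector 508, by either the first PC 504 or the second PC 505. Here, IDCT1 denotes the first half of the IDCT, consisting mainly of multiplications and additions/subtractions; it is executed using the whole execution unit 501 by reading the IDCT1A and IDCT1B microprograms simultaneously. IDCT2 denotes the second half of the IDCT, consisting mainly of additions/subtractions, together with the write-out to the buffer 201; it is executed using the adder/subtractor 503 by reading the IDCT2 microprogram from the second instruction memory 507.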
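【0040】
Because IQ is processed by the multiplier 502 and IDCT2 by the adder/subtractor 503, the two can operate in parallel. FIG. 9 is a timing chart of IQ, IDCT1, and IDCT2 in the pixel operation unit 10.
In FIG. 9, when the code conversion unit 9 has written the data of luminance block Y0 into the buffer 200 (timing t0), it notifies the pixel operation unit 10 by a control signal 102. The pixel operation unit 10 performs IQ on the data in the buffer 200 by reading the IQ microprogram from the first instruction memory 506 at the addresses specified by the first PC 504, using the QS (Quantizer Scale) value set during header analysis by the processor 7. At this time the selector 508 selects the first PC 504 (timing t1).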
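【0041】
The pixel operation unit 10 then performs IDCT1 on the data in the buffer 200 by reading the IDCT1A and IDCT1B microprograms at the addresses specified by the first PC 504. The selector 508 selects the first PC 504 at this time, so the address from the first PC 504 is applied to both the first instruction memory 506 and the second instruction memory 507 (timing t2).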
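【0042】
Next, using the above QS (Quantizer Scale) value, the pixel operation unit 10 performs IQ on the data of block Y1 in the buffer 200 by reading the IQ microprogram from the first instruction memory 506 at the addresses specified by the first PC 504, and at the same time processes the second half of the IDCT for block Y0 by reading the IDCT2 microprogram from the second instruction memory 507 at the addresses specified by the second PC 505. At this time the selector 508 selects the second PC 505, so the first PC 504 and the second PC 505 specify addresses independently (timing t3).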
【0043】
The pixel operation unit 10 continues processing block by block in the same way (timing t4 onward).
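The overlap of IQ with IDCT2 can be laid out as a simple slot table. The sketch below is an assumption-laden illustration: it reproduces the t1 to t3 pattern described above and extrapolates it to the remaining blocks of a macroblock, treating each phase as one slot of equal length. The function name and the slot representation are hypothetical.

```python
BLOCKS = ["Y0", "Y1", "Y2", "Y3", "Cb", "Cr"]


def schedule(blocks):
    """Return (multiplier-side work, adder-side work) per slot, starting at t1."""
    slots = []
    for i, name in enumerate(blocks):
        prev = f"IDCT2({blocks[i - 1]})" if i > 0 else None
        slots.append((f"IQ({name})", prev))                  # IQ overlaps IDCT2 of the previous block
        slots.append((f"IDCT1({name})", f"IDCT1({name})"))   # IDCT1 uses the whole execution unit
    slots.append((None, f"IDCT2({blocks[-1]})"))             # drain the last block
    return slots


for t, (mul, add) in enumerate(schedule(BLOCKS), start=1):
    print(f"t{t}: multiplier={mul} adder={add}")
```

In this model six blocks take 13 slots, whereas running IQ, IDCT1, and IDCT2 strictly one after another would take 18.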
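＜1.3.2.3 Pixel Read/Write Unit＞
FIG. 10 is a block diagram showing the detailed configuration of the pixel read/write unit 11.
As the figure shows, the pixel read/write unit 11 consists of buffers 71 to 74 (hereinafter buffers A to D), a half-pel interpolation unit 75, a combining unit 76, selectors 77 and 78, and a read/write control unit 79.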
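【0044】
The read/write control unit 79 performs motion compensation on the block data input via the buffer 201, using buffers A to D, and transfers the final decoded image to the external memory 3 in units of two blocks. More specifically, according to the motion vector set during header analysis by the processor 7, it controls the memory controller 6 so as to read a rectangular area equivalent to two blocks from the reference frame in the external memory 3. As a result, the data of the two-block rectangular area indicated by the motion vector is stored in buffer A or buffer B. Then, depending on the picture type (I, P, or B picture), half-pel interpolation of the two-block rectangular area is performed in the combining unit 76. Next, the block data input via the buffer 201 and the half-pel-interpolated rectangular area are combined (added) to calculate the pixel values of the block, which are stored in buffer B. The final decoded block thus stored in buffer B is transferred to the external memory 3 via the memory controller 6.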
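The half-pel interpolation and block combination can be sketched as follows. This is a minimal illustration, not the buffer A to D data path: it handles a single block rather than two-block units, the function names and the half-pixel motion vector encoding are assumptions, and the clamp to the 8-bit range is the usual decoder practice rather than something stated in the text. The rounding (add half, then shift) follows the conventional MPEG half-sample averaging.

```python
def half_pel(ref, y, x, half_y, half_x):
    """Predicted pixel at (y, x), offset by half a pixel where half_y / half_x are 1."""
    a = ref[y][x]
    b = ref[y][x + half_x]
    c = ref[y + half_y][x]
    d = ref[y + half_y][x + half_x]
    return (a + b + c + d + 2) >> 2        # reduces to (a + b + 1) >> 1 for a one-axis offset


def motion_compensate(ref, residual, top, left, mv_y2, mv_x2):
    """Rebuild one block; mv_y2 / mv_x2 are the motion vector in half-pixel units."""
    y0, half_y = top + (mv_y2 >> 1), mv_y2 & 1
    x0, half_x = left + (mv_x2 >> 1), mv_x2 & 1
    out = []
    for r, residual_row in enumerate(residual):
        row = []
        for c, diff in enumerate(residual_row):
            pred = half_pel(ref, y0 + r, x0 + c, half_y, half_x)
            row.append(min(255, max(0, pred + diff)))   # assumed clamp to the 8-bit range
        out.append(row)
    return out


# Tiny 2x2 example on a synthetic reference frame.
ref = [[r * 10 + c for c in range(6)] for r in range(6)]
print(motion_compensate(ref, [[1, -1], [0, 300]], top=1, left=1, mv_y2=0, mv_x2=1))
```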
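＜1.3.3 I/O Processing Unit＞
To carry out the many data inputs and outputs (data transfers) described above, the I/O processing unit 1001 is configured to switch without overhead among multiple tasks that share the various data transfers, and to respond to data I/O requests without delay. The overhead referred to here is the saving and restoring of context at a task switch. That is, the I/O processor 5 is configured to eliminate the overhead of saving the instruction address in the program counter and the register data to memory (a stack area) and restoring them. The detailed configuration is described below.
＜1.3.3.1 IO Processor＞
FIG. 11 is a block diagram showing the configuration of the IO processor 5. In the figure, the IO processor 5 comprises a state monitoring register 51, an instruction memory 52, an instruction read circuit 53, an instruction register 54, a decoder 55, an operation execution unit 56, a general-purpose register set group 57, and a task management unit 58. To handle multiple asynchronously occurring events, it is configured to execute tasks while switching among them at very short intervals (for example, every four instruction cycles).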
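【0045】
The state monitoring register 51 consists of registers CR1 to CR3 and holds various state data (flags and the like) with which the IO processor 5 monitors the various I/O states. For example, it holds state data indicating the state of the stream input unit 1 (the start code detection flag for the MPEG stream), the state of the video output unit 12 (a flag indicating the horizontal blanking period and a frame data transfer completion flag), the state of the audio output unit 13 (an audio frame data transfer completion flag), and the state of data transfers between these units and the buffer memory 2, the external memory 3, and the FIFO memory 4 (the number of data items transferred and data request flags for the FIFO memory 4).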
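【0046】
More specifically, it includes the following flags.
・Start code detection flag (hereinafter also flag 1)
This flag is set when the stream input unit 1 detects a start code in the MPEG stream.
・Horizontal blanking flag (flag 2)
This flag indicates the horizontal blanking period and is set by the video output unit 12, at intervals of about 60 microseconds.
・Video frame data transfer completion flag (flag 3)
This flag is set by the DMAC 5a when one frame of decoded image data has been transferred from the external memory 3 to the video output unit 12.
・Audio frame data transfer completion flag (flag 4)
This flag is set by the DMAC 5a when one frame of decoded audio data has been transferred from the external memory 3 to the audio output unit 13.
・Data transfer completion flag (flag 5)
This flag is set when the DMAC 5a has DMA-transferred the number of compressed image data items specified by the IO processor 5 from the stream input unit 1 to the buffer memory 2 (that is, on terminal count).
・DMA request flag (flag 6)
This flag indicates that the buffer memory 2 holds compressed image data or compressed audio data to be DMA-transferred to the external memory 3; it is set by the IO processor 5 (a request from task 1 to task 2, described later).
・Video FIFO data request flag (flag 7)
This flag requests a data transfer from the external memory 3 to the video FIFO in the FIFO memory 4; it is set when the compressed video data in the video FIFO falls to or below a predetermined amount, at intervals of about 5 to 40 microseconds.
・Audio FIFO data request flag (flag 8)
This flag requests a data transfer from the external memory 3 to the audio FIFO in the FIFO memory 4; it is set when the compressed audio data in the audio FIFO falls to or below a predetermined amount, at intervals of about 15 to 60 microseconds.
・Decoder communication request flag (flag 9)
This flag requests communication from the decode processing unit 1002 to the I/O processing unit 1001.
・Host communication request flag (flag 10)
This flag requests communication from the host processor to the I/O processing unit 1001.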
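【0047】
The above flags are monitored continuously by the tasks executed by the IO processor 5, by polling rather than by interrupts.
The instruction memory 52 stores multiple task programs that share the control of the many data inputs and outputs (data transfers). In this embodiment it stores six task programs, tasks 0 to 5.
・Task 0 (host I/F task)
This task communicates with the host computer via the host I/F unit 14 when flag 10 is set. For example, it accepts instructions from the host processor to start or stop decoding of the MPEG stream or to perform fast-forward or reverse playback, and reports the decoding status (errors and the like). This processing is triggered by flag 10.
・Task 1 (parsing task)
When the stream input unit 1 detects a start code (flag 1), this task analyzes (parses) the MPEG data input from the stream input unit 1, extracts the individual elementary streams, and transfers them to the buffer memory 2 by DMA (the first half of path (1)). The elementary streams extracted include compressed video data (also called the video elementary stream), compressed audio data (also called the audio elementary stream), and private data. Flag 6 is set when an elementary stream has been stored in the buffer memory 2.
・Task 2 (stream transfer/audio task)
This task is a program that controls the following transfers (a) to (c).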
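【0048】
(a) DMA transfer of the individual elementary streams from the buffer memory 2 to the external memory 3 (the second half of path (1)). This transfer is triggered by flags 1 and 3.
(b) DMA transfer of compressed audio data from the external memory 3 to the audio FIFO of the FIFO memory 4, according to the size (remaining amount) of the compressed audio data held in the audio FIFO (the transfer to the audio FIFO on path (2)). This transfer is performed when the size of the compressed audio data held in the audio FIFO falls below a certain amount. It is triggered by flag 8.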
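【0049】
(c) DMA transfer of decoded audio data from the external memory 3 to the buffer memory 2 and then from the buffer memory 2 to the audio output unit 13 (path (4)). This transfer is triggered by flag 2.
・Task 3 (video supply task)
This task is a program that handles DMA transfer of compressed video data from the external memory 3 to the video FIFO of the FIFO memory 4, according to the size (remaining amount) of the compressed video data held in the video FIFO (the transfer to the video FIFO on path (2)). This transfer is performed when the size of the compressed video data held in the video FIFO falls below a certain amount. It is triggered by flag 7.
・Task 4 (video output task)
This task is a program that handles DMA transfer of decoded video data from the external memory 3 to the buffer memory 2 and then from the buffer memory 2 to the video output unit 12 (path (3)). This transfer is triggered by flag 2.
・Task 5 (decoder I/F task)
This task is a program that processes commands from the decode processing unit 1002 to the IO processor 5. The commands include "getAPTS", "getVPTS", and "getSTC". getVPTS (Video Presentation Time Stamp) is a command by which the decode processing unit 1002 asks the IO processor 5 for the VPTS attached to the compressed video data. getAPTS (Audio Presentation Time Stamp) is a command by which the decode processing unit 1002 asks the IO processor 5 for the APTS attached to the compressed audio data. getSTC (System Time Clock) is a command by which the decode processing unit 1002 asks the IO processor 5 for the STC. On receiving these commands, the IO processor 5 notifies the decode processing unit 1002 of the STC, VPTS, or APTS, respectively. The decode processing unit 1002 uses the STC, VPTS, and APTS to synchronize audio and video decoding and to adjust the progress of decoding frame by frame. This processing is triggered by flag 9.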
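【0050】
The instruction read circuit 53 has multiple program counters (hereinafter "PCs") that point to instruction fetch addresses; using the PC specified by the task management unit 58, it reads an instruction from the instruction memory 52 and stores it in the instruction register 54. Specifically, the instruction read circuit 53 has PC0 to PC5 corresponding to tasks 0 to 5, and is configured to switch PCs at high speed in hardware when the task management unit 58 changes the specified PC. With this configuration, the IO processor 5 does not have to save the current task's PC value to memory and restore the next task's PC value from memory at a task switch.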
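【0051】
The decoder 55 decodes the instruction read from the instruction memory 52 and stored in the instruction register 54, and controls the operation execution unit 56 to execute it. In addition, the decoder 55 performs pipeline control of the IO processor 5 as a whole, with at least three stages: the instruction read stage of the instruction read circuit 53, the decode stage of the decoder 55, and the execution stage of the operation execution unit 56.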
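【0052】
The operation execution unit 56 has an ALU (Arithmetic Logical Unit), a multiplier, a BS (Barrel Shifter), and the like, and executes the operation specified by an instruction under the control of the decoder 55.
The general-purpose register set group 57 has six register sets corresponding to tasks 0 to 5 (one register set consists of four 32-bit registers and four 16-bit registers). In total it has twenty-four 32-bit registers and twenty-four 16-bit registers, and the register set corresponding to the running task is used. As a result, the IO processor 5 does not have to save all current register data to memory and restore the next task's register data from memory at a task switch.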
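【0053】
The task management unit 58 switches tasks by switching the PC of the instruction read circuit 53 and the register set of the general-purpose register set group 57 every predetermined number of instruction cycles. In this embodiment the predetermined number is 4. Because the IO processor 5 pipelines one instruction per instruction cycle, the task management unit 58 switches tasks every four instructions without incurring the above overhead. This limits the response delay to the various asynchronously occurring I/O requests: the response delay to an I/O request is at most only 24 instruction cycles.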
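The time-sliced task switching and the 24-cycle bound can be checked with a short model. This is a sketch under simplifying assumptions: strict round-robin order, exactly one instruction retired per cycle, no branches, and hypothetical names throughout. The per-task program counter list stands in for PC0 to PC5, which is why a switch costs nothing in the model.

```python
NUM_TASKS = 6      # tasks 0 to 5
SLOT = 4           # instruction cycles per task slot


def run(cycles):
    """Return the task id executed in each cycle and the final per-task program counters."""
    pcs = [0] * NUM_TASKS          # one program counter per task, so nothing is saved or restored
    trace = []
    for cycle in range(cycles):
        task = (cycle // SLOT) % NUM_TASKS
        trace.append(task)
        pcs[task] += 1             # one instruction per cycle
    return trace, pcs


def worst_case_wait():
    """Cycles in one full round of slots, the bound quoted in the text."""
    return NUM_TASKS * SLOT


trace, pcs = run(48)
print(trace[:8], pcs, worst_case_wait())
```

Every task reappears once per round of six slots, so a flag raised at any moment is seen by its polling task within 6 × 4 = 24 instruction cycles.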
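＜1.3.3.1.1 Instruction Read Circuit＞
FIG. 12 is a block diagram showing a detailed configuration example of the instruction read circuit 53.
【0054】
In the figure, the instruction read circuit 53 comprises a per-task PC storage unit 53a, a current IFAR (Instruction Fetch Address Register) 53b, an incrementer 53c, a next IFAR 53d, a selector 53e, a selector 53f, and a DECAR (DECode Address Register) 53g, and is configured to switch the instruction read address at a task switch without overhead.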
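【0055】
The per-task PC storage unit 53a has six address registers corresponding to tasks 0 to 5 and holds a program count value for each task. Each program count value is the resume address of the corresponding task. At a task switch, under the control of the task management unit 58 and the decoder 55, the program count value is read from the address register of the task to be executed next, and the program count value in the address register of the currently executing task is updated to the new resume address. The task to be executed next and the current task are specified by the task management unit 58 with the "nexttaskid(rdaddr)" signal (hereinafter also "task ID") and the "taskid(wraddr)" signal, respectively.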
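【0056】
PC0, PC1, and PC2 in FIG. 13 show the program count values for tasks 0, 1, and 2. In the figure, (0-0) denotes instruction 0 of task 0 and (1-4) denotes instruction 4 of task 1. For example, PC0 is read when task 0 resumes (instruction cycle t0) and is updated to the address of instruction (0-4) at the switch to the next task (instruction cycle t4).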
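【0057】
The loop circuit formed by the incrementer 53c, the next IFAR 53d, and the selector 53e updates the instruction read address selected by the selector 53e. IF1 in FIG. 13 shows the address output by the selector 53e. For example, at the switch from task 0 to task 1, the selector 53e selects the address of instruction (1-0) read from the per-task PC storage unit 53a in cycle t4, and selects the incremented instruction address from the next IFAR 53d in cycles t5 to t7.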
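【0058】
The current IFAR 53b holds the selected output IF1 of the selector 53e one cycle later and outputs it to the instruction memory 52 as the instruction read address. In other words, it holds the instruction read address of the currently active task. IF2 in FIG. 13 shows the instruction read address in the current IFAR 53b. As the figure shows, IF2 points to instruction addresses of a different task every four instruction cycles.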
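【0059】
The DECAR 53g holds the address of the instruction held in the instruction register 54, that is, it points to the instruction being decoded. DEC in FIG. 13 shows the address held in the DECAR 53g, and EX in FIG. 13 shows the address of the instruction being executed.
The selector 53f selects the branch address when a branch instruction is executed or an interrupt occurs, and otherwise selects the address from the next IFAR 53d.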
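【0060】
With this instruction read circuit 53, the IO processor 5 performs four-stage pipeline processing (IF1, IF2, DEC, EX) as shown in FIG. 13. The IF1 stage selects and updates the multiple program count values, and the IF2 stage reads the instruction.
＜1.3.3.1.2 Task Management Unit＞
FIG. 14 is a block diagram showing the detailed configuration of the task management unit 58. In the figure, the task management unit 58 is broadly divided into a slot manager, which manages the timing of task switches, and a scheduler, which manages the order of tasks.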
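【0061】
The slot manager has a counter 58a, a latch 58b, a comparator 58c, and a latch unit 58d, and outputs to the instruction read circuit 53 a task switch signal (chgtaskex) that instructs a task switch every four instruction cycles.
Specifically, the latch 58b consists of two FF (Flip Flop) circuits that hold the lower two bits of the output of the counter 58a. On each clock indicating an instruction cycle, the counter 58a outputs a 3-bit value obtained by incrementing the 2-bit output of the latch 58b by 1. As a result, the counter 58a repeatedly outputs 1, 2, 3, 4. The comparator 58c outputs the task switch signal (chgtaskex) to the instruction read circuit 53 and the scheduler when the output of the counter 58a equals the constant 4.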
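【0062】
The scheduler comprises a task round management unit 58e, a priority encoder 58f, and a latch 58g. Each time the task switch signal (chgtaskex) is output, it updates the task id and outputs the current task id and the task id to be executed next to the instruction read circuit 53.
Specifically, the latch unit 58d and the latch 58g both hold the current task id in encoded form (3 bits). In encoded form, the value itself represents the task id.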
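【0063】
When the task switch signal (chgtaskex) is input, the task round management unit 58e refers to the latch unit 58d and outputs the task id to be executed next in decoded form (6 bits). In decoded form (6 bits), one bit corresponds to one task and the bit position represents the task id.
The priority encoder 58f converts the task id output by the task round management unit 58e from decoded form to encoded form. The latch unit 58d and the latch 58g both hold the encoded task id one cycle later.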
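【0064】
With this configuration, when the comparator 58c outputs the task switch signal (chgtaskex), the task round management unit 58e causes the id of the task to be executed next to be output from the priority encoder 58f as the "nexttaskid(rdaddr)" signal, and the current task id to be output from the latch 58g as the "taskid(wraddr)" signal.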
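The slot manager and scheduler can be modeled at the signal level as below. This is a sketch, not the circuit: the one-cycle latch delays are ignored, the next task is assumed to follow a plain round-robin order, and the function names are hypothetical. It does show how a 2-bit latch plus an increment yields the repeating 1, 2, 3, 4 count, and how the 6-bit decoded (one-hot) id maps back to a 3-bit encoded id.

```python
def slot_manager(cycles):
    """Yield (counter output, chgtaskex) for each instruction cycle."""
    latch = 0                          # latch 58b: lower two bits of the counter output
    for _ in range(cycles):
        count = latch + 1              # counter 58a: 3-bit value in the range 1..4
        yield count, count == 4        # comparator 58c compares against the constant 4
        latch = count & 0b11           # 4 wraps to 0 because only two bits are kept


def next_task_one_hot(current_id, num_tasks=6):
    """Decoded (one-hot) id of the next task, assuming plain round-robin order."""
    return 2 ** ((current_id + 1) % num_tasks)


def priority_encode(one_hot):
    """Encoded id: position of the lowest set bit."""
    return (one_hot & -one_hot).bit_length() - 1


task_id, order = 0, []
for count, chgtaskex in slot_manager(24):
    if chgtaskex:
        task_id = priority_encode(next_task_one_hot(task_id))
        order.append(task_id)
print(order)   # [1, 2, 3, 4, 5, 0]
```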
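＜1.4 Description of Operation＞
The operation of the video and audio processing device 1000 of the first embodiment, configured as above, is now described.
【0065】
In the I/O processing unit 1001, the MPEG stream input asynchronously from the stream input unit 1 is, under the control of the I/O processor 5, first stored in the external memory 3 via the buffer memory 2 and the memory controller 6, and then held in the FIFO memory 4 via the memory controller 6. The IO processor 5 supplies compressed video data and compressed audio data to the FIFO memory 4 according to its remaining amount by executing task 2(b) and task 3 above. The FIFO memory 4 is thus supplied with a steady amount of compressed video and audio data, neither too much nor too little, so the decode processing unit 1002 is decoupled from asynchronous I/O and can devote itself to decoding. The processing up to this point is performed by the I/O processing unit 1001 independently of and in parallel with the decode processing unit 1002.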
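【0066】
In the decode processing unit 1002, meanwhile, the MPEG stream data held in the FIFO memory 4 is decoded by the processor 7, the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11. FIG. 15 illustrates the decoding operation from the FIFO memory 4 onward.
In the figure, the horizontal axis is the time axis, and the figure shows header analysis and the decoding of each block for roughly one macroblock. The vertical direction shows how the units of the decode processing unit 1002 decode the blocks in a pipelined manner.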
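【0067】
As the figure shows, the processor 7 alternates, by time division, between header analysis of the compressed video data and decoding of the compressed audio data. That is, the processor 7 analyzes the header of one macroblock, notifies the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 of the results, and then instructs the code conversion unit 9 to start decoding the macroblock. The processor 7 then decodes compressed audio data until it is notified by the interrupt signal from the code conversion unit 9. The decoded audio data is held temporarily in the internal memory 8 and then DMA-transferred to the external memory 3 by the memory controller 6.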
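【0068】
On receiving the macroblock decode start instruction from the processor 7, the code conversion unit 9 stores each block of the macroblock in the buffer 200. It changes the order of write addresses to the buffer 200 according to the scan type of the block notified during header analysis by the processor 7; that is, the order of write addresses differs between zigzag scan and alternate scan. The pixel operation unit 10 therefore does not need to change its read address order and can always read in the same address order regardless of scan type. The code conversion unit 9 repeats the above operation, writing to the buffer 200, until it has finished VLD of the six blocks in the macroblock. When VLD of the six blocks is finished, it raises an interrupt to the processor 7. This interrupt signal is the macroblock decode end signal End Of Macro Block (EOMB). The code conversion unit 9 generates EOMB by detecting the block end signal End Of Block (EOB) of the sixth block.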
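【0069】
In parallel with the code conversion unit 9, the pixel operation unit 10 applies IQ and IDCT, block by block, to the block data stored in the buffer 200 as shown in FIG. 9, and stores the results in the buffer 201.
In parallel with the pixel operation unit 10, the pixel read/write unit 11 extracts the rectangular area from the reference frame in the external memory 3 and combines blocks as shown in FIG. 15, based on the block data in the buffer 201 and the motion vector notified through header analysis by the processor 7. The combined block is stored in the external memory 3 via the memory controller 6.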
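【0070】
The above describes the operation when the macroblock is not a skipped macroblock. For a skipped macroblock, the code conversion unit 9 and the pixel operation unit 10 do not operate; only the pixel read/write unit 11 does. A skipped macroblock is the same image as the rectangular area in the reference frame, so the pixel read/write unit 11 copies that image to the external memory 3 as the decoded image.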
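【0071】
In this case, the interrupt signal from the code conversion unit 9 to the processor 7 is generated as follows. The logical AND is taken of a signal indicating that the processor 7 has sent the pixel read/write unit 11 a control signal to start motion compensation, a signal indicating that the pixel read/write unit 11 is ready for motion compensation, and a signal indicating a skipped macroblock; the logical OR of this AND and the above EOMB signal is then input to the processor 7 as the interrupt signal.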
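The interrupt condition is a two-level AND-OR function, which can be written down directly. The sketch below only restates the logic of the paragraph above; the function and parameter names are hypothetical.

```python
def processor_interrupt(eomb, mc_start_sent, mc_ready, skipped_macroblock):
    """Interrupt to the processor 7: EOMB OR (AND of the three skip-path signals)."""
    return eomb or (mc_start_sent and mc_ready and skipped_macroblock)


# Normal macroblock: the interrupt follows EOMB.
print(processor_interrupt(True, False, False, False))
# Skipped macroblock: no EOMB is generated, so the AND term raises the interrupt.
print(processor_interrupt(False, True, True, True))
```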
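【0072】
As described above, in the video and audio processing device of the first embodiment of the present invention, the I/O processing unit 1001 takes charge of inputting the MPEG stream from a storage medium or communication medium, outputting display image data and audio data to the display device and audio output device, and supplying the stream to the decode processing unit 1002, while the decode processing unit 1002 takes charge of decoding the compressed video data and compressed audio data. The decode processing unit 1002 is thereby freed from asynchronously arising processing and can devote itself to decoding. As a result, the series of MPEG stream input, decoding, and output processes is executed efficiently, and full decoding of the MPEG stream (no dropped frames) can be achieved without a high-speed operating clock.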
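【0073】
The video and audio processing device is preferably implemented as a single-chip LSI. In that case, the above full decoding is possible with an operating clock of 100 MHz or less (54 MHz in practice). Recent high-performance CPUs, whose operating clocks exceed 100 MHz or even 200 MHz, can achieve such full decoding if the image size is small, but they are expensive to manufacture. The present device, by contrast, is superior in both manufacturing cost and full decoding.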
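【0074】
Furthermore, the decode processing unit 1002 of the device divides roles as follows.
The processor 7 handles header analysis, which requires varied conditional judgments, for both compressed video data and compressed audio data, and also handles decoding of the compressed audio data. The block data of compressed video data requires a large amount of regular computation, so dedicated hardware (firmware), namely the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11, handles its decoding. As shown in FIG. 15, the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 are pipelined. The pixel operation unit 10 can process IQ and IDCT in parallel. The pixel read/write unit 11 accesses the reference frame in units of two blocks. These measures make compressed video decoding efficient, so the hardware dedicated to video decoding obtains high processing performance without a high-speed clock. Specifically, performance equal to or better than the conventional level was obtained with a clock of about 50 to 60 MHz instead of a high-speed clock exceeding 100 MHz. High-speed elements are therefore unnecessary and manufacturing cost can be held down.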
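【0075】
In addition, the basic unit of video decoding is the macroblock in the processor 7, the block in the code conversion unit 9 and the pixel operation unit 10, and two blocks in the pixel read/write unit 11, so the capacity of the intermediate buffers used in video decoding can be kept to a minimum.
＜2 Second Embodiment＞
The video and audio processing device of this embodiment is configured to provide, in addition to decoding of compressed stream data, a compression function (hereinafter "encoding") and a graphics function.
＜2.1 Configuration of the Video and Audio Processing Device＞
FIG. 16 is a block diagram showing the configuration of the video and audio processing device in the second embodiment of the present invention.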
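【0076】
This video and audio processing device 2000 consists of a stream input/output unit 21, a buffer memory 22, a FIFO memory 24, an I/O processor 25, a memory controller 26, a processor 27, an internal memory 28, a code conversion unit 29, a pixel operation unit 30, a pixel read/write unit 31, the video output unit 12, the audio output unit 13, the buffer 200, and the buffer 201. In addition to the functions of the video and audio processing device 1000 shown in FIG. 4, it has a function for compressing video data and audio data and a graphics function for rendering polygon data.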
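【0077】
Accordingly, the components of the video and audio processing device 2000 that have the same names as in FIG. 4 have exactly the same functions, with the compression and graphics functions added. Points in common with FIG. 4 are omitted below and the description focuses on the differences.
The stream input/output unit 21 differs in being bidirectional. That is, when MPEG data is transferred to it from the buffer memory 22 under the control of the I/O processor 25, it converts the transferred parallel data into serial data and outputs it to the outside as an MPEG data stream.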
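【0078】
The buffer memory 22 and the FIFO memory 24 also differ in being bidirectional.
The I/O processor 25 controls data transfers along paths (1) to (4) described in the first embodiment, and also along paths (5) to (8).
(1) Stream input/output unit 21 → buffer memory 22 → memory controller 26 → external memory 3
(2) External memory 3 → memory controller 26 → FIFO memory 24
(3) External memory 3 → memory controller 26 → buffer memory 22 → video output unit 12
(4) External memory 3 → memory controller 26 → buffer memory 22 → audio output unit 13
(5) External memory 3 → memory controller 26 → internal memory 28
(6) External memory 3 → memory controller 26 → pixel read/write unit 31
(7) FIFO memory 24 → memory controller 26 → external memory 3
(8) External memory 3 → memory controller 26 → buffer memory 22 → stream input/output unit 21
Paths (5) and (6) carry the source data when video data and audio data are encoded, and paths (7) and (8) carry the compressed MPEG stream.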
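【0079】
Encoding is described first. The data to be encoded is assumed to be stored in the external memory 3. The video data in the external memory 3 is transferred to the pixel read/write unit 31, which controls the memory controller 26 for this purpose.
The pixel read/write unit 31 writes the video data to the second buffer 201 and generates difference images. Difference image generation consists of block-by-block motion detection (calculation of motion vectors) and generation of the difference image. For this purpose, the pixel read/write unit 31 contains a motion detection circuit that detects a motion vector by searching the reference frame for a rectangular area similar to the block being encoded. Instead of the motion detection circuit, it may have a motion estimation circuit that estimates the motion vector of the block being encoded from already calculated motion vectors of blocks in adjacent frames.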
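【0080】
The pixel operation unit 30 receives the difference image data block by block and performs DCT, IDCT, quantization (hereinafter "Q processing"), and IQ. The video data quantized in this way is stored in the buffer 200.
The code conversion unit 29 receives the quantized data from the buffer 200 and performs variable-length coding (VLC). The variable-length-coded data is stored in the FIFO memory 24 and then in the external memory 3 through the memory controller 26, and the processor 27 adds header information for each macroblock.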
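【0081】
The audio data in the external memory 3 is transferred to the internal memory 28 via the memory controller 26. The processor 27 compresses the audio data in the internal memory 28 by time division with the processing that adds header information for each macroblock.
As described above, encoding is processed along the reverse of the path of the first embodiment.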
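【0082】
Graphics processing is described next. Graphics processing is three-dimensional image generation performed by combining rectangular figures called polygons. This device generates the pixel data inside a polygon from the pixel data at the polygon's vertex coordinates.
Initially, the vertex data of the polygons is stored in the external memory 3.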
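【0083】
The vertex data is stored in the internal memory 28 by the processor 27 controlling the memory controller 26. The processor 27 reads the vertex data from the internal memory 28, performs preprocessing for DDA (Digital Differential Analyzer), and writes the result to the FIFO memory 24.
The code conversion unit 29 reads the vertex data from the FIFO memory 24 as instructed by the pixel operation unit 30 and transfers it to the pixel operation unit 30.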
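【0084】
The pixel operation unit 30 performs DDA processing and sends the result to the pixel read/write unit 31. The pixel read/write unit 31, as instructed by the processor 27, performs Z-buffer processing or alpha blending and writes the image data to the external memory 3 via the memory controller 26.
＜2.1.1 Pixel Operation Unit＞
FIG. 17 is a block diagram showing the configuration of the pixel operation unit 30.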
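【0085】
In the figure, components identical to those of the pixel operation unit 10 shown in FIG. 7 carry the same numbers and are not described again; the description below focuses on the differences.
The differences are that the pixel operation unit 30 has three execution units (501a to 501c) where the pixel operation unit 10 of FIG. 7 has one, and that an instruction pointer holding unit 308, an instruction register unit 309, and a distribution unit 310 have been added.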
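【0086】
There are three execution units 501a to 501c in order to raise arithmetic performance. Specifically, in graphics processing the R, G, and B components of a color image are computed independently and in parallel. In IQ and Q processing, three multipliers 502 are used for speed. In IDCT, multiple multipliers 502 and adder/subtractors 503 are used to shorten the processing time. IDCT includes an operation called the butterfly operation, in which all the source data items depend on one another, so a data line 103 is provided for communication among the execution units 501a to 501c.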
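【0087】
The first instruction memory 506 and the second instruction memory 507 store microprograms for DCT, Q processing, and DDA in addition to IDCT and IQ. FIG. 18 shows an example of their contents. Compared with FIG. 8, a Q processing microprogram, a DCT microprogram, and a DDA microprogram have been added.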
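【0088】
The instruction pointer holding units 308a to 308c are provided for the execution units 501a to 501c, respectively, and each has a translation table that translates the address input from the first program counter and outputs the result to the instruction register unit 309. The translated address is a register number in the instruction register unit 309. Each of the instruction pointer holding units 308a to 308c also holds a modify flag, described later, and outputs it to the corresponding execution unit 501a to 501c.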
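【0089】
As for the translation tables, when the input addresses are, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, the instruction pointer holding units 308a, 308b, and 308c output the following translated addresses.
Instruction pointer holding unit 308a: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
Instruction pointer holding unit 308b: 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11
Instruction pointer holding unit 308c: 4, 3, 2, 1, 8, 7, 6, 5, 12, 11, 10, 9
As shown in FIG. 23, the instruction register unit 309 consists of multiple registers holding microinstructions, three selectors, and three output ports. The three selectors select the microinstructions in the registers designated by the translated addresses (register numbers) input from the instruction pointer holding units 308a, 308b, and 308c. The three output ports correspond to the selectors, and each outputs the microinstruction chosen by its selector to the execution units 501a to 501c via the distribution unit 310. Three selectors and output ports are provided in order to supply different microinstructions simultaneously to the three adder/subtractors 503 (or the three multipliers 502). In this embodiment, the three output ports supply, via the distribution unit 310, either the three adder/subtractors 503 or the three multipliers 502 selectively.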
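【0090】
For example, the instruction register unit 309 has registers R1 to R16 (register numbers 1 to 16). The microprogram stored in registers R1 to R16 represents the matrix operations required in DCT and IDCT, and is stored so that the same processing results whichever of the above three register-number orders is followed. That is, the microprograms with the above three execution orders differ only in the order of some microinstructions whose execution order is interchangeable. This is because the execution units 501a to 501c execute the microprogram in parallel, and the reordering avoids resource conflicts among them, such as contention for access to registers (not shown). The matrix operations consist of multiplication, transposition, and transfer of 8×8 matrices.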
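The three example translation tables can be reproduced and checked for conflicts with a short script. One observation, which is an inference from the listed numbers and not something the text states, is that each table equals the 0-based input address XOR-ed with a fixed mask (0, 1, and 3). The names below are hypothetical.

```python
def translate(address, mask):
    """Translate a 1-based input address by XOR-ing its 0-based value with mask."""
    return ((address - 1) ^ mask) + 1


MASKS = {"308a": 0, "308b": 1, "308c": 3}
inputs = range(1, 13)
tables = {unit: [translate(a, m) for a in inputs] for unit, m in MASKS.items()}

# At every step the three units address three different registers.
conflict_free = all(len({tables[u][i] for u in tables}) == 3 for i in range(len(inputs)))
print(tables["308b"])
print(tables["308c"], conflict_free)
```

Because the three masks differ, the three units never select the same register number in the same step, which matches the stated aim of avoiding resource conflicts between execution units.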
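【0091】
Each microinstruction stored in a register of the instruction register unit 309 is written in mnemonic form as
"op Ri, Rj, dest, (modify flag)"
However, the microinstruction held in the instruction register unit 309 consists only of the "op" and "Ri, Rj" parts. The "dest" part is specified from the instruction memories 506 and 507. The "(modify flag)" part is specified from the instruction pointer holding units 308a to 308c.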
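【0092】
Here, "op" is the operation code indicating a multiplication instruction, an addition/subtraction instruction, a transfer instruction, or the like, and "Ri, Rj" are the operands. Multiplication instructions are executed by the multipliers 502 in the three execution units 501a to 501c, and addition/subtraction instructions and transfer instructions are executed by the adder/subtractors 503 in the three execution units 501a to 501c.
"dest" indicates where the operation result is stored. It is specified not by a register of the instruction register unit 309 but from the instruction memory 506 (for multiplication instructions) or the instruction memory 507 (for addition/subtraction and transfer instructions). This allows the microprogram in the instruction register unit 309 to be shared by the execution units 501a to 501c. If the destination were specified by the register, a separate microprogram would have to be prepared for each of the execution units 501a to 501c, and the microprogram would grow to several times the size.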
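【0093】
The "modify flag" indicates, for an addition/subtraction instruction, whether the operation is an addition or a subtraction. It is specified separately from the instruction pointer holding units 308a to 308c, not from a register of the instruction register unit 309. The constant matrices used in the matrix operations of DCT and IDCT contain rows (or columns) whose elements are all "1" and rows (or columns) whose elements are all "-1", so specifying the "modify flag" from the instruction pointer holding units 308a to 308c allows the same microprogram in the instruction register unit 309 to be shared.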
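【0094】
When the three microinstructions input from the instruction register unit 309 are addition/subtraction instructions, the distribution unit 310 distributes their "op" and "Ri, Rj" parts, the "dest" part input from the instruction memory 507, and the "modify flag" input from the instruction pointer holding units 308a to 308c to the three adder/subtractors 503, and at the same time distributes the microinstruction from the instruction memory 506 to the three multipliers 502. When the three microinstructions input from the instruction register unit 309 are multiplication instructions, the distribution unit 310 distributes their "op" and "Ri, Rj" parts and the "dest" part input from the instruction memory 506 to the three multipliers 502, and distributes the microinstruction from the instruction memory 507 to the three adder/subtractors 503. In other words, through the distribution unit 310, the three adder/subtractors 503 each receive one microinstruction from the instruction memory 507 when the instruction is common to all three, and receive three microinstructions from the instruction register unit 309 when their addition/subtraction instructions differ. Likewise, the three multipliers 502 receive a microinstruction from the instruction memory 506 when the instruction is common to all three, and receive microinstructions from the instruction register unit 309 when their multiplication instructions differ.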
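【0095】
This configuration of the pixel operation unit 30 reduces the storage capacity of the instruction memory 506 and the instruction memory 507.
If the pixel operation unit 30 did not have the instruction pointer holding units 308a to 308c, the instruction register unit 309, and the distribution unit 310, both instruction memories 506 and 507 would have to store three microinstructions in parallel in order to supply different microinstructions to the three execution units 501a to 501c.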
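【0096】
FIG. 22 shows an example of the contents of the instruction memories 506 and 507 when the instruction pointer holding units 308a to 308c, the instruction register unit 309, and the distribution unit 310 are not provided. In the figure, a 16-step microprogram is stored and each microinstruction is 16 bits long. Because the instruction memories 506 and 507 each store three microinstructions in parallel, they require a total of 1536 bits (16 steps × 16 bits × 3 × 2).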
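【0097】
By contrast, FIG. 23 shows an example of the contents of the instruction pointer holding units 308a to 308c and the instruction register unit 309 in the pixel operation unit 30 of this embodiment. Here too a 16-step microprogram is stored and each microinstruction is 16 bits. In the figure, each of the instruction pointer holding units 308a to 308c stores 16 register numbers (4 bits each), and the instruction register unit 309 stores 16 microinstructions. In this case, the instruction pointer holding units 308a to 308c and the instruction register unit 309 need only 448 bits (16 steps × (12 + 16)). The pixel operation unit 30 can thus greatly reduce the storage capacity for microprograms. In practice, "dest" and the "modify flag" are issued separately, so storage or circuitry for them is also needed. Moreover, the instruction memories 506 and 507 specify the "dest" of each microinstruction and issue the multiplication and addition/subtraction instructions common to the execution units 501a to 501c, so they are not eliminated entirely. If the instruction register unit 309 were given six output ports, the instruction memories 506 and 507 could also be eliminated.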
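The two storage figures can be rechecked with a few lines of arithmetic. The constants below are taken from the text; the variable names are hypothetical, and the comparison leaves out the extra storage for "dest" and the modify flag that the text notes is needed in practice.

```python
STEPS = 16              # microprogram steps
INSTR_BITS = 16         # bits per microinstruction
UNITS = 3               # execution units 501a to 501c
MEMORIES = 2            # instruction memories 506 and 507
REG_NUMBER_BITS = 4     # bits per register number in a pointer table

without_sharing = STEPS * INSTR_BITS * UNITS * MEMORIES
with_sharing = STEPS * (UNITS * REG_NUMBER_BITS + INSTR_BITS)
print(without_sharing, with_sharing, without_sharing - with_sharing)
```

The shared arrangement needs 448 bits against 1536 bits, a saving of 1088 bits for this 16-step example.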
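[0098]
In FIG. 23, the instruction pointer holding units 308a to 308c output a translated address (register number) when the value of the first program counter is 0 to 15, but this is not a limitation. For example, they may output a translated address when the value of the first program counter is 32 to 47. In that case, an appropriate offset value may be added to the value of the first program counter. In this way, any address sequence indicated by the first program counter can be converted into translated addresses.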
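The offset-based translation can be sketched as a small lookup. This is an illustrative model only: `make_pointer_table`, the sign convention of the offset, and the two register orders are assumptions made here, not the contents of FIG. 23.

```python
def make_pointer_table(register_order, base=0):
    """Build a lookup from first-program-counter value to register number.

    register_order: register numbers in the order this execution unit
                    should run them (a hypothetical permutation).
    base:           program-counter value of the first translated step.
    """
    def lookup(pc):
        index = pc - base              # apply the offset
        if 0 <= index < len(register_order):
            return register_order[index]
        return None                    # outside the translated range
    return lookup

# Hypothetical orders: unit b runs the same 16 steps rotated by one.
order_a = list(range(16))
order_b = order_a[1:] + order_a[:1]

table_a = make_pointer_table(order_a, base=32)
table_b = make_pointer_table(order_b, base=32)
```

Here program-counter values 32 to 47 are translated and all other values pass through untranslated, matching the example range in the text.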
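[0099]
With the above configuration, this embodiment supports not only decoding of compressed video data and compressed audio data but also encoding of video and audio data and graphics processing based on polygon data. Processing efficiency is also improved by the parallel operation of the plurality of execution units. Moreover, because the order of some microinstructions is permuted in the instruction pointer holding units 308a to 308c, resource interference among the execution units can be avoided, which further improves processing efficiency.
[0100]
The above embodiment shows a configuration with three execution units because this is advantageous in that the R, G, and B color components can each be computed independently. The number of execution units may be any number of three or more.
In the above embodiments, each of the video and audio processing devices 1000 and 2000 is desirably implemented as a single-chip LSI. Although the external memory 3 has been described as being outside the chip, it may instead be built into the single chip.
[0101]
In the above embodiments, the stream input/output unit 1 (or the stream input/output unit 21) stores the MPEG stream (or the video and audio data) in the external memory, but the host processor may instead store it directly in the external memory 3.
Further, in the above embodiments the IO processor 5 switches tasks every four instruction cycles, but it may switch at any other interval of multiple instruction cycles. The number of instruction cycles between task switches may also be weighted in advance so that it differs from task to task, or the number of instruction cycles for each task may be weighted according to priority or urgency.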
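The fixed and weighted task switching described above can be sketched as a round-robin schedule. This is a minimal software model under stated assumptions: the `schedule` function and the weight values are hypothetical, and the actual IO processor 5 switches program counters in hardware.

```python
def schedule(quanta, total_cycles):
    """Return the task index that owns each instruction cycle.

    quanta[i] is the number of consecutive instruction cycles task i
    receives per turn; tasks are visited in round-robin order.
    """
    if not quanta or any(q <= 0 for q in quanta):
        raise ValueError("every task needs a positive cycle count")
    owners = []
    task = 0
    while len(owners) < total_cycles:
        take = min(quanta[task], total_cycles - len(owners))
        owners.extend([task] * take)
        task = (task + 1) % len(quanta)
    return owners

uniform = schedule([4, 4, 4, 4], 16)    # switch every four instruction cycles
weighted = schedule([4, 2, 8, 2], 16)   # hypothetical per-task weights
```

In the weighted case the third task receives eight consecutive cycles per turn, which is one way to favor a task with higher priority or urgency.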
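[0102]
[Effects of the Invention]
The video and audio processing device of the present invention is a device that receives from outside a data stream including compressed audio data and compressed video data, decodes it, and outputs the decoded data to output devices. It comprises input/output processing means for performing input/output processing that occurs asynchronously due to external factors, and decode processing means for performing, in parallel with the input/output processing, decode processing that consists mainly of decoding the data stream stored in a memory. The video data and audio data decoded by the decode processing means are stored in the memory. As the input/output processing, the input/output processing means receives the data stream input asynchronously from outside and stores it in the memory, supplies the data stream stored in the memory to the decode processing means, and reads the decoded data from the memory and outputs it to the external display device and audio output device in accordance with their respective output rates.
[0103]
According to this configuration, the input/output processing means and the decode processing means operate in parallel in a pipelined manner, and in addition the asynchronous processing and the decode processing are divided between them, so the decode processing means is freed from asynchronously occurring processing and can devote itself to decoding. As a result, the video and audio processing device efficiently executes the series of processes of stream data input, decoding, and output, making full decoding of the stream data (without dropped frames) possible without a high-speed operating clock.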
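[0104]
The decode processing means may comprise sequential processing means for performing, on the data stream, sequential processing that consists mainly of condition judgments and includes header analysis of the compressed audio data and compressed video data and decoding of the compressed audio data, and routine processing means for performing, in parallel with the sequential processing, routine processing, namely decoding of the compressed video data other than its header analysis.
[0105]
According to this configuration, sequential processing and routine processing suited to parallel processing, which differ in their processing characteristics, no longer have to coexist in a single unit, so processing efficiency can be improved significantly. In particular, the processing efficiency of the routine processing means can be improved, because in this video and audio processing device the routine processing means is freed from the asynchronous processing and sequential processing described above and can devote itself solely to the various routine operations required for decoding compressed video data. As a result, high processing performance can be obtained without a high-speed operating clock.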
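[0106]
Further, the input/output processing means may comprise input means for receiving an asynchronous data stream from outside, video output means for outputting decoded video data to an external display device, audio output means for outputting decoded audio data to an external audio output device, and a processor that executes first to fourth tasks stored in an instruction memory while switching among them, where the first task is a program that transfers the data stream from the input unit to the memory, the second task is a program that supplies the data stream from the memory to the decode processing means, the third task is a program that outputs decoded video data from the memory to the video output unit, and the fourth task is a program that outputs decoded audio data from the memory to the audio output unit.
[0107]
Here, the processor may comprise a program counter unit having at least four program counters corresponding to the first to fourth tasks, an instruction fetch unit that fetches an instruction from the instruction memory storing the task programs by using the instruction address indicated by one of the program counters, an instruction execution unit that executes the instruction fetched by the instruction fetch unit, and a task control unit that controls the instruction fetch unit so that the program counters are switched in turn each time a predetermined number of instruction cycles elapses.
[0108]
According to this configuration, the response delay to input/output requests is extremely small regardless of the input rate and input period of the stream data, which are determined by the external device, and regardless of the output rates and output periods of the video data and audio data, which are determined by the external display device and the external audio output device.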
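The video and audio processing device of the present invention may also comprise input means for receiving a data stream including compressed audio data and compressed video data; sequential processing means for performing, on the data stream, sequential processing that consists mainly of condition judgments, namely analysis of header information added to each predetermined block in the data stream and decoding of the compressed audio data in the data stream; and routine processing means for performing routine processing that consists mainly of routine operations, namely decoding the compressed video data in the data stream in units of the predetermined block, in parallel with the sequential processing, using the result of the header analysis. When header analysis of a predetermined block is finished, the sequential processing means instructs the routine processing means to start decoding that block, and when it receives notification from the routine processing means that decoding of the block has finished, it starts header analysis of the next predetermined block.
[0109]
According to this configuration, the sequential processing means handles header analysis, which requires a wide variety of condition judgments, for both the compressed video data and the compressed audio data, and also handles decoding of the compressed audio data. The routine processing means, on the other hand, handles the large volume of routine operations on the block data of the compressed video data. Under this division of roles, the sequential processing means performs all audio decoding, whose computational load is small compared with video decoding, header analysis of the compressed video data, and control of the routine processing means. Under that control, the routine processing means performs only routine operations, so efficient processing without waste can be achieved. The required processing performance can therefore be obtained without operating at a high frequency, and manufacturing cost can be reduced. Moreover, because the sequential processing means performs audio decoding, header analysis of the compressed video data, and control of the routine processing means in sequence, it can be implemented with a single processor.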
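[0110]
The routine processing means may comprise data conversion means for variable-length decoding the compressed video data in the data stream in accordance with instructions from the sequential processing means, operation means for performing inverse quantization and inverse discrete cosine transform by applying predetermined operations to the video block obtained by the variable-length decoding, and combining means for restoring the video data by performing motion compensation, in which the video block after the inverse discrete cosine transform is combined with an already decoded block.
The sequential processing means may comprise acquisition means for acquiring the header information variable-length decoded by the data conversion means, analysis means for analyzing the acquired header information, notification means for notifying the routine processing means of the parameters obtained as the analysis result, audio decoding means for decoding the compressed audio data in the data stream input by the input means, and control means that, on receiving from the routine processing means an interrupt signal notifying completion of decoding of a predetermined block, stops the operation of the audio decoding means and activates the acquisition means, and that, when the notification means has made the notification, instructs the data conversion means to start variable-length decoding of the video block.
[0111]
According to this configuration, for each predetermined block such as a macroblock, the sequential processing means performs header analysis and then audio decoding, and starts header analysis of the next block when the routine processing means completes decoding of the predetermined block. Because the sequential processing means thus alternates between header analysis and audio decoding in a time-division manner, it can be realized at low cost with a single processor. And because the routine processing means does not need to perform wide-ranging condition judgments, it can be implemented at low cost as dedicated hardware (or hardware plus firmware).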
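[0112]
Here, the operation means may further include a first buffer having a storage area corresponding to one block, and the data conversion means may comprise variable-length decoding means for variable-length decoding the compressed video data in the data stream, first address table means for storing a first address sequence in which the addresses of the storage area of the first buffer are arranged in zigzag scan order, second address table means for storing a second address sequence in which the addresses of the storage area of the first buffer are arranged in alternate scan order, and writing means for writing the block data obtained by the variable-length decoding means into the first buffer in accordance with one of the first address sequence and the second address sequence.
[0113]
According to this configuration, the writing means can write block data into the storage area of the first buffer for both zigzag scan and alternate scan. The operation means therefore does not need to change the order of read addresses when reading block data from the storage area of the first buffer, and can always read in the same address order regardless of the scan type.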
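Writing through an address table can be sketched as below. This is an illustrative model, not the circuit of the embodiment: `zigzag_addresses` and `write_block` are names introduced here, the buffer is assumed to be addressed as row × 8 + column, and only the zigzag table is generated. An alternate-scan table would be a second 64-entry list of the same form, taken from the MPEG-2 specification rather than computed here.

```python
def zigzag_addresses(n=8):
    """Buffer addresses (row * n + col) listed in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):            # s = row + col: one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)         # even diagonals run bottom-left to top-right
        order.extend(r * n + (s - r) for r in rows)
    return order

def write_block(coefficients, address_table):
    """Store coefficients, which arrive in scan order, at their raster addresses."""
    buffer = [0] * len(address_table)
    for value, address in zip(coefficients, address_table):
        buffer[address] = value
    return buffer

table = zigzag_addresses()
block = write_block(list(range(64)), table)   # coefficient k arrives k-th
```

Because the scan order is absorbed at write time, a reader of `block` can always walk addresses 0 to 63 in the same order, which is the effect the paragraph describes.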
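[0114]
Further, the analysis means may calculate a quantization scale and a motion vector based on the header information, and the notification means may notify the operation means of the quantization scale and the combining means of the motion vector.
According to this configuration, calculation of the motion vector can be assigned to the sequential processing means, and the combining means can perform motion compensation routinely using the calculated motion vector.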
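[0115]
The operation means may also comprise first and second control storage units each storing a microprogram, a first program counter that specifies a first read address to the first control storage unit, a second program counter that specifies a second read address, a selector that selects one of the first read address and the second read address and outputs it to the second control storage unit, and an execution unit that has a multiplier and an adder and executes block-by-block inverse quantization and inverse discrete cosine transform under microprogram control by the first and second control storage units.
[0116]
According to this configuration, the microprogram (firmware) does not need to perform wide-ranging condition judgments and only implements routine processing, so the program is small and easy to write, which suits cost reduction. Moreover, by using two program counters, the multiplier and the adder can be operated independently in parallel.
[0117]
Further, the execution unit may be configured so that, when the selector selects the second read address, processing using the multiplier and processing using the adder are performed independently in parallel, and when the selector selects the first read address, processing using the multiplier and processing using the adder are performed in conjunction with each other.
According to this configuration, the idle time of the multiplier and the adder can be reduced and processing efficiency improved.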
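[0118]
Here, the operation means may further include a first buffer that holds video blocks from the data conversion means and a second buffer that holds blocks that have been inverse discrete cosine transformed by the execution unit. The first control storage unit stores a microprogram for inverse quantization and a microprogram for inverse discrete cosine transform, and the second control storage unit stores a microprogram for inverse discrete cosine transform and a microprogram for transferring an inverse discrete cosine transformed video block to the second buffer. The execution means executes in parallel the transfer of an inverse discrete cosine transformed video block to the second buffer and the inverse quantization of the next video block, and executes the inverse discrete cosine transform of that inverse-quantized video block with the multiplier and the adder operating in conjunction.
[0119]
According to this configuration, inverse quantization and transfer to the second buffer are executed in parallel, so processing efficiency can be improved.
The input means may further receive polygon data, the sequential processing means may further analyze the polygon data to calculate the vertex coordinates and edge slopes of a polygon, and the routine processing means may further generate image data of the polygon in accordance with the calculated vertex coordinates and slopes.
[0120]
According to this configuration, the sequential processing means handles analysis of the polygon data, and the routine processing means handles the routine generation of image data. The video and audio processing device can thus efficiently perform graphics processing that generates image data from polygon data.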
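Here, the first and second control storage units may further store a microprogram that performs scan conversion using the DDA algorithm, and the execution unit may further perform scan conversion under microprogram control based on the vertex coordinates and slopes calculated by the sequential processing means.
[0121]
According to this configuration, image data generation can be realized simply by a scan conversion microprogram in the first and second control storage units.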
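DDA scan conversion from a starting vertex and edge slopes can be sketched as follows. This is a simplified software illustration rather than the microprogram of the embodiment: `dda_spans` is a name introduced here, slopes are assumed to be expressed as the change in x per scan line, and only a single pair of edges is walked.

```python
import math

def dda_spans(y_start, y_end, x_left, x_right, slope_left, slope_right):
    """Return (y, first_x, last_x) for each scan line y_start <= y < y_end.

    The edge positions are updated by one addition per scan line, which is
    the defining property of a digital differential analyzer (DDA).
    """
    spans = []
    for y in range(y_start, y_end):
        first = math.floor(x_left + 0.5)    # round to the nearest pixel
        last = math.floor(x_right + 0.5)
        spans.append((y, first, last))
        x_left += slope_left                # incremental step, no multiply
        x_right += slope_right
    return spans

# Upper part of a triangle whose apex is at (4, 0) and whose edges
# diverge by one pixel per scan line (hypothetical sample data).
spans = dda_spans(0, 4, 4.0, 4.0, -1.0, 1.0)
```

Each span gives the run of pixels to fill on one scan line, so the per-pixel work reduces to a fixed loop with no condition judgments beyond the loop bounds.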
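The combining means may further generate, from video data to be compressed, a difference block representing a difference image; the second buffer may further hold the generated difference image; the first control storage unit may further store a microprogram for discrete cosine transform and a microprogram for quantization; the second control storage unit may further store a microprogram for discrete cosine transform and a microprogram for transferring a discrete cosine transformed video block to the first buffer; the execution means may further perform discrete cosine transform and quantization on the difference block held in the second buffer and transfer the result to the first buffer; the data conversion means may further perform variable-length coding on the block in the first buffer; and the sequential processing means may further add header information to the predetermined block that has been variable-length coded by the data conversion means.
[0122]
According to this configuration, the routine processing means handles quantization and discrete cosine transform as routine processing, and the sequential processing means handles processing that requires condition judgments (addition of header information). In this case, the video and audio processing device can efficiently encode image data into compressed video data without using a high-speed clock.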
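The operation means may also comprise first and second control storage units each storing a microprogram, a first program counter that specifies a first read address to the first control storage unit, a second program counter that specifies a second read address, a selector that selects one of the first read address and the second read address and outputs it to the second control storage unit, and a plurality of execution units that each have a multiplier and an adder and execute block-by-block inverse quantization and inverse discrete cosine transform under microprogram control by the first and second control storage units, with each execution unit processing its share of the partial blocks into which a block is divided.
[0123]
According to this configuration, the plurality of execution units execute operation instructions in parallel, so a large volume of routine operations can be parallelized at the pixel level and executed efficiently.
The operation means may further comprise a plurality of address translation tables, one provided for each execution unit, each holding translated addresses in which the address order is partially permuted with respect to a predetermined address sequence; an instruction register group consisting of a plurality of registers that store, in correspondence with the translated addresses, the individual microinstructions making up a microprogram that implements a predetermined operation; and a switching unit, provided between the first and second control storage units and the plurality of execution units, that replaces the microinstruction output from the first control storage unit or the selector to each execution unit with a microinstruction from the instruction registers and outputs it to the plurality of execution units. When the first read address or the second read address is an address in the predetermined address sequence, that address is converted into a translated address by each address translation table, and the instruction register group outputs the microinstruction corresponding to each translated address output from the translation tables.
[0124]
According to this configuration, while the plurality of execution units execute the microprogram in parallel, resource interference such as access contention among the execution units is avoided, so processing can be performed still more efficiently.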
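Here, each translation table may further output to the plurality of execution units, while the first program counter is outputting a first read address in the predetermined address sequence, a flag indicating whether to add or subtract, accompanying the output of a microinstruction in the registers that indicates addition/subtraction; each execution unit may perform the addition or subtraction in accordance with the flag; and the flag may be set in accordance with a microinstruction of the second control storage unit.
[0125]
According to this configuration, the translation table specifies whether a microinstruction performs addition or subtraction, so the same microprogram can be shared in two ways. This further reduces the total size of the microprogram, which reduces hardware scale and hence cost.
The second control storage unit may further output to the plurality of execution units, while the first program counter is outputting a first read address in the predetermined address sequence, information indicating the storage destination of the microinstruction execution result, accompanying the output of a microinstruction in the registers; and each execution unit may store the execution result in accordance with the storage destination information.
[0126]
According to this configuration, the storage destination information can be specified separately from the microprogram in the instruction register group, so the microprogram can be shared among different processes, for example among partial processes of a matrix operation. This further reduces the total size of the microprogram, which reduces hardware scale and hence cost.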
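[Brief description of the drawings]
FIG. 1 is an explanatory diagram of decoding by the video/audio decoder of the first prior art.
FIG. 2 is an explanatory diagram of decoding by the two-chip decoder of the second prior art.
FIG. 3 is a block diagram showing a schematic configuration of the image processing device according to the first embodiment of the present invention.
FIG. 4 is a block diagram showing the configuration of the image processing device according to the first embodiment of the present invention.
FIG. 5 is a diagram showing the MPEG stream hierarchically together with the operation timing of each unit of the image processing device.
FIG. 6 is a diagram showing the analysis of the macroblock header by the processor 7 and the control applied to the other units.
FIG. 7 is a block diagram showing the configuration of the pixel operation unit 10.
FIG. 8 shows an example of the microprograms stored in the first instruction memory 506 and the second instruction memory 507.
FIG. 9 is a diagram showing the operation timing of the pixel operation unit 10.
FIG. 10 is a block diagram showing the detailed configuration of the pixel read/write unit 11.
FIG. 11 is a block diagram showing the configuration of the IO processor 5.
FIG. 12 is a block diagram showing a detailed configuration example of the instruction reading circuit 53.
FIG. 13 is a time chart showing the operation timing of the IO processor 5.
FIG. 14 is a block diagram showing the configuration of the task management unit.
FIG. 15 is an explanatory diagram showing the decoding operation from the FIFO memory 4 onward.
FIG. 16 is a block diagram showing the configuration of the image processing device according to the second embodiment of the present invention.
FIG. 17 is a block diagram showing the configuration of the pixel operation unit 30.
FIG. 18 shows an example of the contents of the first instruction memory 506 and the second instruction memory 507.
FIG. 19 is a block diagram showing the configuration of the code conversion unit 9.
FIG. 20 shows a block storage area that stores 8 × 8 spatial frequency data and the path of the zigzag scan.
FIG. 21 shows a block storage area that stores 8 × 8 spatial frequency data and the path of the alternate scan.
FIG. 22 shows an example of the contents of the instruction memory 506 and the instruction memory 507 in the case where the instruction pointer holding units 308a to 308c, the instruction register unit 309, and the distribution unit 310 are not provided.
FIG. 23 shows an example of the contents of the instruction pointer holding units 308a to 308c and the instruction register unit 309.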
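[Explanation of symbols]
1 Stream input unit
2 Buffer memory
3 External memory
4 FIFO memory
5 Input/output processor
5a DMAC
6 Memory controller
7 Processor
8 Internal memory
9 Code conversion unit
10 Pixel operation unit
12 Video output unit
13 Audio output unit
14 Host I/F unit
1000 Video and audio processing device
1001 Input/output processing unit
1002 Decoding processing unit
1003 Sequential processing unit
1004 Routine processing unit
[0001]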
TECHNICAL FIELD OF THE INVENTION
The present invention belongs to the technical field of digital signal processing, and relates to an image processing apparatus that performs expansion of compressed video and audio data, compression of video and audio data, and graphics processing.
[0002]
[Prior art]
In recent years, with the establishment of compression/decompression technology for digital video data and advances in LSI technology, various video and audio processing devices have come to be regarded as important, such as decoders that decompress compressed video and audio data, encoders that compress video and audio data, and graphics processors that perform graphics processing.
[0003]
As a first prior art, there is a video / audio decoder (JP-A-8-116429) for expanding compressed video and audio data of the Moving Picture Experts Group (MPEG) standard. This video / audio decoder performs both video decoding and audio decoding using one signal processing unit.
FIG. 1 is an explanatory diagram of a decoding process by the video / audio decoder. In the figure, the vertical axis represents time, and the horizontal axis represents the amount of calculation.
[0004]
When viewed along the vertical axis, video decoding and audio decoding are alternately performed. This is for decoding both video and audio with common hardware. As shown in the figure, video decoding is divided into sequential processing and block processing. The sequential processing is processing that requires a wide variety of condition determinations, such as decoding other than blocks, that is, analysis of the header of an MPEG stream, and the amount of calculation is small. The block decoding is a process of decoding a variable-length code of an MPEG stream and performing inverse quantization and inverse DCT (discrete cosine transform) on a block-by-block basis, and the amount of calculation is large. As shown in the figure, audio decoding is also divided into sequential processing similar to the above, which requires a wide variety of condition judgments, and decoding processing of the audio data itself. The decoding process of the audio data itself requires higher accuracy than the image data and must be processed within a limited time. Therefore, it is necessary to perform the processing with high accuracy and high speed, and the amount of calculation is large.
[0005]
As described above, the first conventional technology enables one-chip integration and realizes efficient audio-video decoding with hardware as small as one chip.
As a second conventional technique, there is a decoder having a two-chip configuration. One chip is used as a video decoder, and the other chip is used as an audio decoder. FIG. 2 is an explanatory diagram of a decoding process by a two-chip decoder. Both the video decoder and the audio decoder perform sequential processing including a large number of condition determinations such as header analysis, and block decoding processing mainly for decoding the data body. Since both the video decoder and the audio decoder process independently, the performance of each chip may be lower than in the first prior art.
[0006]
[Problems to be solved by the invention]
However, according to the above prior art, there are the following problems.
According to the first prior art, the signal processing unit must decode both video and audio, so high processing performance is required. That is, it must operate with a high-speed clock of 100 MHz or more, which makes it costly as a semiconductor for consumer use. It is conceivable to use a VLIW (Very Long Instruction Word) processor or the like to raise processing performance without a high-speed clock, but the VLIW processor itself is expensive, and unless a separate processor is used for sequential processing, the processing as a whole becomes inefficient.
[0007]
According to the second conventional technique, there is a problem that the cost is high because two processors are used. In other words, neither a video processor nor an audio processor can use a general-purpose inexpensive processor with low processing capability. This is because a video processor is required to be capable of processing a large amount of image data in real time. Also, although the processor for audio does not require as much computational complexity as the processor for video, the audio data is required to have higher accuracy than the image data. Therefore, an inexpensive or low-performance processor does not satisfy the required processing capability for both video and audio.
[0008]
Further, when the video/audio processing device is used as an AV decoder in a digital (satellite) broadcast tuner (called an STB (Set Top Box)) or in a DVD (Digital Versatile/Video Disc) playback device, it must take in an MPEG stream received from a broadcast wave or read from a disc, decode the MPEG stream, and finally output video and audio signals to a display, speakers, and the like, and the amount of processing in this series is enormous. Recently, there has been growing demand for a video and audio processing device that executes such an enormous series of processes efficiently.
[0009]
An object of the present invention is to provide a video and audio processing device that performs the series of processes of inputting, decoding, and outputting stream data representing compressed video and compressed audio data, that has high processing performance without operating at a high frequency, and that can reduce manufacturing cost.
It is another object of the present invention to provide a video / audio processing apparatus which realizes decoding of compressed video data, encoding of video data, and graphics processing at low cost.
[0010]
[Means for Solving the Problems]
In order to solve the above problems, the video and audio processing device of the present invention is a device that receives from outside a data stream including compressed audio data and compressed video data, decodes it, and outputs the decoded data to output devices. It comprises input/output processing means for performing input/output processing that occurs asynchronously due to external factors, and decoding processing means for performing, in parallel with the input/output processing, decoding processing that consists mainly of decoding the data stream stored in a memory. The video data and audio data decoded by the decoding processing means are stored in the memory. As the input/output processing, the input/output processing means receives the data stream input asynchronously from outside and stores it in the memory, supplies the data stream stored in the memory to the decoding processing means, and reads the decoded data from the memory and outputs it to the external display device and audio output device in accordance with their respective output rates.
[0011]
According to this configuration, the input/output processing means and the decoding processing means operate in parallel in a pipelined manner, and in addition the asynchronous processing and the decoding processing are divided between them, so the decoding processing means is freed from asynchronously occurring processing and can devote itself to decoding. As a result, the video and audio processing device efficiently executes the series of processes of stream data input, decoding, and output, making full decoding of the stream data (without dropped frames) possible without a high-speed operating clock.
[0012]
BEST MODE FOR CARRYING OUT THE INVENTION
Embodiments of the video and audio processing apparatus of the present invention will be described in the following sections.
1. First Embodiment
1.1 Schematic configuration of video and audio processing device
1.1.1 Input / output processing unit
1.1.2 Decoding processing unit
1.1.2.1 Sequential processing unit
1.1.2.2 Routine processing unit
1.2 Configuration of video and audio processing device
1.2.1 Configuration of input / output processing unit
1.2.2 Decoding processing unit
1.2.2.1 Sequential processing unit
1.2.2.2 Routine processing unit
1.3 Detailed configuration of each part
1.3.1 Processor 7 (sequential processing unit)
1.3.2 Routine processing unit
1.3.2.1 Code conversion unit
1.3.2.2 Pixel operation unit
1.3.2.3 Pixel read / write unit
1.3.3 Input / output processing unit
1.3.3.1 IO processor
1.3.3.1.1 Instruction reading circuit
1.3.3.1.2 Task management unit
1.4 Description of operation
2 Second embodiment
2.1 Configuration of video and audio processing device
2.1.1 Pixel operation unit
<1. First Embodiment>
The video/audio processing device according to the present embodiment is provided in a satellite broadcast receiving device (STB: Set Top Box), a DVD (Digital Versatile Disc) playback device, a DVD-RAM recording/playback device, or the like. It takes as input an MPEG stream, which is compressed video and audio data, from a satellite broadcast or a DVD, decompresses it (hereinafter simply referred to as decoding), and outputs a video signal and an audio signal to external output devices.
<1.1 Schematic Configuration of Video / Audio Processing Device>
FIG. 3 is a block diagram illustrating a schematic configuration of the video and audio processing device according to the first embodiment of the present invention.
[0013]
The video / audio processing apparatus 1000 includes an input / output processing unit 1001, a decoding processing unit 1002, and a memory controller 6, and is configured to perform input / output processing and decoding processing separately and in parallel. The external memory 3 is used as a working memory for temporarily storing an MPEG stream and audio data after decoding, and a frame memory for storing video data after decoding.
<1.1.1 Input / output processing unit>
The input/output processing unit 1001 performs input/output processing that occurs asynchronously with the internal operation of the video/audio processing apparatus 1000. This input/output processing includes (a) receiving an MPEG stream that is input asynchronously from outside and temporarily storing it in the external memory 3, (b) supplying the MPEG stream stored in the external memory 3 to the decoding processing unit 1002, and (c) reading the decoded video data and audio data from the external memory 3 and outputting them in accordance with the output rates of the external display device and audio output device (not shown).
<1.1.2 Decoding processing unit>
The decoding processing unit 1002 decodes the MPEG stream supplied by the input/output processing unit 1001 and stores the decoded video data and audio data in the external memory 3, in parallel with the operation of the input/output processing unit 1001. Decoding an MPEG stream involves a large amount of computation and a wide variety of processing, so the decoding processing unit 1002 is configured to execute, separately and in parallel, sequential processing that consists mainly of condition judgments and routine processing that consists mainly of a large volume of routine operations and is suited to parallel operation. Here, the sequential processing is, for example, header analysis of the MPEG stream, and involves many condition judgments such as header detection and determination of header contents. The routine processing must apply various operations to each block of a predetermined number of pixels, so it is suited to pipelined parallel processing and to parallel processing such as vector operations that apply exactly the same operation to different data (pixels).
<1.1.2.1 Sequential processing unit>
The sequential processing unit 1003 performs, as the sequential processing, header analysis of the compressed audio data and compressed video data supplied from the input/output processing unit 1001, control for activating the routine processing unit 1004 for each macroblock, and decoding of the compressed audio data. The header analysis includes analysis of the macroblock header in the MPEG stream and decoding of the motion vector. Here, a block represents an image composed of 8 × 8 pixels. A macroblock consists of four luminance blocks and two color difference blocks. The motion vector is a vector indicating a rectangular area of 8 × 8 pixels in the reference frame, and indicates which rectangular area in the reference frame the block is differenced against.
<1.1.2.2 Routine processing unit>
The routine processing unit 1004 receives a decoding start instruction for each macroblock from the sequential processing unit 1003 and performs macroblock decoding as the above-described routine processing, in parallel with the audio decoding of the sequential processing unit 1003. This decoding consists of variable length decoding (VLD), inverse quantization (IQ), inverse discrete cosine transform (IDCT), and motion compensation (MC), performed in that order. In the motion compensation, the routine processing unit 1004 stores the decoded block in the external memory 3, which serves as a frame memory, via the memory controller 6.
<1.2 Configuration of Video / Audio Processing Device>
FIG. 4 is a block diagram showing a more detailed configuration of the video and audio processing device 1000.
<1.2.1 Configuration of input / output processing unit>
In the figure, an input / output processing unit 1001 includes a stream input unit 1, a buffer memory 2, an input / output processor 5 (hereinafter abbreviated as an IO processor 5), a DMAC (Direct Memory Access Controller) 5a, a video output unit 12, and an audio output unit 13. , A host I / F unit 14.
[0014]
The stream input unit 1 converts an MPEG data stream input serially from outside into parallel data (hereinafter referred to as MPEG data). In doing so, the stream input unit 1 detects the start code of a GOP (Group of Pictures: a portion of the MPEG data stream that includes one I picture and corresponds to about 0.5 seconds of video) in the MPEG data stream and notifies the IO processor 5 of the detection. In response to this notification, the converted MPEG data is transferred to the buffer memory 2 under the control of the IO processor 5.
[0015]
The buffer memory 2 is a buffer memory that temporarily holds the MPEG data transferred from the stream input unit 1. The MPEG data held in the buffer memory 2 is further transferred to the external memory 3 via the memory controller 6 under the control of the input / output processor 5.
The external memory 3 includes an SDRAM (Synchronous Dynamic Random Access Memory) chip, and temporarily holds MPEG data transferred from the buffer memory 2 via the memory controller 6. Further, the external memory 3 also holds decoded video data (hereinafter also referred to as frame data) and decoded audio data.
[0016]
The input / output processor 5 controls data input / output between the stream input unit 1, the buffer memory 2, the external memory 3 (with the memory controller 6 interposed), and the FIFO memory 4. That is, it controls the data transfer (DMA transfer) of the paths shown in the following (1) to (4).
(1) Stream input unit 1 → buffer memory 2 → memory controller 6 → external memory 3
(2) External memory 3 → memory controller 6 → FIFO memory 4
(3) External memory 3 → memory controller 6 → buffer memory 2 → video output unit 12
(4) External memory 3 → memory controller 6 → buffer memory 2 → audio output unit 13
In these paths, the input / output processor 5 controls the transfer of video data and audio data in the MPEG data independently. (1) and (2) are transfer paths of MPEG data before decoding. In the transfer paths (1) and (2), the input / output processor 5 transfers the compressed video data and the compressed audio data separately. (3) and (4) are transfer paths of the decoded video and audio data, respectively. The decoded video and audio data are transferred according to the respective output rates of an external display device (not shown) and an audio output device (not shown).
[0017]
The DMAC 5a executes, under the control of the IO processor 5, DMA transfers between the buffer memory 2 and each of the stream input unit 1, the video output unit 12, and the audio output unit 13, DMA transfers between the buffer memory 2 and the external memory 3, and DMA transfers between the external memory 3 and the FIFO memory 4.
The video output unit 12 issues data requests to the input/output processor 5 in accordance with the output rate of the external display device (a CRT or the like; for example, the period of the horizontal synchronizing signal Hsync), and outputs to the display device the video data that the input/output processor 5 supplies through transfer path (3).
[0018]
The audio output unit 13 issues data requests to the input/output processor 5 in accordance with the output rate of the external audio output device, and outputs to the audio output device (a combination of a D/A converter, an audio amplifier, speakers, and the like) the audio data that the input/output processor 5 supplies through transfer path (4).
The host I / F unit 14 is an interface for performing communication with an external host processor, for example, a processor that performs overall control of a DVD playback device. In this communication, instructions such as decoding start, stop, fast forward playback, and reverse playback of the MPEG stream are sent from the host processor.
<1.2.2 Decoding processing unit>
The decoding processing unit 1002 in FIG. 4 includes a FIFO memory 4, a sequential processing unit 1003, and a routine processing unit 1004, and decodes the MPEG data supplied from the input/output processing unit 1001 via the FIFO memory 4. The sequential processing unit 1003 includes a processor 7 and an internal memory 8. The routine processing unit 1004 includes a code conversion unit 9, a pixel operation unit 10, a pixel read/write unit 11, a buffer 200, and a buffer 201.
[0019]
The FIFO memory 4 consists of two FIFOs (hereinafter referred to as the video FIFO and the audio FIFO) and stores, in first-in first-out fashion, the compressed video data and compressed audio data transferred from the external memory 3 under the control of the input/output processor 5.
<1.2.2.1 Sequential processing unit>
The processor 7 controls reading of the compressed video data and compressed audio data from the FIFO memory 4, and performs part of the decoding of the compressed video data and all of the decoding of the compressed audio data. The part of the video decoding it performs consists of analysis of header information in the MPEG data, calculation of motion vectors, and control of the compressed video decoding process. The overall decoding of compressed video data is thus divided between the processor 7 and the routine processing unit 1004: the processor 7 handles the sequential processing that requires a wide variety of condition judgments, and the routine processing unit 1004 handles the large volume of routine arithmetic. The processor 7 also handles all audio decoding, because its computational load is small compared with video decoding.
[0020]
The function of the processor 7 will be specifically described with reference to FIG. FIG. 5 shows the operation timing of each section of the video and audio processing apparatus together with the MPEG stream in a hierarchical manner. In the figure, the horizontal axis is the time axis. The first layer shows the flow of the MPEG stream. An MPEG stream for one second like the second layer includes a plurality of frames (I, P, and B pictures). As in the third layer, one frame includes a picture header and a plurality of slices. As in the fourth layer, one slice includes a slice header and a plurality of macroblocks. As in the fifth layer, one macroblock includes a macroblock header and six blocks.
[0021]
The data structure of the first to fifth layers shown in the figure is described in detail in a known document, for example, ASCII "Point Illustrated Latest MPEG Textbook".
The processor 7 analyzes headers down to the macroblock layer of the MPEG stream and decodes the compressed audio data, as shown below the fifth layer in FIG. 5. For each macroblock, the processor 7 instructs the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 to start decoding the macroblock according to the header analysis result, and while those units are decoding the macroblock it reads compressed audio data from the FIFO memory 4 and decodes it. When the code conversion unit 9, the pixel operation unit 10, and the pixel read/write unit 11 complete decoding of the macroblock, the processor 7 is notified by an interrupt signal, suspends decoding of the compressed audio data, and starts analyzing the next macroblock header.
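The alternation between header analysis and audio decoding can be sketched as a deterministic simulation. This is only an illustrative model: `decode_schedule`, the fixed block decoding time in ticks, and the audio work units are assumptions made here, and the real processor 7 reacts to an asynchronous interrupt rather than counting ticks.

```python
def decode_schedule(num_macroblocks, block_ticks, audio_units):
    """List the actions of the sequential processor, one entry per step.

    block_ticks: ticks the routine processing unit needs per macroblock
                 (a hypothetical constant).
    audio_units: units of audio decoding work waiting in the audio FIFO.
    """
    log = []
    audio_left = audio_units
    for mb in range(num_macroblocks):
        log.append(f"header {mb}")             # sequential: header analysis
        log.append(f"start block decode {mb}")  # hand off to the routine unit
        for _ in range(block_ticks):           # routine unit is busy meanwhile
            if audio_left > 0:
                log.append("audio")            # fill the wait with audio decoding
                audio_left -= 1
            else:
                log.append("idle")
        log.append(f"interrupt: block {mb} done")
    return log

trace = decode_schedule(num_macroblocks=2, block_ticks=2, audio_units=3)
```

The trace shows audio decoding occupying the time in which the processor would otherwise wait for the macroblock decode to finish.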
[0022]
The internal memory 8 is a work memory of the processor 7, and temporarily stores the decoded audio data. The held audio data is transferred to the external memory 3 by the input / output processor 5 through the path (4).
<1.2.2.2 Routine processing unit>
The code conversion unit 9 performs variable length decoding (VLD) on the compressed video data read from the FIFO memory 4. As shown in FIG. 5, of the decoded data, the code conversion unit 9 transfers the header information and the motion vector information (broken-line sections in the figure) to the processor 7, and transfers the macroblock data (six blocks consisting of the luminance blocks Y0 to Y3 and the color difference blocks Cb and Cr; solid-line sections in the figure) to the pixel operation unit 10 via the buffer 200. The macroblock data decoded by the code conversion unit 9 represents spatial frequency components.
[0023]
The buffer 200 holds data representing a spatial frequency component for one block (8 × 8 pixels) written by the code conversion unit 9.
The pixel operation unit 10 performs inverse quantization (IQ) and inverse discrete cosine transform (IDCT), block by block, on the block data transferred from the code conversion unit 9 via the buffer 200. The processing result is data representing pixel luminance values or their differences in the case of a luminance block, and data representing color difference values or their differences in the case of a color difference block; it is transferred to the pixel read/write unit 11.
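The IDCT step can be sketched as two matrix products with a constant matrix. This is a floating-point reference sketch, not the fixed-point microprogram of the pixel operation unit 10; the names `C`, `matmul`, `transpose`, and `idct_2d` are introduced here, and inverse quantization is omitted.

```python
import math

N = 8

# Orthonormal DCT-II basis: C[u][x] = c(u) * cos((2x + 1) * u * pi / (2N)),
# with c(0) = sqrt(1/N) and c(u) = sqrt(2/N) otherwise.
C = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)]
     for u in range(N)]

def matmul(a, b):
    """Plain matrix product of two square lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def idct_2d(coefficients):
    """Inverse 2-D DCT: pixels = C^T * coefficients * C."""
    return matmul(matmul(transpose(C), coefficients), C)

# A block with only a DC coefficient decodes to a flat block.
dc_only = [[0.0] * N for _ in range(N)]
dc_only[0][0] = 800.0
pixels = idct_2d(dc_only)
```

Each output element is a fixed sequence of multiplications and additions with no data-dependent branching, which is why this stage suits microprogram control of multipliers and adders.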
[0024]
The buffer 201 holds one block (8 × 8 pixels) of pixel data.
The pixel read / write unit 11 performs motion compensation on the processing result of the pixel operation unit 10 in block units. That is, for P pictures and B pictures, it cuts out the rectangular area indicated by the motion vector from the decoded reference frame in the external memory 3 via the memory controller 6 and combines it with the block output by the pixel operation unit 10, thereby decoding the original block image. The decoding result of the pixel read / write unit 11 is stored in the external memory 3 via the memory controller 6.
[0025]
Since the contents of the motion compensation, IQ, and IDCT are well-known technologies, detailed descriptions thereof are omitted (see the above-mentioned document).
<1.3 Detailed configuration of each part>
Next, a detailed configuration of main components of the video and audio processing device 1000 will be described.
<1.3.1 Processor 7 (sequential processing unit)>
FIG. 6 is a diagram showing the analysis of the macroblock header by the processor 7 and the contents of its control over the other units. The individual data items in the macroblock header, indicated by abbreviations in the figure, are described in the above-mentioned documents and the like, so their description is omitted here.
[0026]
As shown in the figure, the processor 7 issues commands to the code conversion unit 9 to sequentially obtain the variable-length-decoded data of the header portion and, according to its contents, sets the data necessary for decoding the macroblock in the code conversion unit 9, the pixel operation unit 10, and the pixel read / write unit 11.
Specifically, the processor 7 first issues a command for obtaining the macroblock address increment (MBAI) to the code conversion unit 9 (S101) and obtains the MBAI from the code conversion unit 9. If the MBAI indicates a skipped macroblock (that is, the macroblock to be decoded is the same as the previous macroblock), the macroblock data has been omitted, so the process proceeds to S117; if it is not a skipped macroblock, the header analysis is continued (S102, S103).
[0027]
Next, the processor 7 issues a command for obtaining an MBT (Macro Block Type), and obtains the MBT from the code conversion unit 9. From the MBT, it is determined whether the scan type of the block is a zigzag scan or an alternate scan, and the reading order of the buffer 200 is instructed to the pixel operation unit 10 (S104).
Further, the processor 7 determines from the already acquired header data whether an STWC (Spatial Temporal Weight Code) exists (S105), and if it exists, issues a command to obtain it (S106).
[0028]
In the same manner, the processor 7 obtains FrMT (Frame Motion Type), FiMT (Field Motion Type), DT (DCT Type), QSC (Quantizer Scale Code), MV (Motion Vector), and CBP (Coded Block Pattern) (S107-116). At that time, the processor 7 notifies the pixel read / write unit 11 of the analysis results of FrMT, FiMT, and DT, notifies the pixel operation unit 10 of the analysis result of QSC, and notifies the code conversion unit 9 of the analysis result of CBP. As a result, the information necessary for IQ, IDCT, and motion compensation is set in the code conversion unit 9, the pixel operation unit 10, and the pixel read / write unit 11.
[0029]
Further, the two-processor configuration has a redundant configuration because each processor individually performs the above-described sequential processing that requires various condition judgments.
Next, the processor 7 issues a macroblock decoding start instruction to the code converter 9 (S117). Accordingly, the code conversion unit 9 starts VLD for each block in the macroblock, and outputs the result of VLD to the pixel operation unit 10 via the buffer 200. Further, the processor 7 calculates a motion vector based on the MV data (S118), and notifies the pixel read / write unit 11 of the calculation result (S119).
[0030]
In the above processing, the motion vector requires a series of steps: obtaining the motion vector data (MV) (S113), calculating the motion vector (S118), and setting the motion vector in the pixel read / write unit 11 (S119). Here, the processor 7 does not calculate and set the motion vector (S118, S119) immediately after obtaining the motion vector data (S113); instead, it first issues the decode start instruction to the code conversion unit 9 (S117) and then calculates and sets the motion vector. As a result, the motion vector calculation and setting by the processor 7 and the decoding by the routine processing unit 1004 proceed in parallel. That is, the decoding start timing of the routine processing unit 1004 is advanced.
[0031]
Since the header analysis of the compressed video data for one macroblock is completed as described above, the processor 7 acquires the compressed audio data from the FIFO memory 4 and starts the audio decoding process (S120). The audio decoding process is continued until an interrupt signal indicating that macroblock decoding has been completed is input from the code conversion unit 9. In response to this interrupt signal, the processor 7 starts the header analysis for the next macro block.
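The effect of issuing the decode start instruction (S117) before the motion vector calculation (S118, S119) can be illustrated with a small timing model. This is only a sketch: the function name `macroblock_timeline` and all cycle counts are hypothetical and are not taken from the specification.

```python
def macroblock_timeline(header, mv, decode, start_before_mv=True):
    """Return (decode_start, decode_end, audio_cycles) for one macroblock.

    header: cycles the processor spends on header analysis (S101-S116)
    mv:     cycles spent calculating and setting the motion vector (S118-S119)
    decode: cycles the routine processing unit needs for the macroblock
    All three values are illustrative assumptions.
    """
    if start_before_mv:
        # S117 is issued right after header analysis; S118-S119 then run
        # on the processor while the routine processing unit decodes.
        decode_start = header
    else:
        # Alternative ordering: motion vector first, decode start afterwards.
        decode_start = header + mv
    processor_free_at = header + mv
    decode_end = decode_start + decode
    # Audio decoding (S120) fills the time until the completion interrupt.
    audio_cycles = max(0, decode_end - processor_free_at)
    return decode_start, decode_end, audio_cycles


overlapped = macroblock_timeline(100, 20, 300, start_before_mv=True)
serial = macroblock_timeline(100, 20, 300, start_before_mv=False)
```

With these assumed numbers the overlapped ordering finishes the macroblock 20 cycles earlier, which is the sense in which the decoding start timing is advanced.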
<1.3.2 Routine processing unit>
The routine processing unit 1004 decodes the six blocks in a macroblock by operating the code conversion unit 9, the pixel operation unit 10, and the pixel read / write unit 11 in parallel (as a pipeline). Here, the configurations of the code conversion unit 9, the pixel operation unit 10, and the pixel read / write unit 11 will be described in detail in this order.
<1.3.2.1 Code conversion unit 9>
FIG. 19 is a block diagram showing a configuration of the code conversion unit 9.
[0032]
The code conversion unit 9 shown in the figure includes a VLD unit 901, a counter 902, an incrementer 903, a selector 904, a scan table 905, a scan table 906, a flip-flop (hereinafter abbreviated as FF) 907, and a selector 908, and writes the variable length decoding (VLD) results into the buffer 200 so that they are arranged, block by block, in the order of the zigzag scan or the alternate scan.
[0033]
The VLD unit 901 performs variable length decoding (VLD) on the compressed video data read from the FIFO memory 4. Of the decoded data, it transfers the header information and the motion vector information (broken line sections in FIG. 5) to the processor 7, and outputs the macroblock data (six blocks consisting of the luminance blocks Y0 to Y3 and the color difference blocks Cb and Cr; solid line sections in FIG. 5) to the buffer 200 in units of blocks (64 spatial frequency data).
[0034]
The circuit portion including the counter 902, the incrementer 903, and the selector 904 repeatedly counts from 0 to 63 in synchronization with the output of the spatial frequency data from the VLD section 901.
The scan table 905 is a table that stores the addresses of the block storage areas of the buffer 200 in the order of zigzag scan. The output values (0 to 63) of the counter 902 are sequentially input, and the addresses are sequentially output. FIG. 20 shows a block storage area for storing 8 × 8 pieces of spatial frequency data in the buffer 200 and a zigzag scan route. The scan table 905 sequentially outputs the pixel addresses in the route shown in FIG.
[0035]
The scan table 906 is a table that stores the addresses of the block storage areas of the buffer 200 in the order of the alternate scan. The output values (0 to 63) of the counter 902 are sequentially input, and the addresses are sequentially output. FIG. 21 shows a block storage area in the buffer 200 for storing 8 × 8 pieces of spatial frequency data and a route of the alternate scan. The scan table 906 sequentially outputs the pixel addresses in the route shown in FIG.
[0036]
The FF 907 holds a flag indicating a scan type (zigzag scan or alternate scan). This flag is set by the processor 7.
The selector 908 selects an address output from the scan table 905 and the scan table 906 according to the flag of the FF 907, and outputs the address to the buffer 200 as a write address.
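The write-address generation described above (counter 902, scan tables 905 and 906, FF 907, selector 908) can be sketched as a table lookup. This is a minimal model under stated assumptions: buffer addresses are taken to be row-major (address = row × 8 + column), the names `zigzag_table`, `ALTERNATE_TABLE`, and `write_block` are hypothetical, and the alternate table is written out as a literal that is believed to follow the MPEG-2 alternate scan and should be checked against the standard before use.

```python
def zigzag_table(n=8):
    """Row-major addresses visited by the zigzag scan of an n x n block."""
    order = []
    for s in range(2 * n - 1):                      # s = row + column
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:                              # even diagonals run upward
            diag.reverse()
        order.extend(r * n + c for r, c in diag)
    return order


ZIGZAG_TABLE = zigzag_table()                       # role of scan table 905

ALTERNATE_TABLE = [                                 # role of scan table 906 (assumed values)
    0, 8, 16, 24, 1, 9, 2, 10,
    17, 25, 32, 40, 48, 56, 57, 49,
    41, 33, 26, 18, 3, 11, 4, 12,
    19, 27, 34, 42, 50, 58, 35, 43,
    51, 59, 20, 28, 5, 13, 6, 14,
    21, 29, 36, 44, 52, 60, 37, 45,
    53, 61, 22, 30, 7, 15, 23, 31,
    38, 46, 54, 62, 39, 47, 55, 63,
]


def write_block(coeffs, alternate_flag):
    """Write 64 decoded coefficients into a model of buffer 200.

    alternate_flag plays the role of the scan-type flag held in FF 907.
    """
    table = ALTERNATE_TABLE if alternate_flag else ZIGZAG_TABLE   # selector 908
    buffer_200 = [0] * 64
    for count, value in enumerate(coeffs):          # counter 902 counts 0..63
        buffer_200[table[count]] = value
    return buffer_200
```

Because the reordering happens on the write side, a reader of `buffer_200` can always use the same read order, whichever scan type the stream uses.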
<1.3.2.2 Pixel operation unit>
FIG. 7 is a block diagram illustrating a configuration of the pixel operation unit 10.
[0037]
As shown in the drawing, the pixel operation unit 10 includes an execution unit 501 comprising a multiplier 502 and an adder / subtractor 503, a first program counter (hereinafter abbreviated as first PC) 504, a second program counter (hereinafter abbreviated as second PC) 505, a first instruction memory 506, a second instruction memory 507, and a selector 508, and is configured so that IQ and part of the IDCT can be overlapped and executed in parallel.
[0038]
The execution unit 501 accesses and operates the buffers 200 and 201 according to micro instructions sequentially output from the first instruction memory 506 and the second instruction memory 507.
The first instruction memory 506 and the second instruction memory 507 are control memories for storing microprograms for realizing IQ and IDCT for blocks (frequency components) held in the buffer 200. FIG. 8 shows an example of the microprogram stored in the first instruction memory 506 and the second instruction memory 507.
[0039]
In the figure, a first instruction memory 506 stores an IDCT1A microprogram and an IQ microprogram, and the first PC 504 specifies a read address. The IQ microprogram is an arithmetic processing mainly including reading of the buffer 200 and multiplication, and does not use the adder / subtractor 503.
The second instruction memory 507 stores the IDCT1B microprogram and the IDCT2 microprogram, and its read address is specified by the first PC 504 or the second PC 505 via the selector 508. Here, IDCT1 means the first half of the IDCT processing, consisting mainly of multiplication and addition / subtraction, and is executed using the entire execution unit 501 by reading the IDCT1A microprogram and the IDCT1B microprogram at the same time. IDCT2 means the second half of the IDCT processing, consisting mainly of addition and subtraction, together with the writing to the buffer 201, and is executed using the adder / subtractor 503 by reading the IDCT2 microprogram in the second instruction memory 507.
[0040]
Since IQ is processed by the multiplier 502 and IDCT2 is processed by the adder / subtractor 503, they can be operated in parallel. FIG. 9 shows an operation timing chart of IQ, IDCT1, and IDCT2 by the pixel operation unit 10.
In FIG. 9, when the code conversion unit 9 writes the data of the luminance block Y0 into the buffer 200 (timing t0), it notifies the pixel operation unit 10 to that effect by the control signal 102. The pixel operation unit 10 then performs IQ on the data in the buffer 200 by reading the IQ microprogram in the first instruction memory 506 according to the address designation of the first PC 504, using the QS (Quantizer Scale) value set during the header analysis by the processor 7. At this time, the selector 508 selects the first PC 504 (timing t1).
[0041]
Further, the pixel operation unit 10 performs IDCT1 on the data in the buffer 200 by reading the IDCT1A and IDCT1B microprograms according to the address designation of the first PC 504. At this time, since the selector 508 selects the first PC 504, the address from the first PC 504 is specified in both the first instruction memory 506 and the second instruction memory 507 (timing t2).
[0042]
Next, the pixel operation unit 10 reads the IQ microprogram in the first instruction memory 506 in accordance with the address designation of the first PC 504 by using the QS (Quantizer Scale) value, thereby IQ-processing the data in the block Y1 of the buffer 200. At the same time, the second half of the IDCT processing is performed on the block Y0 by reading the IDCT2 microprogram in the second instruction memory 507 according to the address designation of the second PC 505. At this time, the selector 508 selects the second PC 505. The first PC 504 and the second PC 505 specify addresses independently (timing t3).
[0043]
Thereafter, similarly, the pixel operation unit 10 continues the processing in block units (after timing t4).
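The overlap of IQ for one block with IDCT2 for the previous block can be written out as a slot-by-slot schedule. The sketch below is an assumption-laden model: the function name `pixel_unit_schedule` is hypothetical, each slot stands for one of the timings t1, t2, t3 and so on, and the pattern after t3 is extrapolated from the statement that processing continues similarly.

```python
def pixel_unit_schedule(blocks):
    """List, per time slot, the microprograms running in the pixel operation unit.

    IQ uses only the multiplier and IDCT2 only the adder / subtractor, so the
    two may share a slot; IDCT1 needs the whole execution unit.
    """
    slots = []
    previous = None
    for block in blocks:
        slot = [f"IQ({block})"]                  # first PC, first instruction memory
        if previous is not None:
            slot.append(f"IDCT2({previous})")    # second PC, second instruction memory
        slots.append(slot)
        slots.append([f"IDCT1({block})"])        # first PC drives both memories
        previous = block
    if previous is not None:
        slots.append([f"IDCT2({previous})"])     # drain the last block
    return slots


schedule = pixel_unit_schedule(["Y0", "Y1", "Y2", "Y3", "Cb", "Cr"])
```

Under this model a six-block macroblock occupies 13 slots rather than the 18 that fully serial IQ, IDCT1, and IDCT2 would need.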
<1.3.2.3 Pixel read / write unit>
FIG. 10 is a block diagram illustrating a detailed configuration of the pixel read / write unit 11.
As shown in the figure, the pixel read / write unit 11 consists of buffers 71 to 74 (hereinafter, buffers A to D), a half-pel interpolator 75, a synthesizer 76, selectors 77 and 78, and a read / write controller 79.
[0044]
The read / write control unit 79 performs motion compensation on the block data input via the buffer 201 using the buffers A to D, and transfers the final decoded image to the external memory 3 in units of two blocks. More specifically, the memory controller 6 is controlled so that a rectangular area corresponding to two blocks is read from the reference frame in the external memory 3 according to the motion vector set at the time of the header analysis of the processor 7. As a result, data of a rectangular area for two blocks indicated by the motion vector is stored in the buffer A or the buffer B. After that, the combining unit 76 performs half-pel interpolation of a rectangular area of two blocks according to the type of picture (I, P, or B picture). Further, by combining (adding) the block data input via the buffer 201 and the rectangular area after the half-pel interpolation, the pixel value of the block is calculated and stored in the buffer B. The final decoded block stored in the buffer B is transferred to the external memory 3 via the memory controller 6.
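The combination of half-pel interpolation and addition of the block data can be sketched in a few lines. This is a generic illustration of MPEG-style motion compensation, not the circuit of FIG. 10: the function names `predict` and `reconstruct` are hypothetical, the motion vector is assumed to be given in half-pel units, and the two-block transfer unit and the buffers A to D are not modelled.

```python
def predict(ref, x2, y2, w, h):
    """Fetch a w x h prediction block whose top-left corner lies at the
    half-pel position (x2 / 2, y2 / 2) of the reference frame `ref`."""
    x, y = x2 >> 1, y2 >> 1            # integer part of the position
    hx, hy = x2 & 1, y2 & 1            # 1 when a half-pel offset is present
    block = []
    for r in range(h):
        row = []
        for c in range(w):
            total = (ref[y + r][x + c] + ref[y + r][x + c + hx]
                     + ref[y + r + hy][x + c] + ref[y + r + hy][x + c + hx])
            row.append((total + 2) >> 2)   # rounded average of the neighbours
        block.append(row)
    return block


def reconstruct(pred, residual):
    """Add the IDCT output to the prediction and clip to the 8-bit range."""
    return [[min(255, max(0, p + d)) for p, d in zip(p_row, d_row)]
            for p_row, d_row in zip(pred, residual)]


ref = [[10, 20, 30], [30, 40, 50], [50, 60, 70]]
half_x = predict(ref, 1, 0, 2, 2)      # horizontal half-pel offset only
```

When neither offset is present the average collapses to the reference pixel itself, so the same routine covers full-pel and half-pel vectors.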
<1.3.3 Input / output processing unit>
The input / output processing unit 1001 is configured to switch, without overhead, among a plurality of tasks that share the various data transfers, in order to execute the large number of data inputs / outputs (data transfers) described above, and furthermore to avoid delays in responding to data input / output requests. The overhead referred to here is the saving and restoring of the context that occurs at task switching. That is, the input / output processor 5 is configured to eliminate the overhead of saving the instruction address of the program counter and the register data to memory (stack area) and restoring them. The detailed configuration is described below.
<1.3.3.1 IO processor>
FIG. 11 is a block diagram illustrating a configuration of the IO processor 5. In the figure, the IO processor 5 includes a state monitoring register 51, an instruction memory 52, an instruction reading circuit 53, an instruction register 54, a decoder 55, an operation execution unit 56, a general-purpose register set group 57, and a task management unit 58. In order to cope with a plurality of events that occur asynchronously, it is configured to execute tasks while switching among them at an extremely short cycle (for example, every four instruction cycles).
[0045]
The state monitoring register 51 includes registers CR1 to CR3, and holds various state data (such as flags) with which the IO processor 5 monitors various input / output states. For example, the state monitoring register 51 holds state data indicating the state of the stream input unit 1 (start code detection flag in the MPEG stream), the state of the video output unit 12 (a flag indicating a horizontal blanking period, a frame data transfer completion flag), the state of the audio output unit 13 (transfer completion flag of audio frame data), and the state of data transfer between these units and the buffer memory 2, the external memory 3, and the FIFO memory 4 (number of data transfers, data request flags to the FIFO memory 4).
[0046]
More specifically, it includes the following flags and the like.
・ Start code detection flag (hereinafter also referred to as flag 1)
This flag is set when the stream input unit 1 detects a start code in an MPEG stream.
・ Horizontal blanking flag (flag 2)
This flag indicates a horizontal blanking period, and is set by the video output unit 12. It is set at a period of about 60 microseconds.
・ Transfer completion flag of video frame data (flag 3)
This flag is set by the DMAC 5a when one frame of decoded image data has been transferred from the external memory 3 to the video output unit 12.
・ Transfer completion flag of audio frame data (flag 4)
This flag is set by the DMAC 5a when one frame of decoded audio data has been transferred from the external memory 3 to the audio output unit 13.
・ Data transfer completion flag (flag 5)
This flag is set when the DMAC 5a has DMA-transferred, from the stream input unit 1 to the buffer memory 2, the number of items of compressed image data designated by the IO processor 5 (that is, when the terminal count is reached).
・ DMA request flag (flag 6)
This flag indicates that the buffer memory 2 holds compressed image data or compressed audio data to be DMA-transferred to the external memory 3, and is set by the IO processor 5 (a request from task 1 to task 2, described later).
・ Data request flag to video FIFO (flag 7)
This flag requests data transfer from the external memory 3 to the video FIFO in the FIFO memory 4, and is set when the compressed video data in the video FIFO falls below a predetermined amount. This flag is set at a period of about 5 to 40 microseconds.
・ Data request flag to audio FIFO (flag 8)
This flag requests data transfer from the external memory 3 to the audio FIFO in the FIFO memory 4, and is set when the compressed audio data in the audio FIFO falls below a predetermined amount. This flag is set at a period of about 15 to 60 microseconds.
・ Decoder communication request flag (flag 9)
This flag requests communication from the decoding processing unit 1002 to the input / output processing unit 1001.
・ Host communication request flag (flag 10)
This flag requests communication from the host processor to the input / output processing unit 1001.
[0047]
The above-mentioned flags are not handled as interrupts; instead, they are constantly monitored by each task executed by the IO processor 5.
The instruction memory 52 stores a plurality of task programs sharing a large number of data input / output (data transfer) controls. In this embodiment, six task programs of tasks 0 to 5 are stored.
・ Task 0 (Host I / F task)
This task performs communication processing with the host processor via the host I / F unit 14 when the flag 10 is set. For example, it receives instructions from the host processor to start, stop, fast-forward, or reverse the decoding of the MPEG stream, and reports the decoding status (errors and the like). This processing is triggered by the flag 10.
・ Task 1 (parsing task)
When a start code is detected by the stream input unit 1 (the flag 1), this task parses the MPEG data input from the stream input unit 1, extracts the individual elementary streams, and transfers the extracted elementary streams to the buffer memory 2 by DMA transfer (the first half of the transfer path (1)). The types of elementary streams extracted here include compressed video data (also called video elementary streams), compressed audio data (also called audio elementary streams), and private data. When an elementary stream is stored in the buffer memory 2, the flag 6 is set.
・ Task 2 (stream transfer / audio task)
This task is a program that controls the following transfers (A) to (C).
[0048]
(A) DMA transfer of each elementary stream from the buffer memory 2 to the external memory 3 (the latter half of the transfer path (1)). This transfer is triggered by the flags 1 and 3 described above.
(B) DMA transfer of the compressed audio data from the external memory 3 to the audio FIFO of the FIFO memory 4 according to the data size (remaining amount) of the compressed audio data held in the audio FIFO (the transfer to the audio FIFO in the transfer path (2)). This data transfer is performed when the data size of the compressed audio data held in the audio FIFO becomes smaller than a certain amount. This transfer is triggered by the flag 8 described above.
[0049]
(C) DMA transfer of the decoded audio data from the external memory 3 to the buffer memory 2 and from the buffer memory 2 to the audio output unit 13 (the transfer path (4)). This transfer is triggered by the flag 2 described above.
・ Task 3 (Video supply task)
This task performs DMA transfer of the compressed video data from the external memory 3 to the video FIFO of the FIFO memory 4 according to the data size (remaining amount) of the compressed video data held in the video FIFO (the transfer to the video FIFO in the transfer path (2) described above). This data transfer is performed when the data size of the compressed video data held in the video FIFO becomes smaller than a certain amount. This transfer is triggered by the flag 7 described above.
・ Task 4 (video output task)
This task is a program for processing the DMA transfer (the transfer path (4)) of decoded video data from the external memory 3 to the buffer memory 2 and from the buffer memory 2 to the video output unit 12. This transfer is triggered by the flag 2 described above.
・ Task 5 (Decoder I / F task)
This task is a program that processes a command from the decode processing unit 1002 to the IO processor 5. The commands include “getAPTS”, “getVPTS”, “getSTC”, and the like. The getVPTS (Video Presentation Time Stamp) is a command by which the decode processing unit 1002 requests the IO processor 5 to acquire the VPTS added to the compressed video data. The getAPTS (Audio Presentation Time Stamp) is a command by which the decoding processing unit 1002 requests the IO processor 5 to acquire the APTS added to the compressed audio data. getSTC (System Time Clock) is a command by which the decoding processing unit 1002 requests the IO processor 5 to acquire an STC. The IO processor 5 receiving these commands notifies the decode processing unit 1002 of the STC, VPTS, and APTS. The STC, VPTS, and APTS are used by the decoding processing unit 1002 to synchronize the decoding of audio and video, and to adjust the degree of decoding in units of frames. This process uses the flag 9 as a trigger.
[0050]
The instruction reading circuit 53 includes a plurality of program counters (hereinafter abbreviated as PCs), each indicating an instruction fetch address, reads an instruction from the instruction memory 52 using the PC designated by the task management unit 58, and stores the instruction in the instruction register 54. Specifically, the instruction reading circuit 53 has PC0 to PC5 corresponding to the above tasks 0 to 5, and is configured to switch the PC at high speed by hardware when the PC designated by the task management unit 58 changes. With this configuration, the IO processor 5 is freed, at task switching, from saving the PC value of the current task to memory and restoring the PC value of the next task from memory.
[0051]
The decoder 55 decodes the instruction read from the instruction memory 52 and stored in the instruction register 54, and controls the operation execution unit 56 to execute the instruction. In addition, the decoder 55 performs a pipeline control of the entire IO processor 5 including at least three stages of an instruction reading stage of the instruction reading circuit 53, a decoding stage of the decoder 55, and an execution stage of the operation execution unit 56.
[0052]
The operation execution unit 56 includes an ALU (Arithmetic Logical Unit), a multiplier, a BS (Barrel Shifter), and the like, and executes an operation specified by the instruction under the control of the decoder 55.
The general-purpose register set group 57 includes six register sets corresponding to tasks 0 to 5 (each register set consists of four 32-bit registers and four 16-bit registers), that is, a total of 24 32-bit registers and 24 16-bit registers, and the register set corresponding to the task being executed is used. As a result, the IO processor 5 is freed, at task switching, from saving all the current register data to memory and restoring the register data of the next task from memory.
[0053]
The task management unit 58 performs task switching by switching the PC of the instruction reading circuit 53 and the register set of the general-purpose register set group 57 every predetermined number of instruction cycles. In this embodiment, the predetermined number is four. In addition, since the IO processor 5 processes one instruction in one instruction cycle, the task management unit 58 switches the task every four instructions without generating the above overhead. As a result, the response delay to the various input / output requests generated asynchronously is suppressed. In other words, the response delay to an input / output request is at most 24 instruction cycles.
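The figure of 24 instruction cycles follows from six tasks each receiving a slot of four instruction cycles. A minimal model of this switching is sketched below; the class name `IOProcessorModel` is hypothetical, a simple rotating task order is assumed, and each instruction is modelled as nothing more than an increment of the task's own program counter.

```python
class IOProcessorModel:
    """Toy model of task switching with one PC per task (no save / restore)."""

    N_TASKS = 6        # tasks 0 to 5
    SLOT = 4           # instruction cycles per task slot

    def __init__(self):
        self.pcs = [0] * self.N_TASKS    # one program counter per task
        self.current = 0                 # task currently selected
        self.cycle = 0
        self.trace = []                  # task id executed in each cycle

    def step(self):
        """Execute one instruction of the current task, then switch if due."""
        self.trace.append(self.current)
        self.pcs[self.current] += 1      # only the selected PC advances
        self.cycle += 1
        if self.cycle % self.SLOT == 0:
            # Switching is just a change of index: nothing is copied to memory.
            self.current = (self.current + 1) % self.N_TASKS


model = IOProcessorModel()
for _ in range(28):
    model.step()
round_length = IOProcessorModel.N_TASKS * IOProcessorModel.SLOT   # 24 cycles
```

In this model every task is revisited once per round of 24 cycles, which bounds how long a newly set flag can go unobserved by the task that polls it.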
<1.3.3.1.1 Instruction reading circuit>
FIG. 12 is a block diagram showing a detailed configuration example of the instruction reading circuit 53.
[0054]
In the figure, the instruction reading circuit 53 includes a task-specific PC storage unit 53a, a current IFAR (Instruction Fetch Address Register) 53b, an incrementer 53c, a next IFAR 53d, a selector 53e, a selector 53f, and a DECAR (DECode Address Register) 53g, and switches the instruction read address without overhead at task switching.
[0055]
The task-specific PC storage unit 53a has six address registers corresponding to tasks 0 to 5, and holds a program count value for each task. Each program count value is a restart address of the corresponding task. At the time of task switching, under the control of the task management unit 58 and the decoder 55, the program count value is read from the address register corresponding to the task to be executed next, and the program of the address register corresponding to the task currently being executed is read. The count value is updated to a new restart address. At this time, the task to be executed next and the current task are respectively specified by the task management unit 58 by a “nexttaskid (rd addr)” signal (hereinafter also referred to as a task ID) and a “taskid (wr addr)” signal.
[0056]
The program count values corresponding to tasks 0, 1, and 2 are shown in PC0, PC1, and PC2 of FIG. In the figure, (0-0) represents instruction 0 of task 0, and (1-4) represents instruction 4 of task 1. For example, PC0 is read when task 0 is restarted (instruction cycle t0), and is updated to the address of the instruction (0-4) when switching to the next task (instruction cycle t4).
[0057]
The loop circuit including the incrementer 53c, the next IFAR 53d, and the selector 53e is a circuit that updates the instruction read address selected by the selector 53e. The address output from the selector 53e is shown as IF1 in FIG. 13. In the figure, for example, when switching from task 0 to task 1, the selector 53e selects the address of the instruction (1-0) read from the task-specific PC storage unit 53a in cycle t4, and selects the incremented instruction address from the next IFAR 53d in cycles t5 to t7.
[0058]
The current IFAR 53b holds the selected output IF1 of the selector 53e with a delay of one cycle, and outputs it to the instruction memory 52 as an instruction read address. In other words, it holds the instruction read address of the currently active task. The instruction read address of the current IFAR 53b is shown in IF2 of FIG. As shown in the figure, IF2 indicates an instruction address of a different task every four instruction cycles.
[0059]
The DECAR 53g holds the address of the instruction held in the instruction register 54. That is, it indicates the instruction being decoded. DEC in FIG. 13 shows the address held in DECAR 53g. EX in FIG. 13 indicates an instruction address being executed.
The selector 53f selects a branch address when a branch instruction is executed or an interrupt occurs, and otherwise selects the address of the next IFAR 53d.
[0060]
With the provision of such an instruction reading circuit 53, the IO processor 5 performs four-stage (IF1, IF2, DEC, EX) pipeline processing as shown in FIG. The IF1 stage is a stage for selecting and updating a plurality of program count values. The IF2 stage is a stage for reading an instruction.
<1.3.3.1.2 Task management unit>
FIG. 14 is a block diagram showing a detailed configuration of the task management unit 58. In the figure, the task management unit 58 is broadly divided into a slot manager that manages task switching timing and a scheduler that manages the order of tasks.
[0061]
The slot manager has a counter 58a, a latch 58b, a comparator 58c, and a latch unit 58d, and outputs a task switching signal (chgtaskex) for instructing task switching to the instruction reading circuit 53 every four instruction cycles.
Specifically, the latch 58b is two flip-flop (FF) circuits that hold the lower two bits of the output of the counter 58a. The counter 58a outputs 3 bits obtained by incrementing the 2-bit output value of the latch 58b by +1 every clock indicating an instruction cycle. As a result, the counter 58a repeatedly outputs 1, 2, 3, and 4. The comparator 58c outputs a task switching signal (chgtaskex) to the instruction reading circuit 53 and the scheduler when the output value of the counter 58a matches the constant 4.
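The behaviour of the counter 58a, the latch 58b, and the comparator 58c can be checked with a few lines of code. The sketch assumes the latch starts at 0; the function name `slot_manager` is hypothetical.

```python
def slot_manager(n_cycles):
    """Return (counter_output, chgtaskex) for each instruction cycle."""
    latch = 0                            # latch 58b: lower two bits (assumed to start at 0)
    outputs = []
    for _ in range(n_cycles):
        counter = latch + 1              # counter 58a: 3-bit value, 1 to 4
        chgtaskex = (counter == 4)       # comparator 58c compares with the constant 4
        outputs.append((counter, chgtaskex))
        latch = counter & 0b11           # keep the lower two bits, so 4 wraps to 0
    return outputs


cycles = slot_manager(8)
```

Keeping only the lower two bits is what makes the 3-bit output wrap from 4 back to 1, so the switching signal is asserted once every four instruction cycles.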
[0062]
The scheduler includes a task round management unit 58e, a priority encoder 58f, and a latch 58g. Each time the task switching signal (chgtaskex) is output, the scheduler updates the task id and outputs the current task id and the id of the task to be executed next to the instruction reading circuit 53.
Specifically, both the latch unit 58d and the latch 58g hold the current task id in an encoded format (3 bits). The value of the encoded format represents the task id.
[0063]
When the task switching signal (chgtaskex) is input, the task round management unit 58e refers to the latch unit 58d and outputs the task id to be executed next in a decoded format (6 bits). In the decoded format (6 bits), one bit corresponds to one task, and the bit position represents the task id.
The priority encoder 58f converts the task id output from the task round management unit 58e from a decoded format to an encoded format. Both the latch unit 58d and the latch 58g hold the encoded task id one cycle later.
[0064]
With this configuration, when the task switching signal (chgtaskex) is output from the comparator 58c, the scheduler outputs the id of the task to be executed next from the priority encoder 58f as the “nexttaskid (rd addr)” signal, and outputs the current task id from the latch 58g as the “taskid (wr addr)” signal.
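The conversion between the decoded (6-bit, one bit per task) and encoded (3-bit) formats can be sketched as follows. The function names are hypothetical, bit 0 is assumed to correspond to task 0, and the task round management unit is assumed to advance to the next task id in simple rotation; the specification does not state the order.

```python
def task_round(current_id, n_tasks=6):
    """Task round management: next task in decoded (one-hot) format."""
    return 1 << ((current_id + 1) % n_tasks)


def priority_encode(onehot):
    """Priority encoder: position of the highest set bit, i.e. the encoded id."""
    return onehot.bit_length() - 1


def on_chgtaskex(current_id):
    """Signals produced when the task switching signal is asserted."""
    nexttaskid = priority_encode(task_round(current_id))   # read address
    taskid = current_id                                     # write address
    return nexttaskid, taskid
```

For example, `on_chgtaskex(5)` wraps around to task 0 as the next task while reporting task 5 as the one whose restart address is to be written back.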
<1.4 Description of operation>
The operation of the video / audio processing device 1000 according to the first embodiment configured as described above will be described.
[0065]
In the input / output processing unit 1001, an MPEG stream asynchronously input from the stream input unit 1 is temporarily stored in the external memory 3 via the buffer memory 2 and the memory controller 6 under the control of the input / output processor 5, and is then stored in the FIFO memory 4 via the memory controller 6. At this time, the IO processor 5 supplies the compressed moving image data and the compressed audio data to the FIFO memory 4 according to the remaining amount by executing the above tasks 2 (B) and 3. As a result, the FIFO memory 4 is kept supplied with compressed moving image data and compressed audio data without excess or deficiency, so that the decoding processing unit 1002 is separated from asynchronous input / output and can devote itself exclusively to decoding. The processing up to this point is performed by the input / output processing unit 1001 independently of, and in parallel with, the decoding processing unit 1002.
[0066]
On the other hand, in the decoding processing unit 1002, the MPEG stream data held in the FIFO memory 4 is subsequently decoded by the processor 7, the code conversion unit 9, the pixel operation unit 10, and the pixel read / write unit 11. FIG. 15 is an explanatory diagram showing the decoding operation after the FIFO memory 4.
In the figure, the horizontal axis represents the time axis, and the header analysis of approximately one macroblock and the decoding of each block are shown. The vertical direction indicates that the decoding of each block is executed in a pipeline manner in each unit of the decoding processing unit 1002.
[0067]
As shown in the figure, the processor 7 repeats the header analysis of the compressed video data and the decoding process on the compressed audio data in a time-division manner. That is, the processor 7 analyzes the header of one macroblock, notifies the code conversion unit 9, the pixel operation unit 10, and the pixel read / write unit 11 of the analysis result, and then instructs the code conversion unit 9 to start decoding the macroblock. Thereafter, the processor 7 performs the decoding process on the compressed audio data until an interrupt signal is notified from the code conversion unit 9. The decoded audio data is temporarily held in the internal memory 8 and then DMA-transferred to the external memory 3 by the memory controller 6.
[0068]
Further, on receiving the macroblock decoding start instruction from the processor 7, the code conversion unit 9 stores the macroblock in the buffer 200 block by block. At this time, the code conversion unit 9 changes the order of the write addresses to the buffer 200 according to the scan type of the block, which was notified when the processor 7 analyzed the header. That is, the order of the write addresses differs between the zigzag scan and the alternate scan. Thus, the pixel operation unit 10 does not need to change the order of the read addresses, and can always read in the same address order regardless of the scan type. The code conversion unit 9 repeats the above operation, writing the blocks into the buffer 200, until the six blocks in the macroblock have been subjected to the VLD processing. When the VLD of the six blocks is completed, an interrupt to the processor 7 is generated. This interrupt signal is the macroblock decode end signal End Of Macro Block (EOMB). The code conversion unit 9 generates the EOMB by detecting the block end signal End Of Block (EOB) of the sixth block.
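The idea of reordering on the write side can be illustrated with the following minimal sketch. The function names are hypothetical. Only the zigzag order is generated here; an alternate-scan table would be passed to `write_block` in the same way and is not reproduced. The buffer is modeled as a flat list of 64 coefficients in raster order, which is an assumption about the layout of the buffer 200.

```python
def zigzag_order(n=8):
    """Raster index (row * n + col) of each coefficient in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col, one anti-diagonal at a time
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run from bottom-left to top-right
        order.extend(r * n + (s - r) for r in rows)
    return order


def write_block(coeffs, scan_table):
    """Store decoded coefficients at scan-dependent write addresses.

    The k-th decoded coefficient is written to address scan_table[k],
    so a reader can always read addresses 0, 1, 2, ... in raster order.
    """
    buffer = [0] * len(scan_table)
    for k, value in enumerate(coeffs):
        buffer[scan_table[k]] = value
    return buffer
```

With this arrangement, only the table handed to the writer changes with the scan type; the reader's address sequence stays fixed.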
[0069]
The pixel operation unit 10 performs IQ and IDCT on the block data stored in the buffer 200 in block units as shown in FIG. 9 in parallel with the code conversion unit 9, and stores the processing result in the buffer 201.
The pixel read / write unit 11, in parallel with the pixel operation unit 10, cuts out a rectangular area from the reference frame and performs block synthesis, as shown in the figure, based on the block data in the buffer 201 and the motion vector notified as a result of the header analysis by the processor 7. The result of block synthesis is stored in the external memory 3 via the FIFO memory 4.
[0070]
The above is the operation for a macroblock that is not a skipped macroblock. In the case of a skipped macroblock, the code conversion unit 9 and the pixel operation unit 10 do not operate, and only the pixel read / write unit 11 operates. Since the image of a skipped macroblock is the same as the rectangular area in the reference frame, the pixel read / write unit 11 copies that image to the external memory 3 as the decoded image.
[0071]
In this case, the interrupt signal from the code conversion unit 9 to the processor 7 is generated as follows. That is, the logical product is taken of a signal indicating that the processor 7 has transmitted a control signal for starting a motion compensation operation to the pixel read / write unit 11, a signal indicating that the pixel read / write unit 11 can perform the motion compensation operation, and a signal indicating a skipped macroblock. The logical sum of this logical product and the EOMB signal is then input to the processor 7 as the interrupt signal.
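The combination of signals described above amounts to a single Boolean expression. The following sketch states it explicitly; the parameter names are hypothetical labels for the three signals and the EOMB signal.

```python
def interrupt_to_processor(mc_start_sent, mc_ready, skipped_mb, eomb):
    """Interrupt = (mc_start_sent AND mc_ready AND skipped_mb) OR eomb."""
    return (mc_start_sent and mc_ready and skipped_mb) or eomb
```

For an ordinary macroblock the interrupt is raised by EOMB alone; for a skipped macroblock, where the code conversion unit produces no EOMB, it is raised once all three conditions hold.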
[0072]
As described above, according to the video / audio processing device of the first embodiment of the present invention, the input / output processing unit 1001 takes charge of the MPEG stream input processing from the storage medium or the communication medium, the output processing of display image data and audio data to the display device and the audio output device, and the processing for supplying the stream to the decoding processing unit 1002, while the decoding processing unit 1002 takes charge of the decoding processing of compressed video data and compressed audio data. As a result, the decoding processing unit 1002 is released from processing that occurs asynchronously and can devote itself to the decoding processing. Since the series of processing of inputting, decoding, and outputting the MPEG stream is thus executed efficiently, full decoding of the MPEG stream (without dropped frames) can be realized without using a high-speed operation clock.
[0073]
Further, it is desirable that the present video / audio processing device be integrated into an LSI on one chip. In this case, the full decoding can be performed with an operation clock of 100 MHz or less (actually, 54 MHz). Recent high-performance CPUs whose operation clocks exceed 100 MHz or 200 MHz also enable full decoding as long as the image size is small, but their manufacturing cost is high. In contrast, the present video / audio processing apparatus is superior in terms of both manufacturing cost and full decoding.
[0074]
Furthermore, the decoding processing unit 1002 of the present video / audio processing apparatus shares roles as follows.
In other words, the processor 7 is in charge of header analysis, which requires a wide variety of condition judgments, for both compressed video data and compressed audio data, and is also responsible for decoding the compressed audio data. Since a large amount of routine calculation is required for the block data of compressed video data, dedicated hardware (firmware), namely the code conversion unit 9, the pixel operation unit 10, and the pixel read / write unit 11, is in charge of that decoding processing. As shown in FIG. 15, the code conversion unit 9, the pixel operation unit 10, and the pixel read / write unit 11 are pipelined. In the pixel operation unit 10, IQ and IDCT can be processed in parallel. The pixel read / write unit 11 accesses the reference frame in units of two blocks. As a result, the efficiency of the compressed audio decoding process is improved, and the hardware dedicated to video decoding can obtain high processing performance without using a high-speed clock. More specifically, a processing ability equivalent to or higher than that of the related art was obtained with a clock of about 50 to 60 MHz, without using a high-speed clock exceeding 100 MHz. Therefore, it is not necessary to use high-speed elements, and the manufacturing cost can be reduced.
[0075]
Further, since the basic unit of video decoding is a macroblock in the processor 7, a block in the code conversion unit 9 and the pixel operation unit 10, and two blocks in the pixel read / write unit 11, the capacity of the buffers used in video decoding can be minimized.
<2 Second Embodiment>
The video and audio processing apparatus according to the present embodiment is configured to perform a compression function (hereinafter, referred to as an encoding process) and a graphics function in addition to a decoding function of the compressed stream data.
<2.1 Configuration of video and audio processing device>
FIG. 16 is a block diagram illustrating a configuration of a video and audio processing device according to the second embodiment of the present invention.
[0076]
The video / audio processing device 2000 includes a stream input / output unit 21, a buffer memory 22, a FIFO memory 24, an input / output processor 25, a memory controller 26, a processor 27, an internal memory 28, a code conversion unit 29, a pixel operation unit 30, a pixel read / write unit 31, a video output unit 12, an audio output unit 13, a buffer 200, and a buffer 201. The video / audio processing device 2000 has the following functions in addition to the functions of the video / audio processing device 1000 of the first embodiment. That is, a compression function for video data and audio data and a graphics function for drawing polygon data are added.
[0077]
Therefore, in the video / audio processing apparatus 2000, the components having the same names as those in FIG. 4 have the same functions as before, and additionally have functions for performing the compression function and the graphics function. Hereinafter, the description of the points that are the same as in FIG. 4 will be omitted, and the description will focus on the differences.
The stream input / output unit 21 is different in that it is bidirectional. That is, when the MPEG data is transferred from the buffer memory 22 under the control of the input / output processor 25, the transferred parallel data is converted into serial data and output to the outside as an MPEG data stream.
[0078]
The buffer memory 22 and the FIFO memory 24 likewise differ in that they are bidirectional.
The input / output processor 25 controls the data transfer along the routes (5) to (8) in addition to controlling the data transfer along the routes (1) to (4) shown in the first embodiment.
(1) Stream input / output unit 21 → buffer memory 22 → memory controller 26 → external memory 3
(2) External memory 3 → memory controller 26 → FIFO memory 24
(3) External memory 3 → memory controller 26 → buffer memory 22 → video output unit 12
(4) External memory 3 → memory controller 26 → buffer memory 22 → audio output unit 13
(5) External memory 3 → memory controller 26 → internal memory 28
(6) External memory 3 → memory controller 26 → pixel read / write unit 31
(7) FIFO memory 24 → memory controller 26 → external memory 3
(8) External memory 3 → memory controller 26 → buffer memory 22 → stream input / output unit 21
The paths (5) and (6) are the paths of the original data when the video data and the audio data are encoded, and (7) and (8) show the paths of the MPEG stream after compression.
[0079]
First, the encoding process will be described. It is assumed that the data to be encoded is stored in the external memory 3. The video data in the external memory 3 is transferred to the pixel read / write unit 31, with the pixel read / write unit 31 controlling the memory controller 26.
The pixel read / write unit 31 performs a process of writing video data to the second buffer 201 and a process of generating a difference image. The difference image generation processing includes motion detection (calculation of a motion vector) in units of blocks and generation of the difference image. For this purpose, the pixel read / write unit 31 contains a motion detection circuit that detects a motion vector by searching the reference frame for a rectangular area similar to the block to be encoded. Instead of the motion detection circuit, a motion estimation circuit may be provided that estimates the motion vector of the block to be encoded by using the motion vectors of already calculated blocks of an adjacent frame.
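The motion detection and difference image generation can be illustrated by the following minimal sketch. It is not the circuit of the pixel read / write unit 31: the exhaustive search, the sum-of-absolute-differences (SAD) similarity measure, the search range, and all function names are assumptions chosen for illustration. Frames are modeled as lists of rows of integer pixel values.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the current block and a displaced reference block."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def find_motion_vector(cur, ref, bx, by, n=8, search=2):
    """Search the reference frame around (bx, by) for the most similar n x n area.

    Returns the displacement (dx, dy) with the smallest SAD.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or bx + dx < 0:
                continue
            if by + dy + n > height or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def difference_block(cur, ref, bx, by, mv, n=8):
    """Difference image: current block minus the motion-compensated reference block."""
    dx, dy = mv
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
             for x in range(n)] for y in range(n)]
```

The difference block returned by `difference_block` corresponds to the data that is subsequently passed on in block units for DCT and quantization.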
[0080]
The pixel operation unit 30 receives the difference image data in block units, and performs DCT, IDCT, quantization processing (hereinafter, Q processing), and IQ. The video data thus quantized is stored in the buffer 200.
The code conversion unit 29 receives the quantized data from the buffer 200 and performs variable length coding (VLC). The variable-length coded data is stored in the FIFO memory 24 and then in the external memory 3 through the memory controller 26, and the processor 27 adds header information for each macroblock.
[0081]
The audio data in the external memory 3 is transferred to the internal memory 28 via the memory controller 26. The processor 27 compresses the audio data in the internal memory 28 in a time-division manner with the process of adding header information for each macroblock.
As described above, the encoding process is performed in a path reverse to that of the first embodiment.
[0082]
Next, the graphics processing will be described. The graphics processing is a three-dimensional image generation processing performed by a combination of rectangular figures called polygons. In this apparatus, processing for generating pixel data inside the polygon from pixel data at the vertex coordinates of the polygon is performed.
First, the vertex data of the polygon is stored in the external memory 3.
[0083]
The vertex data is stored in the internal memory 28 by the processor 27 controlling the memory controller 26. The processor 27 reads the vertex data from the internal memory 28, performs preprocessing for DDA (Digital Differential Analyzer) processing, and writes the data to the FIFO memory 24.
The code conversion unit 29 reads vertex data from the FIFO memory 24 according to the instruction of the pixel operation unit 30 and transfers the data to the pixel operation unit 30.
[0084]
The pixel operation unit 30 performs a DDA process and transmits the result to the pixel read / write unit 31. The pixel read / write unit 31 performs a Z-buffer process or an α-blending process according to an instruction from the processor 27 and writes image data to the external memory 3 via the memory controller 26.
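As an illustration of how pixel data inside a polygon can be generated from vertex data by DDA, the following sketch interpolates a value along one horizontal span by repeated addition of a constant increment. The split into an increment computed once per span and an incremental loop is an assumption about what the preprocessing and the DDA process each do, and the function names are hypothetical.

```python
def dda_span(x0, c0, x1, c1):
    """Interpolate a value from (x0, c0) to (x1, c1), inclusive, by repeated addition."""
    if x1 < x0:
        raise ValueError("expected x0 <= x1")
    steps = x1 - x0
    if steps == 0:
        return [(x0, float(c0))]
    delta = (c1 - c0) / steps  # per-pixel increment, computed once for the span
    out, c = [], float(c0)
    for x in range(x0, x1 + 1):
        out.append((x, c))
        c += delta  # only an addition is needed for each pixel
    return out


def rgb_span(x0, rgb0, x1, rgb1):
    """Interpolate the R, G and B components independently along one span."""
    channels = [dda_span(x0, a, x1, b) for a, b in zip(rgb0, rgb1)]
    return [(x, tuple(ch[i][1] for ch in channels))
            for i, x in enumerate(range(x0, x1 + 1))]
```

Because the three color components in `rgb_span` do not depend on one another, they can be computed in parallel.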
<2.1.1 Pixel operation unit>
FIG. 17 is a block diagram illustrating a configuration of the pixel operation unit 30.
[0085]
In the figure, the same components as those of the pixel operation unit 10 shown in FIG. 7 are denoted by the same reference numerals, and the description thereof will be omitted.
The differences from the pixel operation unit 10 are that the pixel operation unit 30 has three execution units (501a to 501c), and that an instruction register unit 309 and a distribution unit 310 are added.
[0086]
The reason for providing three execution units 501a to 501c is to improve the calculation performance. Specifically, in the graphics processing, the RGB components of a color image are processed independently in parallel. In the IQ and Q processes, the three multipliers 502 are used to increase the speed. In the IDCT, the processing time is reduced by using the plurality of multipliers 502 and the plurality of adders / subtractors 503. The IDCT includes an operation called a butterfly operation, in which all the source data of the operation depend on one another, so a data line 103 is provided for communication among the execution units 501a to 501c.
[0087]
The first instruction memory 506 and the second instruction memory 507 store microprograms for DCT, Q processing, and DDA in addition to IDCT and IQ. FIG. 18 shows an example of the storage contents of the first instruction memory 506 and the second instruction memory 507. Compared to FIG. 8, a Q processing microprogram, a DCT microprogram, and a DDA microprogram are added.
[0088]
The instruction pointer holding units 308a to 308c are provided corresponding to the execution units 501a to 501c, and each have a conversion table that converts an address input from the first program counter and outputs the converted address to the instruction register unit 309. The converted address indicates the register number of the instruction register unit 309. Further, the instruction pointer holding units 308a to 308c hold the later-described modify flags and output the same to the instruction execution units 501a to 501c.
[0089]
Regarding the conversion tables, when the input addresses are, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12, the instruction pointer holding units 308a, 308b, and 308c output the following converted addresses, respectively.
Instruction pointer holding unit 308a: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
Instruction pointer holding unit 308b: 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11
Instruction pointer holding unit 308c: 4, 3, 2, 1, 8, 7, 6, 5, 12, 11, 10, 9
As shown in FIG. 23, the instruction register unit 309 includes a plurality of registers for holding microinstructions, three selectors, and three output ports. The three selectors select the microinstruction of the register specified by the conversion address (register number) input from the instruction pointer units 308a, 308b, 308c. The three output ports are provided corresponding to the selectors, and output the microinstructions selected by the selectors to the execution units 501a to 501c via the distribution unit 310. Three selectors and output ports are provided to supply different micro-instructions to three adder / subtracters 503 (or three multipliers 502) at the same time. In this embodiment, it is assumed that the three output ports are selectively supplied to one of the three adders / subtractors 503 and the three multipliers 502 via the distribution unit 310.
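The example conversion tables and the three selectors can be modeled as follows. This is a minimal sketch: the dictionary layout and the function names are assumptions, and only the twelve example entries listed above are used.

```python
# Example conversion tables: input address 1..12 -> register number.
TABLES = {
    "308a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    "308b": [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11],
    "308c": [4, 3, 2, 1, 8, 7, 6, 5, 12, 11, 10, 9],
}


def convert(unit, address):
    """Translate a program counter address (1..12) into a register number."""
    return TABLES[unit][address - 1]


def select(registers, address):
    """Model the three selectors: one microinstruction per execution unit.

    `registers` maps a register number to the microinstruction it holds.
    """
    return tuple(registers[convert(unit, address)]
                 for unit in ("308a", "308b", "308c"))


def no_register_shared(address):
    """True if the three units read three different registers at this address."""
    regs = [convert(unit, address) for unit in TABLES]
    return len(set(regs)) == len(regs)
```

In this example every table is a permutation of 1 to 12, and at each input address the three units select three different registers, so each unit steps through the same set of microinstructions in its own order.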
[0090]
For example, the instruction register unit 309 includes registers R1 to R16 (register numbers 1 to 16). The microprogram stored in the registers R1 to R16 represents the matrix operation processing required in DCT and IDCT, and is arranged so that the same processing is performed whichever of the above three register number orders is followed. That is, the three execution orders differ only in the order of some microinstructions whose execution order is interchangeable. Because the execution units 501a to 501c execute the microprogram in parallel, this avoids resource interference, such as contention for access to registers (not shown), among the execution units 501a to 501c. The matrix operation processing includes multiplication, transposition, and transfer of 8 × 8 matrices.
[0091]
Next, a microinstruction stored in each register of the instruction register unit 309 is written in mnemonic format as
"op Ri, Rj, dest, (modify flag)"
However, the microinstruction held in the instruction register unit 309 consists only of the "op Ri, Rj" part. The "dest" part is specified from the instruction memories 506 and 507, and the "(modify flag)" part is specified from the instruction pointer holding units 308a to 308c.
[0092]
Here, "op" is an operation code indicating a multiplication instruction, an addition / subtraction instruction, a transfer instruction, or the like, and "Ri, Rj" are operands. The multiplication instruction is executed by each of the multipliers 502 in the three execution units 501a to 501c. The addition / subtraction instruction and the transfer instruction are executed by each of the adders / subtractors 503 in the three execution units 501a to 501c.
“Dest” indicates the storage location of the operation result. This “dest” is specified not from the register of the instruction register unit 309 but from the instruction memory 506 (for a multiplication instruction) or the instruction memory 507 (for an addition / subtraction instruction or a transfer instruction). This is to make the microprogram of the instruction register unit 309 common to the execution units 501a to 501c. If the transfer destination is designated by a register, it is necessary to prepare a separate microprogram for each of the execution units 501a to 501c, and the capacity of the microprogram is increased several times.
[0093]
The "modify flag" is a flag indicating whether addition or subtraction is performed in an addition / subtraction instruction. The "modify flag" is specified separately, not from the registers of the instruction register unit 309 but from the instruction pointer holding units 308a to 308c. This is because the constant matrix used for the matrix operation in DCT and IDCT includes a row (or column) in which all elements are "1" and a row (or column) in which all elements are "−1". By designating the "modify flag" from the instruction pointer holding units 308a to 308c, the same microprogram of the instruction register unit 309 can be shared.
[0094]
When the three microinstructions input from the instruction register unit 309 are addition / subtraction instructions, the distribution unit 310 distributes their "op Ri, Rj" parts, the "dest" part input from the instruction memory 507, and the "modify flags" input from the instruction pointer holding units 308a to 308c to the three adders / subtractors 503, and at the same time distributes the microinstruction of the instruction memory 506 to the three multipliers 502. When the three microinstructions input from the instruction register unit 309 are multiplication instructions, the distribution unit 310 distributes their "op Ri, Rj" parts and the "dest" part input from the instruction memory 506 to the three multipliers 502, and distributes the microinstruction of the instruction memory 507 to the three adders / subtractors 503. In other words, for an instruction common to the three adders / subtractors 503, the distribution unit 310 supplies them with the single microinstruction from the instruction memory 507, and for addition / subtraction instructions that differ among the three adders / subtractors 503, it supplies each of them with one of the three microinstructions from the instruction register unit 309. Similarly, for an instruction common to the three multipliers 502, it supplies them with the microinstruction from the instruction memory 506, and for multiplication instructions that differ among the three multipliers 502, it supplies each of them with a microinstruction from the instruction register unit 309.
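How a complete microinstruction is assembled from the three sources can be sketched as follows. This is a simplified model under stated assumptions: the opcode names "mul" and "addsub" are hypothetical, a single "dest" value is shared by the three units, and "dest" is taken from the instruction memory 506 for multiplication and from the instruction memory 507 otherwise, following the earlier explanation of "dest".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroInstruction:
    op: str
    ri: str
    rj: str
    dest: str
    modify: bool = False  # meaningful only for addition / subtraction


def distribute(partials, dest_from_506, dest_from_507, modify_flags):
    """Model the distribution unit for three 'op Ri, Rj' parts.

    Returns the target arithmetic units and the completed microinstructions.
    """
    if all(op == "mul" for op, _, _ in partials):
        target, dest = "multipliers 502", dest_from_506
        flags = [False] * len(partials)
    else:
        target, dest = "adders/subtractors 503", dest_from_507
        flags = list(modify_flags)
    instrs = [MicroInstruction(op, ri, rj, dest, flag)
              for (op, ri, rj), flag in zip(partials, flags)]
    return target, instrs
```

Because "dest" and the modify flag are attached only at this stage, the same "op Ri, Rj" entries can serve all three execution units.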
[0095]
According to such a configuration of the pixel operation unit 30, the storage capacity of the instruction memory 506 and the instruction memory 507 can be reduced.
If the pixel operation unit 30 did not include the instruction pointer holding units 308a to 308c, the instruction register unit 309, and the distribution unit 310, the instruction memory 506 and the instruction memory 507 would each have to store three microinstructions in parallel in order to supply different microinstructions to all of the three execution units 501a to 501c.
[0096]
FIG. 22 shows an example of the contents stored in the instruction memory 506 and the instruction memory 507 in the case where the instruction pointer holding units 308a to 308c, the instruction register unit 309, and the distribution unit 310 are not provided. In the figure, a 16-step microprogram is stored, and one microinstruction has a 16-bit length. In this case, the instruction memory 506 and the instruction memory 507 require a total storage capacity of 1536 bits (16 steps × 16 bits × 3 × 2) because three microinstructions are recorded in parallel.
[0097]
On the other hand, FIG. 23 shows an example of the storage contents of the instruction pointer holding units 308a to 308c and the instruction register unit 309 in the pixel operation unit 30 of the present embodiment. Also in this figure, a 16-step microprogram is stored, and one microinstruction has 16 bits. In the figure, the instruction pointer holding units 308a to 308c each store 16 register numbers (4-bit length), and the instruction register unit 309 stores 16 microinstructions. In this case, the storage capacity of the instruction pointer holding units 308a to 308c and the instruction register unit 309 need only be 448 bits (16 steps × (12 + 16)). As described above, in the pixel operation unit 30, the storage capacity of the microprogram can be significantly reduced. In practice, since "dest" and "modify flag" are issued separately, storage capacity or circuitry corresponding to them is also required. The instruction memories 506 and 507 designate "dest" in the microinstructions and issue the multiplication instructions and addition / subtraction instructions common to the execution units 501a to 501c, so they have not been eliminated in this embodiment. If the instruction register unit 309 is provided with six output ports, the instruction memory 506 and the instruction memory 507 can be eliminated.
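The two capacity figures can be checked with the short calculation below. The constants come from the figures described above (16 steps, 16-bit microinstructions, three execution units, two instruction memories, 4-bit register numbers); the function names are hypothetical, and the extra capacity needed for "dest" and "modify flag" is not counted.

```python
STEPS = 16            # microprogram length in steps
INSTR_BITS = 16       # bits per microinstruction
UNITS = 3             # execution units 501a to 501c
MEMORIES = 2          # instruction memories 506 and 507
REG_NUMBER_BITS = 4   # bits per register number in a conversion table


def bits_without_indirection():
    """Three microinstructions stored in parallel in each of the two instruction memories."""
    return STEPS * INSTR_BITS * UNITS * MEMORIES


def bits_with_indirection():
    """Three 4-bit register numbers plus one shared 16-bit microinstruction per step."""
    return STEPS * (UNITS * REG_NUMBER_BITS + INSTR_BITS)
```

Under these figures, the indirect scheme needs 448 bits against 1536 bits, that is, about 29% of the capacity.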
[0098]
In FIG. 23, the instruction pointer holding units 308a to 308c output the conversion address (register number) when the value of the first program counter is 0 to 15, but the present invention is not limited to this. For example, the conversion address may be output when the value of the first program counter is 32 to 47. In this case, an appropriate offset value may be added to the value of the first program counter. As a result, an arbitrary address string indicated by the first program counter can be converted into a conversion address.
[0099]
With the above configuration, the present embodiment enables not only decoding processing of compressed video data and compressed audio data, but also encoding processing of video and audio data and graphics processing based on polygon data. Further, the processing efficiency is improved by the parallel operation of the plurality of execution units. In addition, by changing the order of some of the microinstructions by means of the instruction pointer holding units 308a to 308c, resource interference among the plurality of execution units can be avoided, so that the processing efficiency is further improved.
[0100]
In the above embodiment, the configuration having three execution units is shown because it is advantageous in that each of the RGB colors can be calculated independently. Furthermore, the number of execution units may be any number as long as it is three or more.
Further, in the above embodiment, it is desirable that each of the video and audio processing devices 1000 and 2000 be implemented as a one-chip LSI. Further, the external memory 3 has been described as being external to the chip, but it may be configured to be built in one chip.
[0101]
In the above embodiment, the stream input / output unit 1 (or the stream input / output unit 21) stores the MPEG stream (or the video / audio data) in the external memory. May be configured.
Furthermore, in the above embodiment, the IO processor 5 performs the task switching every four instruction cycles, but may perform the task switching every plural instruction cycles other than the four instruction cycles. Also, the number of instruction cycles for task switching may be weighted in advance for each task and set to a different number of instruction cycles. Also, the number of instruction cycles for each task may be weighted according to the priority and the urgency.
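Task switching with a per-task number of instruction cycles can be sketched as follows. This is an illustrative model only: the round-robin order, the list of weights, and the function name are assumptions, and the program counters here simply count executed instructions.

```python
def run_schedule(weights, total_cycles):
    """Simulate switching tasks after a per-task number of instruction cycles.

    weights[i] is the number of consecutive instruction cycles given to
    task i before switching to the next task (round-robin order assumed).
    Returns the task executed in each cycle and the final per-task
    program counters.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("each task needs a positive number of cycles")
    pcs = [0] * len(weights)  # one program counter per task
    trace = []
    task, used = 0, 0
    for _ in range(total_cycles):
        trace.append(task)
        pcs[task] += 1
        used += 1
        if used == weights[task]:  # this task's share is used up: switch
            task, used = (task + 1) % len(weights), 0
    return trace, pcs
```

Setting every weight to 4 gives switching every four instruction cycles, while unequal weights give a task with higher priority or urgency a larger share of the cycles.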
[0102]
【The invention's effect】
A video and audio processing apparatus according to the present invention is a video and audio processing apparatus that receives from outside a data stream including compressed audio data and compressed video data, decodes it, and outputs the decoded data to output devices. The apparatus comprises input / output processing means for performing input / output processing that occurs asynchronously, and decoding processing means for performing, in parallel with the input / output processing, decoding processing that consists mainly of decoding the data stream stored in a memory. The video data and audio data decoded by the decoding processing means are stored in the memory. As the input / output processing, the input / output processing means inputs the data stream asynchronously supplied from outside and stores it in the memory, supplies the data stream stored in the memory to the decoding processing means, and reads the decoded video data and audio data from the memory and outputs them in accordance with the output rates of an external display device and an external audio output device, respectively.
[0103]
According to this configuration, the input / output processing means and the decoding processing means operate in parallel in a pipeline, and in addition the asynchronous processing and the decoding processing are divided between the input / output processing means and the decoding processing means, so that the decoding processing means is released from processing that occurs asynchronously and can devote itself to the decoding processing. As a result, the video and audio processing apparatus efficiently executes the series of processing of stream data input, decoding, and output, thereby enabling full decoding of stream data (without dropped frames) without using a high-speed operation clock.
[0104]
Further, the decoding processing means may include sequential processing means for performing sequential processing consisting mainly of condition determination on the data stream, including header analysis of the compressed audio data and the compressed video data and decoding of the compressed audio data, and routine processing means for performing, in parallel with the sequential processing, routine processing including decoding of the compressed video data excluding the header analysis of the compressed video data.
[0105]
According to this configuration, the processing efficiency can be greatly improved because sequential processing and routine processing, which have different processing characteristics and the latter of which is suited to parallel processing, no longer coexist in one unit. In particular, the processing efficiency of the routine processing means can be improved. This is because, in the present video / audio processing apparatus, the routine processing means is released from the asynchronous processing and the sequential processing described above, and can therefore devote itself to the various routine operations required for decoding the compressed video data. As a result, high processing performance can be obtained without using a high-speed operation clock.
[0106]
Further, the input / output processing means may include input means for inputting an asynchronous data stream from outside, video output means for outputting decoded video data to an external display device, audio output means for outputting decoded audio data to an external audio output device, and a processor for executing first to fourth tasks stored in an instruction memory while switching among them, wherein the first task is a program for transferring the data stream from the input means to the memory, the second task is a program for supplying the data stream from the memory to the decoding processing means, the third task is a program for outputting decoded video data from the memory to the video output means, and the fourth task is a program for outputting decoded audio data from the memory to the audio output means.
[0107]
Here, the processor may be configured to have a program counter unit having at least four program counters corresponding to the first to fourth tasks, an instruction fetch unit that fetches an instruction from the instruction memory storing the task programs by using the instruction address indicated by one of the program counters, an instruction execution unit that executes the instruction fetched by the instruction fetch unit, and a task control unit that controls the instruction fetch unit so as to switch the program counter sequentially every time a predetermined number of instruction cycles elapse.
[0108]
According to this configuration, the response delay to an input / output request is extremely small, regardless of the input rate and input cycle of the stream data determined by the external device and the output rates and output cycles of the video data and the audio data determined by the external display device and the external audio output device, respectively.
Further, the video and audio processing apparatus of the present invention may include: input means for inputting a data stream including compressed audio data and compressed video data; sequential processing means for performing sequential processing consisting mainly of condition determination on the data stream, namely analyzing header information attached to each predetermined block in the data stream and decoding the compressed audio data in the data stream; and routine processing means for performing, in parallel with the sequential processing, routine processing consisting mainly of routine operations, namely decoding the compressed video data in the data stream in units of the predetermined block using the result of the header analysis. When header analysis of a predetermined block is completed, the sequential processing means instructs the routine processing means to start decoding that block, and when it receives a notification of the end of decoding of that block from the routine processing means, it starts header analysis of the next predetermined block.
[0109]
According to this configuration, the sequential processing means is responsible for header analysis, which requires a wide variety of condition judgments for both compressed video data and compressed audio data, and is also responsible for decoding the compressed audio data. The routine processing means, on the other hand, is responsible for the large volume of routine operations on block data of the compressed video data. With this division of roles, the sequential processing means performs the whole of audio decoding, which requires less computation than video decoding, analyzes the headers of the compressed video data, and controls the routine processing means. Under that control, the routine processing means performs routine operations exclusively, so that efficient processing without waste can be realized. The required processing capability can therefore be obtained without operating at a high frequency, and the manufacturing cost can be reduced. In addition, because the sequential processing means performs audio decoding, header analysis of the compressed video data, and control of the routine processing means one after another, it can be constituted by a single processor.
[0110]
Further, the routine processing means may include: data conversion means for performing variable-length decoding of the compressed video data in the data stream in accordance with an instruction from the sequential processing means; arithmetic means for performing inverse quantization and inverse discrete cosine transform by applying predetermined operations to the video block obtained by the variable-length decoding; and synthesizing means for restoring video data by motion compensation, in which the video block after the inverse discrete cosine transform is combined with an already decoded block.
The sequential processing means may include: obtaining means for obtaining header information that has been variable-length decoded by the data conversion means; analyzing means for analyzing the obtained header information; notifying means for notifying the routine processing means of parameters obtained as the analysis result; audio decoding means for decoding the compressed audio data in the data stream input by the input means; and control means which, on receiving from the routine processing means an interrupt signal notifying completion of decoding of the predetermined block, stops the operation of the audio decoding means and activates the obtaining means, and which instructs the data conversion means to start variable-length decoding of the video block once the notifying means has made its notification.
[0111]
According to this configuration, the sequential processing means performs audio decoding after performing header analysis for a predetermined block such as a macroblock, and starts header analysis of the next block when the routine processing means has completed decoding of the predetermined block. Because the sequential processing means thus repeats header analysis and audio decoding in a time-division manner, it can be realized at low cost by a single processor. Further, since the routine processing means does not need to perform a wide variety of condition judgments, it can be implemented as dedicated hardware (or hardware and firmware) at low cost.
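The time-division behaviour described above can be sketched as a simple event list, assuming that the routine processing means stays busy for a fixed number of audio-decoding slices per block. The function name `sequential_timeline`, the event labels, and the fixed slice count are illustrative assumptions, not details from the specification.

```python
def sequential_timeline(num_blocks, audio_slices_per_block=2):
    """List the steps taken by the sequential processing means, in order.

    While the routine side decodes one block, the sequential side decodes audio;
    the completion interrupt sends it back to header analysis of the next block.
    """
    timeline = []
    for block in range(num_blocks):
        timeline.append(("analyze_header", block))        # condition-heavy sequential work
        timeline.append(("start_block_decode", block))    # hand the block to the routine side
        for _ in range(audio_slices_per_block):           # routine side is busy: decode audio
            timeline.append(("decode_audio", block))
        timeline.append(("block_done_interrupt", block))  # routine side reports completion
    return timeline


events = sequential_timeline(2, audio_slices_per_block=1)
```

In a real device the interrupt arrives asynchronously; the fixed slice count here only stands in for "however long the block decode takes".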
[0112]
Here, the arithmetic means may further include a first buffer having a storage area corresponding to one block, and the data conversion means may include: a variable-length decoding unit that performs variable-length decoding of the compressed video data in the data stream; first address table means for storing a first address sequence in which addresses of the storage area of the first buffer are arranged in zigzag scan order; second address table means for storing a second address sequence in which addresses of the storage area of the first buffer are arranged in alternate scan order; and writing means for writing block data obtained by the variable-length decoding unit into the first buffer according to one of the first address sequence and the second address sequence.
[0113]
According to this configuration, the writing means can write the block data into the storage area of the first buffer for both the zigzag scan and the alternate scan. Therefore, when reading the block data from the first buffer, the arithmetic means does not need to change the order of the read addresses, and can always read in the same address order regardless of the scan type.
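A minimal sketch of the two address tables and the writing step follows, assuming an 8 × 8 block addressed in raster order (row × 8 + column). The names `zigzag_addresses`, `ZIGZAG_TABLE`, `ALTERNATE_TABLE`, and `write_block` are hypothetical. The alternate-scan values are transcribed from memory of the MPEG-2 alternate scan pattern and should be checked against the standard before any real use.

```python
def zigzag_addresses(n=8):
    """Raster addresses (row * n + col) of an n x n block, listed in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col selects one anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diagonal.reverse()                 # even diagonals run from bottom-left to top-right
        order.extend(r * n + c for r, c in diagonal)
    return order


ZIGZAG_TABLE = zigzag_addresses()

# Assumption: alternate scan order as recalled from MPEG-2; verify before relying on it.
ALTERNATE_TABLE = [
    0, 8, 16, 24, 1, 9, 2, 10,
    17, 25, 32, 40, 48, 56, 57, 49,
    41, 33, 26, 18, 3, 11, 4, 12,
    19, 27, 34, 42, 50, 58, 35, 43,
    51, 59, 20, 28, 5, 13, 6, 14,
    21, 29, 36, 44, 52, 60, 37, 45,
    53, 61, 22, 30, 7, 15, 23, 31,
    38, 46, 54, 62, 39, 47, 55, 63,
]


def write_block(coefficients, alternate_scan):
    """Write coefficients that arrive in scan order into a raster-ordered first buffer."""
    table = ALTERNATE_TABLE if alternate_scan else ZIGZAG_TABLE   # address selection
    first_buffer = [0] * 64
    for i, value in enumerate(coefficients):
        first_buffer[table[i]] = value
    return first_buffer
```

Whichever table is selected, the buffer ends up in the same raster layout, which is why the reader never needs to know the scan type.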
[0114]
Further, the analyzing means may calculate a quantization scale and a motion vector based on the header information, and the notifying means may notify the arithmetic means of the quantization scale and notify the synthesizing means of the motion vector.
According to this configuration, the calculation of the motion vector can be assigned to the sequential processing means, and the synthesizing means can perform the motion compensation processing as a routine operation using the calculated motion vector.
[0115]
Further, the arithmetic means may include: first and second control storage units, each storing a microprogram; a first program counter that designates a first read address for the first control storage unit; a second program counter that designates a second read address; a selector that selects one of the first read address and the second read address and outputs the selected read address to the second control storage unit; and an execution unit that has a multiplier and an adder and performs inverse quantization and inverse discrete cosine transform in block units under microprogram control by the first and second control storage units.
[0116]
According to this configuration, the microprogram (firmware) does not need to perform a wide variety of condition determinations and only has to realize routine processing, so the program is small and easy to create, which is suitable for cost reduction. Moreover, the multiplier and the adder can be operated independently and in parallel using the two program counters.
[0117]
Further, when the second read address is selected by the selector, the execution unit may perform the processing using the multiplier and the processing using the adder independently and in parallel, and when the first read address is selected by the selector, it may perform the processing using the multiplier and the processing using the adder in conjunction with each other.
According to this configuration, the idle time of the multiplier and the adder can be reduced, and the processing efficiency can be improved.
[0118]
Here, the arithmetic means may further include a first buffer for storing the video block from the data conversion means and a second buffer for storing the block that has undergone inverse discrete cosine transform by the execution unit. The first control storage unit stores a microprogram for inverse quantization and a microprogram for inverse discrete cosine transform, and the second control storage unit stores a microprogram for inverse discrete cosine transform and a microprogram for transferring the inverse-discrete-cosine-transformed video block to the second buffer. The execution unit executes, in parallel, the transfer of the inverse-discrete-cosine-transformed video block to the second buffer and the inverse quantization of the next video block, and executes the inverse discrete cosine transform of the inversely quantized video block with the multiplier and the adder operating in conjunction.
[0119]
According to this configuration, since the inverse quantization process and the transfer process to the second buffer are performed in parallel, the processing efficiency can be improved.
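The two routine operations named above, inverse quantization followed by an 8 × 8 inverse discrete cosine transform, can be sketched as follows. This is only an arithmetic illustration: the reconstruction rule `level * weight * scale / 8` is a simplification assumed for this sketch and is not bit-exact MPEG, and the function names `inverse_quantize` and `idct_8x8` are hypothetical. It does not model the microprogram, the two buffers, or the parallel transfer.

```python
import math

N = 8


def inverse_quantize(levels, weights, quant_scale):
    """Simplified reconstruction (assumption): level * weight * scale / 8."""
    return [[levels[u][v] * weights[u][v] * quant_scale / 8.0 for v in range(N)]
            for u in range(N)]


def idct_8x8(coeffs):
    """Direct 2-D inverse DCT of an 8 x 8 coefficient block (textbook definition)."""
    def c(k):
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    # cos_table[x][u] = cos((2x + 1) * u * pi / 16), shared by rows and columns
    cos_table = [[math.cos((2 * x + 1) * u * math.pi / 16.0) for u in range(N)]
                 for x in range(N)]
    pixels = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (c(u) * c(v) * coeffs[u][v]
                              * cos_table[x][u] * cos_table[y][v])
            pixels[x][y] = total / 4.0
    return pixels


# A block holding only a DC level decodes to a flat block.
levels = [[0] * N for _ in range(N)]
levels[0][0] = 40
weights = [[16] * N for _ in range(N)]
flat = idct_8x8(inverse_quantize(levels, weights, quant_scale=1))
```

A hardware or firmware version would use a fast separable transform rather than this four-level loop; the direct form is used here because it is easy to check against the definition.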
The input means may further input polygon data, the sequential processing means may further analyze the polygon data to calculate the vertex coordinates of the polygon and the inclination of each edge, and the routine processing means may further generate image data of the polygon according to the calculated vertex coordinates and inclination.
[0120]
According to this configuration, the sequential processing means is in charge of analyzing the polygon data, and the routine processing means is in charge of the routine image data generation processing. The video and audio processing apparatus can therefore perform graphics processing that efficiently generates image data from polygon data.
Here, the first and second control storage units may further store a microprogram for performing scan conversion by the DDA algorithm, and the execution unit may further perform the scan conversion under microprogram control, based on the vertex coordinates and the inclination calculated by the sequential processing means.
[0121]
According to this configuration, the generation of the image data can be easily realized by the scan conversion microprogram in the first and second control storage units.
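The split between slope calculation and DDA scan conversion can be sketched as follows, assuming a polygon region bounded by one left edge and one right edge over a range of scanlines. The function names `edge_slope` and `dda_spans`, and the choice of rounding to the nearest pixel, are assumptions made for this illustration.

```python
import math


def edge_slope(x0, y0, x1, y1):
    """Sequential side: change in x per scanline (dx/dy) for a non-horizontal edge."""
    return (x1 - x0) / (y1 - y0)


def dda_spans(y_top, y_bottom, x_left, slope_left, x_right, slope_right):
    """Routine side: walk both edges one scanline at a time, adding the slope each step.

    Returns (y, first_pixel, last_pixel) for every scanline in [y_top, y_bottom).
    """
    spans = []
    for y in range(y_top, y_bottom):
        spans.append((y, math.floor(x_left + 0.5), math.floor(x_right + 0.5)))
        x_left += slope_left      # DDA step: no multiplication inside the loop
        x_right += slope_right
    return spans


# Right triangle with vertices (0, 0), (0, 4), (4, 4).
left = edge_slope(0, 0, 0, 4)
right = edge_slope(0, 0, 4, 4)
spans = dda_spans(0, 4, 0.0, left, 0.0, right)
```

The inner loop contains only additions and a rounding step, which is the kind of condition-free work the text assigns to the routine side.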
Further, the apparatus may be configured as follows. The synthesizing means further generates a difference block representing a difference image from the video data to be compressed, and the second buffer further holds the generated difference block. The first control storage unit further stores a microprogram for performing discrete cosine transform, and the second control storage unit further stores a microprogram for performing discrete cosine transform and a microprogram for transferring the discrete-cosine-transformed video block to the first buffer. The execution unit further performs discrete cosine transform and quantization on the difference block held in the second buffer and transfers the result to the first buffer. The data conversion means further performs variable-length coding on the block in the first buffer, and the sequential processing means further adds header information to the predetermined block that has been variable-length coded by the data conversion means.
[0122]
According to this configuration, the routine processing unit is responsible for quantization and discrete cosine transform as routine processing, and the sequential processing unit is responsible for processing that requires a condition determination (addition of header information). In this case, the present video / audio processing apparatus can efficiently execute the encoding process from image data to compressed video data without using a high-speed clock.
Further, the arithmetic means may include: first and second control storage units, each storing a microprogram; a first program counter that designates a first read address for the first control storage unit; a second program counter that designates a second read address; a selector that selects one of the first read address and the second read address and outputs the selected one to the second control storage unit; and a plurality of execution units, each having a multiplier and an adder, that perform inverse quantization and inverse discrete cosine transform in block units under microprogram control by the first and second control storage units, each execution unit taking charge of one of the partial blocks obtained by dividing a block.
[0123]
According to this configuration, since a plurality of execution units execute the operation instructions in parallel, a large amount of routine operations can be efficiently executed in parallel at the pixel level.
Further, the arithmetic means may further include: a plurality of address conversion tables, provided in correspondence with the respective execution units, each holding conversion addresses in which the address order of a predetermined address sequence is partially changed; an instruction register group consisting of a plurality of registers that store, in association with the conversion addresses, the individual microinstructions constituting a microprogram for realizing a predetermined operation; and a switching unit, provided between the first and second control storage units and the plurality of execution units, that replaces the microinstruction output from the first control storage unit or the selector to each execution unit with a microinstruction from the instruction registers and outputs it to the plurality of execution units. When the read address is an address in the predetermined address sequence, that address is converted into a conversion address by each of the address conversion tables, and the instruction register group outputs the microinstruction corresponding to each conversion address output from the conversion tables.
[0124]
According to this configuration, while the plurality of execution units execute the microprogram in parallel, resource interference such as access competition between the execution units can be avoided, and the processing can be performed more efficiently.
Here, while the first program counter outputs a first read address in the predetermined address sequence, each of the conversion tables may output to the plurality of execution units, together with the output of a microinstruction in the register that indicates addition or subtraction, a flag indicating whether addition or subtraction should be performed; each execution unit may perform addition or subtraction according to the flag; and the flag may be set according to a microinstruction of the second control storage unit.
[0125]
According to this configuration, since the conversion table specifies whether a microinstruction performs addition or subtraction, the same microprogram can be shared in two ways. The total capacity of the microprogram can therefore be further reduced, the hardware scale can be reduced, and the cost can be reduced.
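The idea of sharing one microprogram between an adding and a subtracting use can be sketched as follows. The microinstruction format, the opcode names `ADDSUB` and `MUL`, the register names, and the function `run_unit` are all invented for this illustration; the specification does not define them.

```python
# One shared microprogram: each entry is (opcode, source 1, source 2, destination).
SHARED_MICROPROGRAM = [
    ("ADDSUB", "a", "b", "t"),    # resolved to + or - by the per-unit flag
    ("MUL", "t", "k", "out"),
]


def run_unit(microprogram, registers, subtract_flag):
    """Run the shared microprogram on one execution unit.

    `subtract_flag` plays the role of the flag supplied with the microinstruction:
    False makes ADDSUB add, True makes it subtract.
    """
    regs = dict(registers)
    for opcode, src1, src2, dst in microprogram:
        if opcode == "ADDSUB":
            if subtract_flag:
                regs[dst] = regs[src1] - regs[src2]
            else:
                regs[dst] = regs[src1] + regs[src2]
        elif opcode == "MUL":
            regs[dst] = regs[src1] * regs[src2]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return regs


inputs = {"a": 7, "b": 3, "k": 2}
sum_side = run_unit(SHARED_MICROPROGRAM, inputs, subtract_flag=False)
diff_side = run_unit(SHARED_MICROPROGRAM, inputs, subtract_flag=True)
```

Only one copy of the instruction list is stored, and the flag decides which of the two results an execution unit produces.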
While the first program counter outputs a first read address in the predetermined address sequence, the second control storage unit may further output to the plurality of execution units, together with the microinstruction output from the register, information indicating the storage destination of the microinstruction execution result, and each execution unit may store the execution result according to the storage destination information.
[0126]
According to this configuration, since the storage destination information can be specified separately from the microprogram in the instruction register group, the microprogram can be shared in different processing, for example, partial processing of matrix operation. As a result, the total capacity of the microprogram can be further reduced, and the hardware scale can be reduced, and the cost can be reduced.
[Brief description of the drawings]
FIG. 1 is an explanatory diagram of a decoding process by a video / audio decoder according to a first conventional technique.
FIG. 2 is an explanatory diagram of a decoding process by a two-chip decoder according to a second conventional technique.
FIG. 3 is a block diagram illustrating a schematic configuration of an image processing apparatus according to the first embodiment of the present invention.
FIG. 4 is a block diagram illustrating a configuration of an image processing apparatus according to the first embodiment of the present invention.
FIG. 5 is a diagram showing an MPEG stream in a hierarchical manner and showing operation timing of each section of the image processing apparatus.
FIG. 6 is a diagram illustrating an analysis of a macroblock header by a processor 7 and control contents for other units.
FIG. 7 is a block diagram illustrating a configuration of a pixel operation unit 10.
FIG. 8 shows an example of a microprogram stored in a first instruction memory 506 and a second instruction memory 507.
FIG. 9 is a diagram showing operation timings of the pixel operation unit 10.
FIG. 10 is a block diagram showing a detailed configuration of a pixel read/write unit 11.
FIG. 11 is a block diagram showing a configuration of an IO processor 5.
FIG. 12 is a block diagram showing a detailed configuration example of an instruction reading circuit 53.
FIG. 13 is a time chart showing the operation timing of the IO processor 5.
FIG. 14 is a block diagram illustrating a configuration of a task management unit.
FIG. 15 is an explanatory diagram showing a decoding operation after the FIFO memory 4.
FIG. 16 is a block diagram illustrating a configuration of an image processing apparatus according to a second embodiment of the present invention.
FIG. 17 is a block diagram illustrating a configuration of a pixel operation unit 30.
FIG. 18 shows an example of contents stored in a first instruction memory 506 and a second instruction memory 507.
FIG. 19 is a block diagram illustrating a configuration of a code conversion unit 9.
FIG. 20 shows a block storage area for storing 8 × 8 spatial frequency data and a zigzag scan route.
FIG. 21 shows a block storage area for storing 8 × 8 spatial frequency data and a route of an alternate scan.
FIG. 22 shows an example of the contents stored in the instruction memory 506 and the instruction memory 507 when the instruction pointer holding units 308a to 308c, the instruction register unit 309, and the distribution unit 310 are not provided.
FIG. 23 shows an example of storage contents of the instruction pointer holding units 308a to 308c and the instruction register unit 309.
[Explanation of symbols]
1 Stream input section
2 Buffer memory
3 External memory
4 FIFO memory
5 I/O processor
5a DMAC
6 Memory controller
7 Processor
8 Internal memory
9 Code converter
10 Pixel operation unit
12 Video output section
13 Audio output unit
14 Host I / F section
1000 Video/audio processing device
1001 Input/output processing unit
1002 Decode processing unit
1003 Sequential processing unit
1004 Routine processing unit

Claims

A video processing apparatus that decodes a data stream including compressed video data input from the outside and outputs data including the resulting video data to the outside, comprising:
A processor that performs input/output processing affected by external factors, including processing for inputting the data stream from the outside and processing for outputting data including the generated video data to the outside; and
Decoding processing means that, independently of and in parallel with the processor, generates data including video data by decoding the data stream supplied from the processor, wherein
The processor comprises:
An instruction memory for storing a plurality of task programs representing the contents of the input / output processing;
A program counter unit having a program counter corresponding to each of the plurality of task programs;
An instruction fetch unit that fetches an instruction from the instruction memory based on an instruction address indicated by one program counter;
An instruction execution unit that executes the instruction fetched by the instruction fetch unit;
A task control unit that controls the instruction fetch unit to sequentially switch a program counter to be used every time a predetermined number of instruction cycles elapses,
The processor executes the plurality of task programs while switching among the task programs and their register sets,
The processor further executes a task program for analyzing the MPEG data to extract individual elementary streams,
The decoding processing means comprises:
Routine processing means for performing routine processing, consisting mainly of routine operations, that includes variable-length decoding of the header information added to each block in the data stream and of the compressed video data in the data stream, and decoding of the variable-length-decoded compressed video data in block units; and
Sequential processing means for performing sequential processing, consisting mainly of condition determination, that carries out, in a time-division manner, analysis of the variable-length-decoded header information and decoding of the compressed audio data in the data stream,
And the decoding processing means decodes each of the extracted elementary streams.

The processor inputs the data stream from the outside, stores the data stream in the memory, reads the data stream stored in the memory, and supplies the data stream to the decoding processing unit according to the progress of the decoding operation of the decoding processing unit,
The decoding processing means stores data including the generated video data in the memory,
The video processing device according to claim 1, wherein the processor outputs the video data stored in the memory to the outside.

The video processing device according to claim 2, wherein
The routine processing means performs variable-length decoding of the header information and the compressed video data and, after variable-length decoding of a block of the compressed video data ends, instructs the sequential processing means to start analyzing the header information of the next block, and
The sequential processing means, when the header information to be analyzed is variable-length coded, instructs the routine processing means to perform variable-length decoding of that header information and obtains the variable-length-decoded header information, and, after acquisition and analysis of the header information of the block are completed, instructs the routine processing means to start variable-length decoding of the compressed video data of the block and to decode the variable-length-decoded compressed video data using the analyzed header information.

The video processing device according to claim 3, wherein the routine processing means comprises:
Data conversion means for performing variable-length decoding of the compressed video data in the data stream according to an instruction from the sequential processing means;
Arithmetic means for performing inverse quantization and inverse discrete cosine transform by applying predetermined operations to the block data obtained by the variable-length decoding; and
Synthesizing means for restoring video data corresponding to the block by combining the block data after the inverse discrete cosine transform with a rectangular image of a decoded frame stored in the memory.

The video processing device according to claim 4, wherein the arithmetic means further includes a first buffer having a storage area corresponding to one block, and
The data conversion means comprises:
A variable-length decoding unit for performing variable-length decoding of the compressed video data in the data stream;
First address table means for storing a first address sequence in which addresses of the storage area of the first buffer are arranged in zigzag scan order;
Second address table means for storing a second address sequence in which addresses of the storage area of the first buffer are arranged in alternate scan order; and
Writing means for writing block data obtained by the variable-length decoding of the variable-length decoding unit into the first buffer according to one of the first address sequence and the second address sequence.

The video processing device according to claim 5, wherein the writing means comprises:
Table address generating means for sequentially generating table addresses for the first address table means and the second address table means;
Address selecting means for selecting one of the address of the first address sequence and the address of the second address sequence output respectively from the first address table means and the second address table means to which the table address is input; and
Address output means for outputting the selected address to the first buffer.

The processor comprises:
An input means for inputting an asynchronous data stream from outside;
Video output means for outputting decoded video data to an external display device,
Audio output means for outputting decoded audio data to an external audio output device;
A processor that executes the first to fourth tasks stored in the instruction memory while switching the tasks,
The first task is a program for transferring a data stream from an input unit to the memory,
The second task is a program for supplying a data stream from the memory to a decoding processing unit,
The third task is a program that outputs decoded video data from the memory to a video output unit,
The video processing device according to claim 2, wherein the fourth task is a program that outputs decoded audio data from the memory to an audio output unit.

The video processing device according to claim 7, wherein the program counter unit has at least four program counters corresponding to the first to fourth tasks.

The video processing device according to claim 8, wherein the processor further includes a register unit having at least four register sets corresponding to the first to fourth tasks, and
The task control unit switches the register set used by the instruction execution unit simultaneously with switching of the program counter.

The video processing device according to claim 9, wherein the task control unit includes:
A counter that counts the number of instruction cycles according to a clock signal each time the program counter is switched; and
A switching instruction unit that controls the instruction fetch unit to switch the program counter when the count value of the counter reaches the predetermined number.

A video and audio processing device that inputs, decodes, and outputs a data stream including compressed audio data and compressed video data, comprising:
A processor that performs input/output processing including processing for storing, in a memory, a data stream that is input asynchronously due to external factors;
Routine processing means for performing routine processing, consisting mainly of routine operations, that includes variable-length decoding of the header information added to each block in the data stream stored in the memory and of the compressed video data in the data stream, and decoding of the variable-length-decoded compressed video data in block units; and
Sequential processing means for performing sequential processing, consisting mainly of condition determination, that carries out, in a time-division manner, analysis of the variable-length-decoded header information and decoding of the compressed audio data in the data stream stored in the memory, wherein
The audio data decoded by the sequential processing means and the video data decoded by the routine processing means are stored in the memory,
The input/output processing further includes output processing that reads the decoded audio data and video data from the memory and outputs each of them in accordance with the output rate of the external display device or the audio output device,
The processor comprises:
An instruction memory for storing a plurality of task programs representing the contents of the input / output processing;
A program counter unit having a program counter corresponding to each of the task programs,
An instruction fetch unit that fetches an instruction from the instruction memory based on an instruction address indicated by one program counter;
An instruction execution unit that executes the instruction fetched by the instruction fetch unit;
A task control unit that controls the instruction fetch unit to sequentially switch a program counter to be used every time a predetermined number of instruction cycles elapses,
The processor executes the plurality of task programs while switching among the task programs and their register sets,
The processor further executes a task program for analyzing the MPEG data to extract individual elementary streams,
The video / audio processing apparatus, wherein the decoding processing means decodes each of the extracted elementary streams.

The video/audio processing apparatus according to claim 11, wherein
The routine processing means performs variable-length decoding of the header information and the compressed video data and, after variable-length decoding of a block of the compressed video data ends, instructs the sequential processing means to start analyzing the header information of the next block, and
The sequential processing means, when the header information to be analyzed is variable-length coded, instructs the routine processing means to perform variable-length decoding of that header information and obtains the variable-length-decoded header information, and, after acquisition and analysis of the header information of the block are completed, instructs the routine processing means to start variable-length decoding of the compressed video data of the block and to decode the variable-length-decoded compressed video data using the analyzed header information.

The video/audio processing apparatus according to claim 12, wherein the routine processing means comprises:
Data conversion means for performing variable-length decoding of the compressed video data in the data stream according to an instruction from the sequential processing means;
Arithmetic means for performing inverse quantization and inverse discrete cosine transform by applying predetermined operations to the block data obtained by the variable-length decoding; and
Synthesizing means for restoring video data corresponding to the block by combining the block data after the inverse discrete cosine transform with a rectangular image of a decoded frame stored in the memory.

The video/audio processing apparatus according to claim 13, wherein the arithmetic means further includes a first buffer having a storage area corresponding to one block, and
The data conversion means comprises:
A variable-length decoding unit for performing variable-length decoding of the compressed video data in the data stream;
First address table means for storing a first address sequence in which addresses of the storage area of the first buffer are arranged in zigzag scan order;
Second address table means for storing a second address sequence in which addresses of the storage area of the first buffer are arranged in alternate scan order; and
Writing means for writing block data obtained by the variable-length decoding of the variable-length decoding unit into the first buffer according to one of the first address sequence and the second address sequence.

The video/audio processing apparatus according to claim 14, wherein the writing means comprises:
Table address generating means for sequentially generating table addresses for the first address table means and the second address table means;
Address selecting means for selecting one of the address of the first address sequence and the address of the second address sequence output respectively from the first address table means and the second address table means to which the table address is input; and
Address output means for outputting the selected address to the first buffer.

The processor comprises:
An input means for inputting an asynchronous data stream from outside;
Video output means for outputting decoded video data to an external display device,
Audio output means for outputting decoded audio data to an external audio output device;
A processor that executes the first to fourth tasks stored in the instruction memory while switching the tasks,
The first task is a program for transferring a data stream from an input unit to the memory,
The second task is a program for supplying a data stream from the memory to the decoding processing means,
The third task is a program that outputs decoded video data from the memory to a video output unit,
The video/audio processing apparatus according to claim 11, wherein the fourth task is a program for outputting decoded audio data from the memory to an audio output unit.

The video and audio processing device according to claim 16, wherein the program counter unit has at least four program counters corresponding to the first to fourth tasks.

The processor further includes a register unit having at least four register sets corresponding to the first to fourth tasks,
18. The video / audio processing apparatus according to claim 17, wherein the task control unit switches the register set to be used by the instruction execution unit simultaneously with the switching of the program counter.

The task control unit includes:
A counter for counting the number of instruction cycles according to a clock signal each time the program counter is switched;
19. The video / audio processing apparatus according to claim 18, further comprising: a switching instruction unit configured to control the instruction fetch unit to switch the program counter when the count value of the counter reaches a predetermined number.
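
The task switching described above (one program counter and one register set per task, switched together after a fixed number of instruction cycles) can be modelled roughly as below. This is a sketch under stated assumptions: the class name `TaskSwitchingProcessor`, the round-robin order, the four-entry register sets and the stand-in "instruction" that increments register 0 are all hypothetical and are not taken from the claims.

```python
class TaskSwitchingProcessor:
    """Toy model: per-task program counters and register sets."""

    def __init__(self, num_tasks=4, cycles_per_slot=4):
        self.program_counters = [0] * num_tasks            # one PC per task
        self.register_sets = [[0] * 4 for _ in range(num_tasks)]
        self.current_task = 0
        self.cycle_count = 0                               # reset at each switch
        self.cycles_per_slot = cycles_per_slot             # the predetermined number
        self.trace = []                                    # (task, pc) per cycle

    def step(self):
        """Execute one instruction cycle of the current task."""
        task = self.current_task
        self.trace.append((task, self.program_counters[task]))
        # Stand-in for real instruction execution: touch only this
        # task's register set, so no state needs to be saved on a switch.
        self.register_sets[task][0] += 1
        self.program_counters[task] += 1
        self.cycle_count += 1
        if self.cycle_count == self.cycles_per_slot:
            # Switch program counter and register set at the same time
            # (round-robin order is an assumption of this sketch).
            self.cycle_count = 0
            self.current_task = (task + 1) % len(self.program_counters)


cpu = TaskSwitchingProcessor(num_tasks=4, cycles_per_slot=4)
for _ in range(10):
    cpu.step()
```

Because each task keeps its own program counter and registers, a switch in this model involves no save or restore step; the selected index simply changes.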

The routine processing means,
Data conversion means for performing variable length decoding of the compressed video data in the data stream according to the instruction of the sequential processing means,
Arithmetic means for performing inverse quantization and inverse discrete cosine transform by performing a predetermined operation on block data obtained by variable-length decoding,
19. The video / audio processing apparatus according to claim 18, further comprising: synthesizing means for synthesizing the block data after the inverse discrete cosine transform and a rectangular image of the decoded frame stored in the memory to restore video data corresponding to the block.
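
The per-block data flow described above (inverse quantization, inverse discrete cosine transform, then synthesis with a rectangular image from an already decoded frame) can be illustrated as follows. This is a simplified sketch, not the claimed hardware: the function names are hypothetical, inverse quantization is reduced to a multiplication by the quantization scale (the weighting matrix used by real MPEG decoders is omitted), and the transform is a direct, unoptimized 8x8 formula.

```python
import math

N = 8  # block size


def inverse_quantize(levels, qscale):
    """Simplified inverse quantization: scale every level by qscale."""
    return [[level * qscale for level in row] for row in levels]


def idct_2d(coeffs):
    """Direct 2-D inverse DCT of an N x N coefficient block."""
    def c(k):
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            total = 0.0
            for v in range(N):
                for u in range(N):
                    total += (c(u) * c(v) * coeffs[v][u]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[y][x] = 2.0 / N * total
    return out


def synthesize(residual, reference):
    """Add the residual block to the reference rectangle, clipped to 0..255."""
    return [[min(255, max(0, reference[y][x] + round(residual[y][x])))
             for x in range(N)] for y in range(N)]


# Usage: a block with only a DC level, added to a flat reference rectangle.
levels = [[0] * N for _ in range(N)]
levels[0][0] = 8
reference = [[100] * N for _ in range(N)]
restored = synthesize(idct_2d(inverse_quantize(levels, 8)), reference)
```

With only the DC coefficient set, the residual is constant over the block, so every restored pixel is the reference value plus the same offset.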

The arithmetic means further includes a first buffer having a storage area corresponding to one block,
The data conversion means,
A variable-length decoding unit for performing variable-length decoding of the compressed video data in the data stream;
First address table means for storing a first address sequence in which addresses of the storage area of the first buffer are arranged in zigzag scan order;
Second address table means for storing a second address sequence in which addresses of the storage area of the first buffer are arranged in an alternate scan order;
21. The video and audio processing device according to claim 20, further comprising: writing means for writing block data obtained by variable-length decoding in the variable-length decoding unit into the first buffer according to one of the first address sequence and the second address sequence.

The writing means,
Table address generating means for sequentially generating table addresses for the first address table means and the second address table means;
Address selecting means for selecting one of an address of the first address sequence and an address of the second address sequence, output respectively from the first address table means and the second address table means to which the table address is input;
22. The video / audio processing apparatus according to claim 21, further comprising address output means for outputting the selected address to said first buffer.

The analysis means calculates a quantization scale and a motion vector based on the header information,
21. The video and audio processing apparatus according to claim 20, wherein the notifying unit notifies the calculating unit of the quantization scale and notifies the synthesizing unit of the motion vector.

The calculating means includes:
A first control storage unit and a second control storage unit each storing a microprogram;
A first program counter for designating a first read address in the first control storage unit;
A second program counter for designating a second read address;
A selector for selecting one of a first read address and a second read address and outputting the selected one to the second control storage unit;
24. The video / audio processing apparatus according to claim 23, further comprising: an execution unit having a multiplier and an adder, which performs inverse quantization and inverse discrete cosine transform in block units under microprogram control by the first control storage unit and the second control storage unit.

25. The video / audio processing apparatus according to claim 24, wherein the execution unit performs processing using the multiplier and processing using the adder independently and in parallel when the second read address is selected by the selector, and performs the processing using the multiplier and the processing using the adder in conjunction with each other when the first read address is selected by the selector.

The calculating means further comprises:
A first buffer for holding a video block from the data conversion means,
A second buffer for holding the block subjected to the inverse discrete cosine transform by the execution unit,
The first control storage unit stores a microprogram for performing an inverse quantization process and a microprogram for performing an inverse discrete cosine transform,
The second control storage unit stores a microprogram for performing inverse discrete cosine transform and a microprogram for transferring the video block subjected to inverse discrete cosine transform to the second buffer,
26. The video / audio processing apparatus according to claim 25, wherein the execution unit executes, in parallel, a process of transferring the inverse discrete cosine transformed video block to the second buffer and a process of inversely quantizing the next video block, and executes the inverse discrete cosine transform on the inversely quantized video block with the multiplier and the adder operating in conjunction.
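
The alternation between parallel and joint operation described above can be pictured as a simple schedule. The sketch below is only an interpretation of the claim wording: the function name `schedule` and the phase labels "parallel" and "joint" are hypothetical, and the sketch does not model which arithmetic unit carries out which of the two parallel processes.

```python
def schedule(num_blocks):
    """List the processing phases for a run of video blocks.

    'parallel' phases hold processes that run independently side by side;
    'joint' phases use the multiplier and the adder together.
    """
    if num_blocks <= 0:
        return []
    steps = []
    for n in range(num_blocks):
        parallel = [f"IQ(block {n})"]
        if n > 0:
            # The previous block's IDCT result is moved to the second
            # buffer while the current block is inversely quantized.
            parallel.append(f"transfer(block {n - 1})")
        steps.append(("parallel", parallel))
        steps.append(("joint", [f"IDCT(block {n})"]))
    # The last block still has to be transferred out.
    steps.append(("parallel", [f"transfer(block {num_blocks - 1})"]))
    return steps


for phase, work in schedule(2):
    print(phase, work)
```

Overlapping the transfer of one block with the inverse quantization of the next hides the transfer time in this model, so the joint phase is the only one in which both arithmetic resources are tied to a single process.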

The synthesizing means further generates a difference block representing a difference image from the video data to be compressed,
The second buffer further holds the generated difference block,
The first control storage unit further stores a microprogram for performing a discrete cosine transform and a microprogram for performing a quantization process,
The second control storage unit further stores a microprogram for performing a discrete cosine transform and a microprogram for transferring the video block subjected to the discrete cosine transform to the first buffer,
The execution unit further executes the discrete cosine transform and quantization on the difference block held in the second buffer, and transfers the result to the first buffer,
The data conversion means further performs variable-length coding on the block in the first buffer,
27. The video / audio processing apparatus according to claim 26, wherein the sequential processing means further adds header information to a predetermined block that has been subjected to variable-length coding by the data conversion means.

The calculating means includes:
A first control storage unit and a second control storage unit each storing a microprogram;
A first program counter for designating a first read address in the first control storage unit;
A second program counter for designating a second read address;
A selector for selecting one of a first read address and a second read address and outputting the selected one to the second control storage unit;
A plurality of execution units each including a multiplier and an adder, and performing inverse quantization and inverse discrete cosine transform in block units under microprogram control by the first control storage unit and the second control storage unit,
24. The video / audio processing apparatus according to claim 23, wherein the plurality of execution units share the processing, each processing one of the partial blocks obtained by dividing the block.
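
The sharing of one block among several execution units can be sketched as below. This is a minimal illustration under assumptions: the claims do not say how the block is divided, so the split into equal groups of rows is hypothetical, as are the function names, and each "execution unit" here only performs a simplified inverse quantization (multiplication by the quantization scale).

```python
def split_block(block, num_units):
    """Divide a block into num_units partial blocks of equal row count."""
    if len(block) % num_units:
        raise ValueError("block rows must divide evenly among the units")
    rows_per_unit = len(block) // num_units
    return [block[i * rows_per_unit:(i + 1) * rows_per_unit]
            for i in range(num_units)]


def execution_unit(partial_block, qscale):
    """Stand-in for one execution unit working on its partial block."""
    return [[level * qscale for level in row] for row in partial_block]


def process_block(block, qscale, num_units=2):
    """Run every unit on its own partial block and reassemble the result."""
    partial_blocks = split_block(block, num_units)
    # The units are simulated one after another; in hardware they would
    # work on their partial blocks at the same time.
    results = [execution_unit(part, qscale) for part in partial_blocks]
    return [row for part in results for row in part]


block = [[r * 8 + c for c in range(8)] for r in range(8)]
dequantized = process_block(block, qscale=2, num_units=4)
```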

The calculating means further comprises:
A plurality of address conversion tables provided corresponding to each of the plurality of execution units, each conversion table holding a conversion address in which the address order is partially changed corresponding to a predetermined address sequence;
An instruction register group consisting of a plurality of registers for storing, in association with the conversion addresses, the individual microinstructions constituting a microprogram for realizing a predetermined operation;
A switching unit, provided between the first and second control storage units and the plurality of execution units, that switches the microinstruction output from the first control storage unit or the selector to each of the plurality of execution units to a microinstruction in the instruction register group and outputs it to the plurality of execution units,
29. The video / audio processing apparatus according to claim 28, wherein, when the first read address or the second read address is an address in the predetermined address sequence, the address is converted into a conversion address by each of the plurality of address conversion tables, and the instruction register group outputs the microinstructions corresponding to the respective conversion addresses output from the conversion tables.

Each of the conversion tables further outputs, to the plurality of execution units, a flag indicating whether to add or subtract, accompanying the output of a microinstruction in the register that indicates addition and subtraction, while the first program counter outputs a first read address in the predetermined address sequence,
Each of the plurality of execution units performs addition or subtraction according to the flag,
The video / audio processing apparatus according to claim 29, wherein the flag is set in accordance with a microinstruction of the second control storage unit.

The second control storage unit further outputs, to the plurality of execution units, storage destination information designating where a microinstruction execution result is to be stored, accompanying the output of a microinstruction in the register, while the first program counter outputs a first read address in the predetermined address sequence,
30. The video / audio processing apparatus according to claim 29, wherein each of the plurality of execution units stores its execution result according to the storage destination information.