JP4290775B2 - Video data processing method and apparatus

Info

Publication number: JP4290775B2
Application number: JP23539897A
Authority: JP
Inventors: Cliff Reader; Jae Cheol Son; Amjad Qureshi; Le Nguyen
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 1996-08-19
Filing date: 1997-08-15
Publication date: 2009-07-08
Anticipated expiration: 2017-08-15
Also published as: CN1189058A; KR100262453B1; CN1523895A; KR19980018215A; TW436710B; JPH1093961A; DE19735880A1; CN1145362C

Description

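【０００１】
【Technical Field of the Invention】
The present invention relates to data processing by computer, and in particular to video data processing by computer.
【０００２】
【Prior Art】
Computers have commonly been used to compress and decompress system data. System data includes video data containing still and/or moving images. System data also includes audio data, for example the soundtrack of a motion picture. It is desirable to provide methods and circuits capable of processing video data at high speed.
【０００３】
【Problem to Be Solved by the Invention】
Accordingly, an object of the present invention is to provide a method and a circuit capable of high-speed processing of video data.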
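【０００４】
【Means for Solving the Problem】
In some embodiments, a computer system according to the invention includes three processors that can operate concurrently: a scalar processor, a vector processor, and a bitstream processor. In encoding or decoding video data, the vector processor performs the operations that a single instruction multiple data (SIMD) processor carries out efficiently. Such operations include 1) linear data transforms such as the discrete cosine transform (DCT) and 2) motion compensation. The bitstream processor performs operations that act on specific bits rather than on words or half-words. Such operations include, for example, the Huffman and RLC encoding and decoding used in MPEG-1, MPEG-2, H.261, and H.263. The scalar processor performs high-level video processing (for example, picture-level processing), synchronizes the operation of the vector and bitstream processors, and controls the interfaces to external devices.
In some embodiments, the computer system can process multiple data streams simultaneously. As a result, a user of the computer system can hold a video conference with two or more parties. Because the bitstream processor can switch contexts so that multiple bitstreams are encoded or decoded concurrently in real time, simultaneous processing of multiple data streams becomes possible.
【０００５】
In some embodiments, the scalar and vector processors are programmable in the sense that each can be programmed to execute a single arithmetic or Boolean instruction. The bitstream processor is not programmable in the sense that it cannot be programmed to execute a single arithmetic or Boolean instruction. Rather, the bitstream processor can be programmed to perform an entire video data processing operation on a set of video data. Because the bitstream processor is not programmed to execute single arithmetic or Boolean instructions, it can operate at high speed. Because the scalar and vector processors are programmable, the system is easy to adapt to changes in video encoding and decoding standards.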
【０００６】
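【Embodiments of the Invention】
FIG. 1 shows a media card 100 that includes a multimedia processor 110. In this embodiment, multimedia processor 110 is a type MSP-1EX (trademark) processor specified by Samsung Semiconductor, Inc. of San Jose, California. The MSP-1EX processor is described in Appendix A below.
Processor 110 communicates with a host computer system (not shown) over local bus 105. In some embodiments, bus 105 is a 32-bit, 33 MHz PCI bus. Digital video data output from processor 110 is coupled to a D/A (digital-to-analog) converter 112. Besides a video portion, the digital video data can include an audio portion, for example a movie soundtrack. The output of converter 112 can be coupled to a TV set (not shown) or to another system that processes analog data. In some embodiments, processor 110 includes an input port for receiving digital video data output from an A/D (analog-to-digital) converter (see FIGS. 4 to 6).
【０００７】
Processor 110 is connected to a codec (CODEC) 114. CODEC 114 receives analog audio data from a tape recorder (not shown) or another device. CODEC 114 also receives analog telephone data from a telephone line (not shown). CODEC 114 digitizes the analog data and transmits it to processor 110. CODEC 114 also receives digital data from processor 110, converts it to analog form, and transmits the analog data as required.
Processor 110 is connected to memory 120 by bus 122. In FIG. 1, memory 120 is SDRAM (synchronous DRAM) and bus 122 is a 64-bit, 80 MHz bus. Other embodiments use other memories, bus widths, and bus speeds. Asynchronous memories and buses are used in some embodiments.
Some embodiments of card 100 are described in the U.S. patent application entitled "Multiprocessor Operation in a Multimedia Signal Processor" (attorney docket no. M-4364 US), naming Le Nguyen as applicant and filed on the same date as the present application, the entire contents of which are incorporated herein by reference.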
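【０００８】
FIG. 2 is a block diagram of one embodiment of processor 110. Processor 110 includes a scalar processor 210, a vector processor (VP) 220, and a bitstream processor (BP) 245. In some embodiments, processor 210 is a 32-bit RISC processor that runs at 40 MHz and supports the well-known standard ARM7 instruction set. Vector processor 220 is a single instruction multiple data (SIMD) processor that runs at 80 MHz and has 288-bit vector registers. One embodiment of VP 220 is described in the U.S. patent application entitled "Efficient Context Saving and Restoring in a Multitasking Computing System Environment" (attorney docket no. M-4365 US), naming Song et al. as applicants and filed on the same date as the present application, the entire contents of which are incorporated herein by reference. Processors 210 and 220 can be programmed to execute single arithmetic or Boolean instructions, or sequences of such instructions.
【０００９】
In some embodiments, to process video data at high speed, bitstream processor 245 is designed so that it is not programmed to execute single arithmetic or Boolean instructions. In particular, BP 245 cannot be programmed to execute a single instruction such as ADD, OR, or "ADD AND ACCUMULATE". Instead, BP 245 is programmed to perform the video data processing operations described in chapter 10 of Appendix A. At the same time, scalar processor 210 and vector processor 220 can be programmed to execute single arithmetic or Boolean instructions. Processor 110 can therefore be adapted to changes in video standards.
【００１０】
As shown in FIG. 2, scalar processor 210 and vector processor 220 are connected to cache subsystem 230. Cache subsystem 230 is connected to a bus (IOBUS 240) and a bus (FBUS 250). In some embodiments, IOBUS 240 is a 32-bit, 40 MHz bus and FBUS 250 is a 64-bit, 80 MHz bus.
IOBUS 240 is connected to bitstream processor 245, interrupt controller 248, full-duplex UART unit 243, and four timers 242. FBUS 250 is connected to memory controller 258, which is connected to memory bus 122 (see FIG. 1). FBUS 250 is connected to PCI bus interface circuit 255, which is connected to PCI bus 105. FBUS 250 is also connected to device interface circuit 252 (also called the "Customer ASIC"), which includes circuitry that interfaces to video D/A 112 (see FIG. 1), CODEC 114, and, in some cases, a video A/D converter (as shown in FIGS. 4 to 6). Processor 110 also includes a memory data mover 290.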
【００１１】
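Processor 110 can process multiple data streams simultaneously. For example, when a user of processor 110 holds a video conference with two or more parties, processor 110 performs the video and audio processing needed for the user to see and hear all of the parties. To process multiple video data streams, processor 110 supports context switching, meaning that BP 245 switches among the data streams. In a video conference, each data stream can come from a separate remote party. Alternatively, an additional data stream can come from a movie channel, so that the user can take part in a video conference while also watching a movie. Context switching is described in section 10.12 of Example 1. When the context is switched, scalar processor 210 saves the current context and initializes BP 245 to process another context.
【００１２】
BP 245 can process the following video data formats:
1. MPEG-1, described in ISO/IEC standard 11172 (1992);
2. MPEG-2, described in document ISO/IEC JTC1/SC29 N0981 Rev (March 31, 1995);
3. H.261, described in "ITU-T Recommendation H.261" (March 1993); and
4. H.263, described in "Draft ITU-T Recommendation H.263" (May 2, 1996).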
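【００１３】
High-speed processing is achieved by dividing the video data processing among scalar processor 210, vector processor 220, and bitstream processor 245. More specifically, vector processor 220 performs the linear transforms (DCT or inverse DCT) and motion compensation. These operations suit a vector processor because they often require the same instruction to be executed on many different portions of the data. Bitstream processor 245 performs Huffman decoding and encoding and zigzag bitstream processing. Scalar processor 210 performs video and audio demultiplexing and synchronization, and I/O interfacing tasks.
Examples of encoding and decoding operations appear in sections 10.6.1 and 10.6.2 of Example 1. In an encoding operation, uncompressed digital data arrives from frame memory 120 or from the host system (not shown) over bus 105. In some embodiments, device interface circuit 252 includes a video A/D converter, and the uncompressed data arrives from the converter. Vector processor 220 performs quantization, DCT, and motion compensation. Bitstream processor 245 receives the output of VP 220 and generates GOBs (groups of blocks) and slices. In particular, BP 245 performs Huffman and RLC encoding and zigzag bitstream processing. Scalar processor 210 receives the output of BP 245 and performs picture layer coding, GOP (group of pictures) coding, and sequence layer coding. Scalar processor 210 then multiplexes the audio and video data and transmits the encoded data over a bus (105 or 122) to a storage device or a network. In some embodiments, transmission to a network includes transmission to device interface circuit 252 connected to the network.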
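The zigzag and run-length (RLC) steps assigned to the bitstream processor can be illustrated in software. The following is a minimal sketch only, assuming the conventional 8×8 zigzag scan used by MPEG and H.26x codecs and a simple (run, level) pairing. The function names are hypothetical, and the BP itself is a hard-wired unit, not a program like this.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):              # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                      # even diagonals run from bottom-left to top-right
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def run_length_encode(coeffs):
    """Turn a scanned coefficient list into (zero_run, level) pairs.

    Trailing zeros produce no pair; a real codec would signal them
    with an end-of-block code.
    """
    pairs, run = [], 0
    for level in coeffs:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


def scan_and_encode(block):
    """Zigzag-scan a square block of quantized coefficients, then run-length encode it."""
    scanned = [block[r][c] for r, c in zigzag_order(len(block))]
    return run_length_encode(scanned)
```

For a block whose only non-zero coefficients sit at positions (0, 0) and (1, 0), the scan visits (0, 0), (0, 1), (1, 0), so the second coefficient is preceded by a run of one zero. The (run, level) pairs would then be mapped to variable-length Huffman codes, a step this sketch leaves out.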
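【００１４】
In decoding, the processing is performed in reverse. Scalar processor 210 demultiplexes the system data into video and audio components and performs sequence layer, GOP, and picture layer decoding of the video data. The resulting GOBs or slices are supplied to bitstream processor 245. Processor 245 performs zigzag processing and Huffman and RLC decoding. VP 220 receives the output of BP 245 and performs inverse quantization, IDCT, and motion compensation. If needed (for example, to smooth the edges of a picture image), VP 220 performs any post-processing and supplies the reconstructed digital picture to device interface circuit 252 or to a storage device. Scalar processor 210, vector processor 220, and bitstream processor 245 can operate in parallel on different blocks of data.
Having scalar processor 210 process the picture layer and the layers above it reduces communication between the processors. This is because the picture layer and the higher layers contain information that scalar processor 210 uses for control and I/O functions but that vector processor 220 and bitstream processor 245 do not use. One example of such information is the frame rate, which scalar processor 210 uses to transmit frames to device interface circuit 252.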
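【００１５】
FIG. 3 is a block diagram of one embodiment of bitstream processor 245. The signals shown in FIG. 3 are described in section 10.5 of Example 1. These signals provide the interface between bitstream processor 245 and IOBUS 240 (see FIG. 2). Within BP 245, these signals are handled by IOBUS interface unit 310, which includes SRAM 320. BP 245 also includes VLC FIFO unit 330, VLC LUT ROM 340, control state machine 350, and BP core unit 360, which includes a register file and SRAM. The blocks of FIG. 3 are described in section 10.4 of Example 1. ROM 340 contains the look-up tables used in Huffman encoding and decoding for four standards: MPEG-1, MPEG-2, H.261, and H.263. Although the tables hold a large amount of information, ROM 340 has a small size of 768*12 bits. The small size is achieved by sharing tables and by other techniques described in section 4 of Example 1.
【００１６】
Although the invention has been illustrated and described with reference to specific preferred embodiments, it is not limited to them. A person of ordinary skill in the art will readily see that the invention can be modified and varied in many ways without departing from the spirit and scope defined by the claims. In particular, the invention is not limited by any particular circuit, clock rate, or timing of these embodiments.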
【００１７】
【実施例１】
ＭＳＰ−１ＥＸシステム仕様
第１章技術的な概要
本章は、ハードウェア及びソフトウェア設計者が見せあげるマルチメディア信号処理器（“ＭＳＰ−ｘ”）の技術的な概要を説明する。
１．１機能
マルチメディア信号処理器（ＭＳＰ−ｘ）は、パーソナルコンピュータ及び注文者製品応用のための広範囲な集積機能を提供するために、一群の単一チップＶＬＳＩ装置を形成する。
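The MSP family is based on a powerful vector processor architecture that applies the single instruction multiple data (SIMD) model of computation for the best cost/performance. Its features are as follows.
＊Fully programmable
＊Based on the ARM instruction set architecture
＊Integrated 40 MHz ARM7 RISC CPU core
＊80 MHz vector processor for high-performance digital signal processing
＊2.56 Gops for 9-bit integer ALU operations
＊2.56 Gops for 16-bit integer multiply-accumulate operations
＊640 Mflops for 32-bit IEEE floating-point addition
＊1280 Mflops for 32-bit IEEE floating-point multiply & add
＊10K gates left unused for optional customization or graphics functions
＊Based on 0.65 μm, 3.3 V/5 V CMOS technology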
＊１２８ピン−１２８ピンパッケージ
ＭＳＰは初期に４個の主要機能を支援する。
＊ビデオ
＊オーディオ／サウンド
＊遠距離通信
＊２Ｄ／３Ｄグラフィックス（選択）
１．１．１ビデオ
＊全機能がファームウェアでプログラム可能である。
＊実時間ＭＰＥＧ−１デコーディング及びエンコーディング
＊実時間ＭＰＥＧ−２デコーディング
＊ほぼ実時間的なＭＰＥＧ−２エンコーディング
＊実時間Ｈ．３２４デコーディング及びエンコーディング
＊任意のスクリーンサイズまたは解像度に対するイメージスケーリング
＊ＲＧＢとＹＵＶ間の色空間変換
＊ピクチャー輪郭強調及び雑音減少のためのイメージフィルタリング
＊４／３フールダウン変換
１．１．２オーディオ／サウンド
＊全機能がファームウェアでプログラム可能である。
＊実時間ＭＰＥＧ−１オーディオデコーディング及びエンコーディング
＊実時間ＭＰＥＧ−２オーディオデコーディング及びエンコーディング
＊実時間Ｈ．３２０及びＨ．３２４オーディオデコーディング及びエンコーディング
＊実時間Ｇ．７２８及びＧ．７２３音声コーディング
＊実時間サウンドブラスターエミュレーション
＊ウェーブテーブル合成
＊ＦＭ合成
１．１．３遠距離通信
１．１．３．１モデム
＊標準非同期ＣＯＭポートインターフェース（ＮＳ１６５５０ＡＵＡＲＴ互換可能）
＊２８．８Ｋから２．４ＫｂｐｓまでのＶ．３４
＊４８００、９６００無符号化及び９６００ｂｐｓトレリス符号化に対するデータレートを有するＣＣＩＴＴ−Ｖ．３２ｂｉｓ
＊ＨａｙｅｓＡＴ命令語セットの互換性
＊呼出進行モニタ
＊Ｖ．２５ｂｉｓオートダイアル
＊ＤＴＭＦ及びパルスダイアリング
＊非同期エラー復旧プロトコル
＊Ｖ．４２エラー訂正
１．１．３．２ファクシミリ
＊９６００ｂｐｓまたは７２００ｂｐｓのＶ．２９
＊４８００ｂｐｓまたは２４００ｂｐｓのＶ．２７
＊呼出進行モニタ
＊ＤＴＭＦ及びパルスダイアリング
＊Ｇ３トランスファーら
＊Ｔ．４／Ｔ．３０動作
１．１．３．３電話応答
＊電話機セットまたはマイクロフォンを通して挨拶の言葉録音
＊受信された電話に対し自動応答し、予め録音されたメッセージに応信
＊電話をかけた相手からのメッセージ録音
＊電話をかけた相手が残したメッセージ再生
１．１．４．２Ｄ／３Ｄグラフィックス（選択）
＊ＢＩＴＢＬＴ
＊２Ｄライン＆多角形ドローイング及びシェージング
＊３Ｄポイント、ライン及び三角形に対する幾何学及び採光計算
＊テクスチャーマッピングで３Ｄカラー計算
＊ブレンディング
【００１８】
１．２ハードウェアの構造
１．２．１概要
ＭＳＰ−１マルチメディアコプロセッサ群は、集積度レベル、費用及び性能を含む多様な要求事項を満足させるように設計する。ＭＳＰ−１処理器を含むブロック図は図４の図示のとおりである。
ＭＳＰ−１群は、下記のようなピン−アウトオプションを行う。
＊ＭＳＰ−１は、外部ＳＤＲＡＭを使用せず、エントリ−レベルとして使用されるように設計される。
＊ＭＳＰ−１ＥＸは、外部ＳＤＲＡＭとインターフェーシングのための３２ビットメモリを含む。
＊ＭＳＰ−１Ｆは、外部ＳＤＲＡＭとインターフェーシングのための６４ビットメモリを含む。
＊ＭＳＰ−１Ｇは、集積されたＳＶＧＡコントローラー、高速化した３Ｄグラフィックス加速が加えられたＲＡＭＤＡＣを含む。
図５は、ＭＳＰ−１Ｅ処理器を含むシステムのブロック図である。
１．２．２外部コーデック
図６は、外部コーデックと共にＭＳＰ−１処理器を含むシステムのブロック図である。
【００１９】
１．２．２．１ＭＳＰ−１ＥＸの材料目録
次は、ＭＳＰ−１ＥＸに対して提示された材料目録である。
＊ＭＳＰ−１ＥＸ
＊５１２Ｋ×３２ビット同期ＤＲＡＭ
＊ＮＴＳＣ／ＰＡＬエンコーダー（三星のＫＳ０１１９）
＊オーディオ＆遠距離通信ＣＯＤＥＣ（アナログデバイス社のＡＤ１８４３）
＊その他（キャパシタ、抵抗、増幅器、コネクタ等）
＊プリントされた回路基板
【００２０】
１．３マイクロ構造
１．３．１概要
基本的にＭＳＰマイクロ構造は、非常に強力なＤＳＰコアと、注文者社により規定されたメモリ＆Ｉ／Ｏサブシステムとから構成される（図２参照）。ＤＳＰコアは、下記のことを含む。
＊４０ＭＨｚで動作し、一般的な処理のために使用される３２ビットＡＲＭ７ＲＩＳＣＣＰＵ
＊８０ＭＨｚで動作し、信号処理のために使用されるベクトル処理器
＊８０ＭＨｚで動作し、２ＫＢ命令キャッシュ、５ＫＢデータキャッシュ及び１６ＫＢＲＯＭキャッシュを有する共有されたキャッシュサブシステム。データキャッシュは、ハードウェアまたはソフトウェアにより制御され得る。
＊８０ＭＨｚで動作し、内部の多くのＦＢＵＳ周辺機器とインターフェースする高速６４ビットバス（ＦＢＵＳ）
＊４０ＭＨｚで動作し、内部の多くのＩＯＢＵＳ周辺機器とインターフェースする低速３２ビットバス（ＩＯＢＵＳ）
内部のＦＢＵＳ周辺機器は下記のものを含む。
＊３２ビット３３ＭＨｚＰＣＩバスインターフェース
＊６４ビットＳＤＲＡＭメモリコントローラー
＊８チャンネルＤＭＡコントローラー
＊注文者ＡＳＩＣロジックブロック。注文者ＡＳＩＣロジックブロックは、多様なアナログコーデックに対するインターフェースと、注文者の規定したＩ／Ｏ装置を含む合計１０Ｋｇａｔｅｓを提供する。インターフェースロジックは、三星のＫＳ０１１９ＮＴＳＣエンコーダー及び、アナログデバイス社のＡＤ１８４３コーデックを支援する。
【００２１】
＊Memory data mover used to DMA data from host (Pentium) memory to MSP local SDRAM memory
＊ビデオビットストリームを処理するビットストリーム処理器
＊１６４５０ＵＡＲＴシリアルライン
＊８２５４−互換可能なタイマー
＊８２５９−互換可能なインタラプトコントローラー
また、ＭＳＰはソフトウェアで制御される初期化及び、インタラプトのために使用される特殊なレジスタ（ＭＳＰ制御レジスタ）を含む。
【００２２】
１．４ＭＳＰ−１ＥＸピン説明
１．４．１合計：２５６ピン
１．４．２ＰＣＩバスインターフェース（５３ピン）
ＣＬＫクロック入力ピン
ＲＳＴＬ入力ピンリセット、アクチブロー
ＡＤ[31:0] アドレス及びデータバスピン
Ｃ＿ＢＥ０Ｌコントロール＆バイト０イネーブルピン、アクチブロー
Ｃ＿ＢＥ１Ｌコントロール＆バイト１イネーブルピン、アクチブロー
Ｃ＿ＢＥ２Ｌコントロール＆バイト２イネーブルピン、アクチブロー
Ｃ＿ＢＥ３Ｌコントロール＆バイト３イネーブルピン、アクチブロー
ＰＡＲパリティピン
ＦＲＡＭＥＬサイクルフレームピン、アクチブロー
ＩＲＤＹＬ開始者準備ピン、アクチブロー
ＴＲＤＹＬターゲット準備ピン、アクチブロー
ＳＴＯＰＬ停止トランザクションピン、アクチブロー
ＬＯＣＫＬロックトランザクションピン、アクチブロー
ＩＤＳＥＬ初期化装置選択入力ピン
ＤＥＶＳＥＬ装置選択ピン、アクチブロー
ＲＥＱＬバス要請ピン、アクチブロー
ＧＮＴＬバス承認ピン、アクチブロー
ＰＥＲＲＬパリティエラーピン、アクチブロー
ＳＥＲＲＬシステムエラーピン、アクチブロー
ＩＮＴＡＬインタラプトピン、アクチブロー
１．４．３その他（６ピン）
ＴＣＫＪＴＡＧテストクロック入力ピン
ＴＤＩＪＴＡＧテストデータ入力ピン
ＴＤＯＪＴＡＧテストデータ出力ピン
ＴＭＳＪＴＡＧテストモード選択入力ピン
ＴＲＳＴＬＪＴＡＧテストリセット入力ピン
ＣＬＫクロック入力。これは４０ＭＨｚクロック入力ピンである。
１．４．４ＫＳ０１１９ＮＴＳＣ／ＰＡＬエンコーダーインターフェース（２４ピン）
ＳＦＲＳ３ワイヤーホストインターフェースのためにＫＳ０１１９に出力されるフレーム同期
ＳＣＬＫＫＳ０１１９に出力されるシリアルクロック
ＳＤＡＴシリアルデータＩ／Ｏ
ＢＧＨＳＭＳＰに入力される水平同期信号
ＢＧＶＳＭＳＰに入力される垂直同期信号
ＭＳＳＥＬマスタ選択
ＰＤ[15:0] ＫＳ０１１９に出力されるピクセルデータ
ＢＧＣＬＫＫＳ０１１９に出力されるピクセルクロック
ＰＲＯＭＣＳＬＢＩＯＳＰＲＯＭチップ選択
【００２３】
１．４．５ＡＤ１８４３オーディオ＆遠距離通信コーデックインターフェース（６ピン）
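A43SCLK Serial clock input/output. SCLK is a bidirectional signal that supplies the clock as an output to the serial bus when the bus master (BM) pin is driven HI, and accepts the clock as an input when the BM pin is driven LO.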
Ａ４３ＳＤＦＳシリアルデータフレーム同期入／出力。ＳＤＦＳはバスマスタ（ＢＭ）ピンがＨＩに駆動される場合、フレーム同期信号をシリアルバスに対する出力として供給し、ＢＭピンがＬＯに駆動される場合、フレーム同期信号を入力として受け入れる両方向信号である。
Ａ４３ＳＤＩＭＳＰから出力されるＡＤ１８４３に対するシリアルデータ入力。全制御及び再生トランスファーは、１６ビット長さのＭＳＢである。
Ａ４３ＳＤＯＡＤ１８４３から出力されＭＳＰに入力されるシリアルデータ出力。全ステータス＆制御レジスタ読出及び再生トランスファーは、１６ビット長さのＭＳＢである。
【００２４】
１．４．６メモリバスインターフェース（８７ピン）
ＲＡＳ１Ｌ出力ピン（アクチブロー）。これはＭＡ[11:0]からのローアドレスを、選択されたＳＤＲＡＭバンクの内部ローアドレスバッファにラッチするローアドレスストローブである。
ＣＡＳ１Ｌ出力ピン（アクチブロー）。これはＭＡ[11:0]からのコラムアドレスを、選択されたＳＤＲＡＭバンクの内部コラムアドレスバッファにラッチするコラムアドレスストローブである。
ＭＷＥＬ出力ピン（アクチブロー）。これはＳＤＲＡＭに対する記入イネーブルである。
MA[11:0] Output pins. Multiplexed row and column address signals for the SDRAM.
ＭＤ[63:0] 入／出力ＳＤＲＡＭデータピン
ＭＡ２３出力ピン。メモリアドレスビット＜２３＞
ＭＡ２４出力ピン。メモリアドレスビット＜２４＞
ＤＱＭ出力ピン。クロック以降、ＳＤＲＡＭデータをハイインピーダンスにし、出力をマスクさせる。（このピンは、同期ＤＲＡＭインターフェースためにのみ使用される。）
ＭＣＫＥ出力ピン。次のクロックサイクルから動作を中止させるために、ＳＤＲＡＭシステムクロックをマスクさせる。
ＭＣＳ０Ｌ出力ピン（アクチブロー）。下位３２ビットに対するＳＤＲＡＭチップ選択
ＭＣＳ１Ｌ出力ピン（アクチブロー）。上位３２ビットに対するＳＤＲＡＭチップ選択
ＭＲ．ＤＹＨ出力ピン。ＳＤＲＡＭ準備信号
ＭＥＭＣＬＫ出力ピン。これはＳＤＲＡＭに対するクロック出力ピンである。
１．４．７電源
ＶＤＤ３．３ボルト電源ピン
ＶＣＣ５ボルト電源ピン
ＶＳＳ接地ピン
ＭＳＰ−１ＥＸピン指定
【００２５】
【表１】

【表２】

【表３】

【表４】

【表５】

【表６】

【表７】

【表８】

【００２６】
１．５ファームウェア構造
１．５．１概要
ＭＳＰは、ベクトル化されたＤＳＰファームウェアライブラリー（ベクトル処理器により実行される）の非常に最適化された結合及びシステム管理機能（ＡＲＭ７により実行される）を通して強力で且つ開放的な応用環境を多く提供する。
ＭＳＰは信号処理開発とホスト応用開発とを分離することによって、スケール可能な性能、費用効果的なマルチメディア＆通信、便利な使用及び容易な取扱い等を提供する。また、応用開発と維持費用を減少させる。
１．５．２ファームウェア構造
ＭＳＰファームウェアシステム構造は、図７の図示のとおりである。陰影領域はＭＳＰシステム要素を表し、残りの余白は内在するＰＣ応用及び動作システムを表す。
１．５．２．１ＭＯＳＡ（マルチメディア動作システムの構造）
ＭＳＰの実時間動作システムカーネルは“ＭＯＳＡ”といい、これはマイクロソフトの実時間カーネルＭＭＯＳＡのサブセットである。
ＭＯＳＡは実時間的で、丈夫であり、マルチタスキングが可能であり、先買権のある動作システムであって、ＭＳＰ上で具現されたマルチメディア応用のために活用される。これは下記のような主要機能を遂行する。
＊ホストウインド９５とウインドズＮＴとのインターフェーシング
＊ホストから選択された応用ファームウェアのダウンローディング
＊ＡＲＭ７及びベクトル処理器で遂行するためＭＳＰタスクのスケジューリング
＊メモリ＆Ｉ／Ｏ装置を含むすべてのＭＳＰシステム資源の管理
＊ＭＳＰタスク間の通信の同期化
＊ＭＳＰ関連のインタラプト、例外及びステータス条件のリポーティング
ＭＯＳＡはＡＲＭ７上で排他的に動作する。
より詳細なものは、ＭＭＯＳＡ実時間カーネル仕様を参照する。
１．５．２．２マルチメディアライブラリモジュール
マルチメディアライブラリーモジュールは、データ圧縮、ＭＰＥＧビデオ＆オーディオ、音声コーディング及び合成、サウンドブラスター互換可能なオーディオ等のような機能を遂行するボード範囲のモジュールを提供する。各モジュールは、ＭＳＰ環境で最適化され、マルチタスキング環境で遂行するように設計される。
【００２７】
１．５．３テレコムライブラリー
１．５．３．１概要
適切なＤＳＰファームウェアと共に、ＭＳＰはインタセプトされる音声応用を支援し、かかってくる電話呼出に応答し、ハードディスク上にメッセージを貯蔵するように使用され得る。また、システムスピーカーは、半二重(half-duplex)スピーカーフォンをサービスするためにマイクロフォンを使用することができる。かかってくる電話及びかける電話の進行が感知され、システムで使用される。また、電話進行トーンは、プログラム制御の下で、選択された電話機の送受話器、システムスピーカー、ステレオヘッドフォンまたは、オーディオ出力チャンネルを通して聞ける。
【００２８】
１．６プログラミングモデル
１．６．１概要
ハードウェアの観点から見る時、ＭＳＰは２個のＣＰＵと、多数の集積された周辺装置を含む単一チップの解決案である。ソフトウェアの観点から見る時ＭＳＰはＰＣＩバス上に存在する高性能デジタル信号処理（ＤＳＰ）装置である。
ホストＣＰＵによるＭＳＰの制御は、下記のいずれか１つによって実現され得る。
＊ＰＣＩバスを通してＭＳＰ制御＆ステータスレジスタの読み取り／書込または
＊ホストシステムメモリに存在する共有データ構造
＊ＭＳＰローカルメモリに存在する共有データ構造
ＭＳＰプログラムの遂行は常にＡＲＭ７ＣＰＵから始まり、これは順次的にベクトル処理器にある第１従属的な実行ストリームを初期化させ得る。ＡＲＭ７ＣＰＵとベクトル処理器との間の制御同期化は、ＡＲＭ７の任意のコプロセッサ命令(STARTVP、INTVP、TESTVP）と、ベクトル処理器における特殊な命令(VJOIN、VINT)によって遂行される。ＡＲＭ７ＣＰＵとベクトル処理器との間のデータ伝送はＡＲＭ７で実行されるデータ移動命令によって遂行され得る。
ＡＲＭ７ＣＰＵは一般的に、大部分のインタラプト＆例外処理だけでなく、ホストインターフェース、資源管理、Ｉ／Ｏ装置処理を担当する。ベクトル処理器はすべてのデジタル信号処理及び、コプロセッサインタラプト（ＡＲＭ７でベクトル処理器により発生される）とハードウェアスタックオーバーフロー（ベクトル処理器で）のような任意の特殊なインタラプトを担当する。
また、ＭＳＰは多様なＩ／Ｏ装置に対してインターフェーシングするために集積された周辺機器を多く含む。すべての周辺装置のアドレスはメモリマッピングされ、よって標準メモリロード＆貯蔵命令（ＡＲＭ７ＣＰＵまたはベクトル処理器の中のいずれか１つにより）でアクセスされ得る。
【００２９】
１．６．２電源印加、リセット＆初期化
電源が印加された後、ＭＳＰは機能を正確に確認するために、自動にセルフ−テストシーケンスに入る。セルフ−テストシーケンスは下記のことを含む。
＊すべての内部ＭＳＰレジスタの初期化
＊ＭＳＰのすべての要素を確認するために、半導体チップのセルフ−テスト診断遂行
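The self-test sequence is expected to last about <tbd> seconds. At the end of the self-test sequence, the MSP prepares to run the MSP firmware, which includes the following.
＊Loading and executing the MSP initialization software
＊Loading and executing the MSP real-time operating system kernel MMOSA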
ＭＳＰは下記の３種類のリセットを支援する。
＊ＰＣＩバスによるハードウェア制御システムリセット
＊ＭＳＰ制御レジスタにあるＰＣＩシステムリセットビットによるソフトウェア制御システムリセット
＊ＭＳＰ制御レジスタにあるベクトル再開始、ビットによるソフトウェア制御再開始(restart)
【００３０】
１．６．３ＰＣＩ配列レジスタ
ＰＣＩバスに対するＩ／Ｏ装置であって、ＭＳＰはＰＣＩＲｅｖ２．１に定義され、表９に示されているような一セットの構成レジスタを含む。
【００３１】
ＰＣＩ配列レジスタ
【表９】

【００３２】
１．６．３．１装置＆ベンダー識別子レジスタ
より詳細なものはＰＣＩバス仕様Ｒｅｖ２．１参照。
１．６．３．２ステータス＆コマンドレジスタ
より詳細なものはＰＣＩバス仕様Ｒｅｖ２．１参照。
１．６．３．３クラスコード＆校正識別子レジスタ
より詳細なものはＰＣＩバス仕様Ｒｅｖ２．１参照。
ＭＳＰ−１ＥＸに関してクラスコードは０３に定義され、サブクラスは０である。
１．６．３．４その他のレジスタ
より詳細なものはＰＣＩバス仕様Ｒｅｖ２．１参照。
１．６．３．５ＭＳＰベースアドレスレジスタ（ＭＳＰＢＡＳＥ）
このレジスタはＭＳＰ装置のためのベースアドレスを貯蔵する。このアドレスはホストシステムソフトウェア(Windows 95/NT）により記入され、ＭＳＰハードウェアで使用されメモリをアドレッシングする。
１．６．３．６ＶＦＢベースアドレスレジスタ
このレジスタはＶＧＡ仮想フレームバッファのためのベースアドレスを貯蔵する。このアドレスはホストシステムソフトウェア(Windows 95/NT)により記入され、ＭＳＰハードウェアで使用されＶＧＡフレームバッファをエミュレーションする。
【００３３】
１．６．３．７拡張ＲＯＭベースアドレス
より詳細なものはＰＣＩバス仕様Ｒｅｖ２．１参照。
１．６．３．８インタラプトラインレジスタ
より詳細なものはＰＣＩバス仕様Ｒｅｖ２．１参照。
１．６．４ＡＲＭ７ＣＰＵ
ＡＲＭ７ＲＩＳＣＣＰＵはＭＳＰのマスタ処理器であって、３２ビットデータ経路を含んでおり、標準ＡＲＭ７命令セット構造からなる。またＡＲＭ７はベクトル処理器とインターフェースするために、特殊なコプロセッサ命令を含む。
１．６．５ベクトル処理器
ベクトル処理器は、ＭＳＰのＤＳＰエンジンであり、２８８ビットデータ経路を含んでおり、ＡＲＭ７に対しコプロセッサとして動作する。このような機能はベクトル処理器構造文書に記述されている。
Vector processor 220 runs at 80 MHz and has a six-stage pipeline: fetch, decode, issue, register access, execute, and write. It is optimized for DSP-related processing.
【００３４】
１．６．６仮想メモリ管理
ＭＳＰ−１ＥＸは仮想メモリ管理を支援しない。
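1.6.7 Interrupt & exception handling
Interrupt and exception handling in the MSP is performed mostly by the ARM7.
All internal I/O device interrupts go to the internal 8259 interrupt controller, which prioritizes them and forwards the highest-priority interrupt to the ARM7 for further processing.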
１．６．８物理的なメモリアドレスマップ
ＡＲＭ７及びベクトル処理器プログラムは、図８に示したような物理的なメモリによってメモリマッピングされたすべてのＭＳＰ入／出力装置を示す。
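The MSP address map seen by the ARM7 (or the vector processor) starts at 0 and extends to 4 GB.
Addresses in the 2 GB to 4 GB region are mapped to host (Pentium) PCI addresses from 0 to 2 GB according to the following relation.
Host PCI address := ARM7 address - 80000000 (in hex)
With this mapping, the ARM7 (or the vector processor) can use addresses from 2 GB to 4 GB to access host PCI memory addresses from 0 to 2 GB. The ARM7 cannot access host PCI memory addresses at or above 2 GB.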
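The address relation above is a fixed offset of 80000000 (hex). A minimal sketch of the translation in both directions follows; the function and constant names are illustrative and do not come from the specification.

```python
HOST_WINDOW_BASE = 0x8000_0000   # 2 GB: start of the ARM7 window onto host PCI memory
ADDRESS_SPACE_TOP = 0xFFFF_FFFF  # last byte of the 4 GB ARM7 address map


def arm7_to_host_pci(arm7_addr):
    """Map an ARM7 address in the 2 GB to 4 GB window to a host PCI address."""
    if not HOST_WINDOW_BASE <= arm7_addr <= ADDRESS_SPACE_TOP:
        raise ValueError("address is not inside the host PCI window")
    return arm7_addr - HOST_WINDOW_BASE


def host_pci_to_arm7(pci_addr):
    """Inverse mapping; only host PCI addresses below 2 GB are reachable."""
    if not 0 <= pci_addr < HOST_WINDOW_BASE:
        raise ValueError("host PCI address at or above 2 GB is not reachable")
    return pci_addr + HOST_WINDOW_BASE
```

The range checks mirror the stated limitation that host PCI memory at or above 2 GB cannot be accessed through this window.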
また、ホスト（Ｐｅｎｔｉｕｍ）プログラムは、図９に示したような多少制限された物理的なメモリに従ってメモリマッピングされたすべての入／出力装置を示す。
ホスト（Ｐｅｎｔｉｕｍ）から見る時、
＊ＭＳＰ＿ＢＡＳＥはＭＳＰアドレスマップの始まりである。
＊ＭＳＰ＿ＢＡＳＥ＋７ＤＦＦＦＦＦはＭＳＰアドレスマップの最後である。
＊ＭＳＰアドレスマップは、１２８ＭＢの範囲のみで定義される。
【００３５】
ＭＳＰＩ／Ｏ装置アドレスマップ
【表１０】

【００３６】
１．６．９ＭＳＰホスト制御レジスタ
ＭＳＰ−１ＥＸは、ホスト（Ｐｅｎｔｉｕｍプロセッサ）による初期化及び、インタラプトのために使用される特殊なレジスタを含む。
【００３７】
【表１１】
ＭＳＰ制御レジスタ定義

【００３８】
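bit<0> PCI system reset. The host (Pentium) uses this bit to fully reset the entire MSP system hardware, including all MSP-related internal and external I/O devices. After a PCI system reset, the MSP runs the standard reset sequence, including all on-chip self-test diagnostics for the ARM7, the vector processor, and the I/O devices. This reset has the same effect as a hardware system reset.
bit<1> ARM7 & vector processor restart. The host (Pentium) uses this bit to restart the ARM7 and the vector processor. This restart differs from a full PCI system reset in that the MSP does not run the normal reset sequence and runs no on-chip self-test diagnostics. When this bit is set, the ARM7 starts executing from address 0 and the vector processor enters idle mode. No internal or external I/O device is affected.
bit<2> MSP interrupt request from the host (Pentium). The host (Pentium) uses this bit to interrupt the MSP directly. It is connected to one of the inputs of the internal 8259 programmable interrupt controller (PIC) used to interrupt the ARM7. This bit is set by the host (Pentium) and cleared by the ARM7.
bit<3> PCI host interrupt acknowledge. The host (Pentium) uses this bit to acknowledge a PCI host interrupt request raised by the MSP. This bit is set by the host (Pentium) and cleared by the ARM7.
bit<31:4> Reserved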
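The bit assignments of the MSP host control register can be written down as masks. This is a sketch for illustration only: the constant and function names are hypothetical, and only the bit positions come from the register description.

```python
# Bit masks for the 32-bit MSP host control register (names are illustrative).
PCI_SYSTEM_RESET = 1 << 0    # full reset, including on-chip self-test
RESTART_ARM7_AND_VP = 1 << 1 # restart without the reset sequence or self-test
HOST_INT_REQUEST = 1 << 2    # host interrupts the MSP; cleared by the ARM7
HOST_INT_ACK = 1 << 3        # host acknowledges an MSP interrupt; cleared by the ARM7
RESERVED_MASK = 0xFFFF_FFF0  # bits 31:4 are reserved


def decode_host_control(value):
    """Return the names of the defined bits that are set in a register value."""
    if value & RESERVED_MASK:
        raise ValueError("reserved bits 31:4 must be zero")
    names = {
        PCI_SYSTEM_RESET: "pci_system_reset",
        RESTART_ARM7_AND_VP: "restart_arm7_and_vp",
        HOST_INT_REQUEST: "host_int_request",
        HOST_INT_ACK: "host_int_ack",
    }
    return [name for mask, name in names.items() if value & mask]
```

Rejecting non-zero reserved bits is a design choice of this sketch. The specification only says the bits are reserved.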
【００３９】
１．６．１０ＭＳＰＡＲＭ７制御レジスタ
ＭＳＰ−１ＥＸは、ＡＲＭ７プロセッサによりホストをインタラプトすることに使用される特殊なレジスタを有する。
【００４０】
ＭＳＰＡＲＭ７制御レジスタ定義

【００４１】
ｂｉｔ＜０＞ＭＳＰからのＰＣＩホストインタラプト。このビットは、ＰＣＩバス上のＰＣＩＩＮＴＡ＃ピンのアクチブ確認を通しホストをインタラプトするためにＭＳＰで使用される。このビットはＡＲＭ７により設定され、ＰＣＩバスを通しホスト（ＰＥＮＴＩＵＭ）によりクリアーされる。
ｂｉｔ＜３１：１＞予約
１．６．１１ＭＳＰ内部μＲＯＭ
内部ＲＯＭは全体１６ＫＢｙｔｅからなり、次のことを含む。
＊μＲＯＭ初期化ソフトウェア
＊セルフ−テスト診断ソフトウェア
＊多様なシステム管理ソフトウェア
＊多様なライブラリーサブルーチン
＊命令及びデータ常数のためのキャッシュ
アドレスマップは、次の表に示したとおりである。
【００４２】
内部μＲＯＭアドレスマップ
【表１２】

【００４３】
【表１３】

【００４４】
１．６．１２ＭＳＰ内部ＳＲＡＭ
内部のＳＲＡＭはＭＳＰのベクトル＆制御＆ステータスレジスタ（ＶＣＳＲ）により決まれる選択事項によって、キャッシュまたはローカルメモリの機能を遂行する。
ローカルメモリモードにおいて、アドレス空間は位置＜ＭＣＰ＿ＢＡＳＥ＞：０４０００００から始まって、内部ＳＲＡＭ部にマッピングされる。
１．６．１３ＭＳＰ内部の周辺装置
また、ＭＳＰは２個の内部バス、すなわち６４ビット、８０ＭＨｚで動作するＦｂｕｓと、３２ビット、４０ＭＨｚで動作するＩＯｂｕｓ上に存在する多い周辺装置を有する。
Ｆｂｕｓ上の装置は次のことを含む。
＊外部の同期ＤＲＡＭのためのメモリコントローラー
＊仮想フレームバッファインターフェース
＊外部ＰＣＩバスのためのＰＣＩバスコントローラー
＊カストマーＡＳＩＣインターフェース
＊８チャンネルＤＭＡコントローラー
＊メモリデータ移動器（ホストメモリとＳＤＲＡＭ間のデータ伝達のため）
＊ＫＳ０１２２ＣＯＤＥＣシリアルライン
＊ＫＳ０１１９ＣＯＤＥＣシリアルライン
＊ＡＤ１８４３ＣＯＤＥＣシリアルライン
一方、ＩＯｂｕｓ上の装置は下記のことを含む。
＊８２５４−互換可能なプログラマブルインターバルタイマー
＊８２５９−互換可能なプログラマブルインタラプトコントローラー（８レベル）
＊１６４５０−互換可能なＵＡＲＴシリアルライン
＊ＭＰＥＧビットストリームデコーディング＆エンコーディングのためのビットストリーム処理器
このような周辺装置等のレジスタアドレスマップは、表に示したとおりである。
【００４５】
【表１４】

【表１５】

【表１６】

【表１７】

【００４６】
内部周辺装置レジスタアドレスマップ
１．６．１４ＩＯＢＵＳ周辺装置
１．６．１４．１８２５４−互換可能なプログラマブルインターバルタイマ
ＭＳＰは、下記のような機能を有するソフトウェアとして使用するために、標準８２５４−互換可能なプログラマブルインターバルタイマーを含む。
＊３個の独立的な１６ビットカウンタを有する。
＊６個のプログラマブルカウンタモードを支援する。
すべてのカウンターは、制御ワードレジスタに記入するものと初期カウントによりプログラムされる。
＊制御ワードレジスタ
このレジスタは、タイマーに対する多様な制御情報を有する。このレジスタのビット定義は、表に示したとおりである。
【００４７】
【表１８】
制御ワードレジスタ

【００４８】
＊ステータスレジスタ
このレジスタは、タイマーに対するステータス情報を有する。
＊カウンター０、１、２
この３個のレジスタは、主にタイマーによりカウンティングする素子である。各カウンタは１６ビット幅を有し、プリセットが可能で、ＢＣＤモードの各２進数でカウントダウンする。このレジスタの入力、ゲート及び出力は、制御ワードレジスタに貯蔵されたＭＯＤＥＳの選択により構成される。この３個のカウンタは完全に独立的である。
【００４９】
１．６．１４．２８２５９−互換可能なプログラマブルインタラプトコントローラー（ＰＩＣ）
ＭＳＰプログラマブルインタラプトコントローラーは、すべてのｘ８６−基盤パーソナルコンピュータにおいて非常に一般的な標準８２５９であり、その機能は次のことを含む。
＊８個レベルの優先順位を支援する。
＊プログラマブルインタラプトモード
＊個別的な要請マスク能力
ＭＳＰ−１ＥＸにおいて、８個レベルのインタラプト入力は、多様なＩ／Ｏ装置に対し下記のとおり割当てられる。
＊レベル０（最も高い）は８２５４タイマーに割当てられる。
＊レベル１は、仮想フレームバッファ（ＶＦＢ）に割当てられる。
＊レベル２は、ＤＭＡコントローラーを含むカストマＡＳＩＣロジックブロックに割当てられる。
＊レベル３は、ビットストリーム処理器に割当てられる。
＊レベル４は、ＰＣＩバスインターフェースに割当てられる。
＊レベル５は＜ｔｂｄ＞に割当てられる。
＊レベル６は＜ｔｂｄ＞に割当てられる。
＊レベル７は、１６５５０ＵＡＲＴに割当てられる。
The output of the interrupt controller is coupled to the interrupt request line (nFIQ) of the ARM7 RISC CPU.
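The level assignment above can be modeled as a fixed-priority resolver. The sketch below assumes the controller is left in a fixed-priority mode with level 0 highest and that a set mask bit disables a level, as on a standard 8259. It is an illustration of the priority scheme, not a register-accurate model, and all names are hypothetical.

```python
# Interrupt sources by level on the MSP-1EX, as listed above (level 0 is highest).
IRQ_SOURCES = {
    0: "8254 timer",
    1: "virtual frame buffer (VFB)",
    2: "customer ASIC logic block (including DMA controller)",
    3: "bitstream processor",
    4: "PCI bus interface",
    5: "<tbd>",
    6: "<tbd>",
    7: "UART",
}


def highest_priority(pending, mask=0):
    """Return the lowest-numbered pending, unmasked level, or None.

    `pending` and `mask` are 8-bit values with bit n standing for level n;
    a set bit in `mask` disables that level.
    """
    active = pending & ~mask & 0xFF
    for level in range(8):
        if active & (1 << level):
            return level
    return None
```

For example, with the bitstream processor (level 3) and the UART (level 7) both pending, level 3 is forwarded to the ARM7 first. Masking level 3 lets level 7 through.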
＊レジスタ説明
ここには、下記のようなＰＩＣの動作を初期化することに使用される３個の８ビットレジスタがある。
＊初期化コマンドワード１（ＩＣＷ１）
＊初期化コマンドワード２（ＩＣＷ２）：ＭＳＰ−１ＥＸには使用しない。
＊初期化コマンドワード３（ＩＣＷ３）：ＭＳＰ−１ＥＸには使用しない。
＊初期化コマンドワード４（ＩＣＷ４）
また、下記のようなＰＩＣ動作を制御することに使用される３個の８ビットレジスタがある。
＊動作制御ワード１（ＯＣＷ１）
＊動作制御ワード２（ＯＣＷ２）
＊動作制御ワード３（ＯＣＷ３）
これらのすべてのレジスタは、アドレス部分（ｂｉｔ＜０＞）とデータの部分の両方に特殊にエンコーディングされる。より詳細なことは、標準８２５９仕様を参照する。
【００５０】
８２５９レジスタ説明
【表１９】

【００５１】
１．６．１４．３１６４５０−互換可能なＵＡＲＴシリアルライン
ＭＳＰは、外部シリアルＩ／Ｏ装置とのインターフェースとして使用される１６４５０−互換可能なＵＡＲＴシリアルラインを含む。より詳細なことは、標準１６４５０仕様を参照する。
１．６．１４．４ビットストリーム処理器
ビットストリーム処理器は、ビデオビットストリームデータを処理する特殊化されたロジックブロックであり、この機能は下記のことを含む。
＊可変長さハフマンデコーディング及びエンコーディング
＊ジグザグ貯蔵フォーマットのビデオデータのアンパッキング及びパッキング
＊多様なビット−レベル処理
ビットストリーム処理器は、同時的な処理ユニットとして動作し、ベクトル処理器またはＡＲＭ７によりソフトウェアで制御される。より詳細なことは、ビットストリーム処理器部分を参照する。
１．６．１５ＦＢＵＳ周辺装置
ＦＢＵＳ周辺装置は下記のとおりである。
＊カストマＡＳＩＣロジックインターフェース
＊８個チャンネルＤＭＡコントローラー
＊三星のＫＳ０１１９に対するビデオエンコーダーシリアルラインインターフェース
＊アナログデバイス社のＡＤ１８４３に対するオーディオ＆テレコムシリアルラインインターフェース
１．６．１５．１ＡＳＩＣインターフェースロジックインターフェース
この節は、外部のすべてのＣＯＤＥＣと、カストマが規定したＡＳＩＣロジックブロックに対するインターフェースロジックを含む。このブロックのすべてはハードウェアで具現され、プログラム−可視(program-visible)レジスタは備えない。より詳細なことはＡＳＩＣインターフェース部分を参照する。
１．６．１５．２ＤＭＡコントローラー
ＭＳＰ−１ＥＸは、下記のような機能を有するチップ上のＤＭＡコントローラーを備える。
＊８個の独立的なＤＭＡチャンネル
＊個別的なＤＭＡチャンネルに対するイネーブル／ディスエーブル制御
＊メモリトランスファーまたは逆トランスファーに対するＩＯ装置
＊アドレス増加及び減少
For details, see the ASIC interface section.
【００５２】
１．６．１５．３メモリデータ移動器
また、ＭＳＰ−１ＥＸは、特殊なメモリデータ移動器を備える。このメモリデータ移動器は、ホスト（ＰＥＮＴＩＵＭ）メモリと、ＭＳＰローカルＳＤＲＡＭメモリ間でデータを移動させるために使用される。メモリデータ移動器は、基本的に下記のようなレジスタを含む特殊なＤＭＡコントローラーである。
＊ＭＳＰ現在アドレスレジスタ：この３２ビットレジスタは、メモリデータトランスファーの初期にＳＤＲＡＭメモリアドレスを定義する。このレジスタはＡＲＭ７により記入または読出でき、初期値はＡＲＭ７によりロードされなければならない。アドレスはデータトランスファーサイズに基づいて増加される。
＊ホスト現在アドレスレジスタ：この３２ビットレジスタは、メモリデータトランスファーの初期にホストメモリアドレスを定義する。このレジスタはＡＲＭ７により記入または読出でき、初期値はＡＲＭ７によりロードされなければならない。アドレスはデータトランスファーサイズに基づいて増加される。
＊ＭＳＰ停止アドレスレジスタ：この３２ビットレジスタは、メモリデータトランスファーの最後にＳＤＲＡＭメモリアドレスを定義する。このレジスタはＡＲＭ７により記入または読出でき、ＭＳＰ現在アドレスレジスタと比較し使用される。もし、これらがマッチングすると、メモリデータ移動器はＭＳＰのＥｎｄ−Ｏｆ−Ｐｒｏｃｅｓｓ信号を発生する。
＊ホスト停止アドレスレジスタ：この３２ビットレジスタは、メモリデータトランスファーの最後に、ホストメモリアドレスを定義する。このレジスタはＡＲＭ７により記入または読出でき、ホスト現在アドレスレジスタと比較し使用される。もし、これらがマッチングすると、メモリデータ移動器はホストのＥｎｄ−Ｏｆ−Ｐｒｏｃｅｓｓ信号を発生する。
＊ステータスレジスタ：このレジスタは、メモリデータ移動器と関連したステータス情報を含む。ビットエンコーディングは下記のとおりである。
＜０＞：ＭＳＰＥＯＰ。このビットは、メモリデータ移動器がＭＳＰの停止アドレスに到達したか否かを決定する。もし、ＡＲＭ７がソース現在アドレスレジスタを初期化すると、ＡＲＭ７は００８０００００(hex)にリセットされる。このビットはＡＲＭ７により読出のみが遂行され、記入は遂行されてはいけない。
＜１＞：ＨＯＳＴＥＯＰ。このビットは、メモリデータ移動器がホストの停止アドレスに到達したか否かを決定する。もし、ＡＲＭ７がホスト現在アドレスレジスタを初期化すると、ＡＲＭ７は８００００００(hex)にリセットされる。このビットはＡＲＭ７により読出のみが遂行され、記入は遂行されてはいけない。
＊制御レジスタ：このレジスタは、メモリデータ移動器と関連した情報を含む。このビットエンコーディングは下記のとおりである。
＜０＞：方向。このビットはデータトランスファーの方向を決定する。このビットが“０”（ディフォールト）の場合、データトランスファーの方向はホスト（ＰＥＮＴＩＵＭ）メモリからＭＳＰＳＤＲＡＭメモリであり、このビット“１”の場合、データトランスファーの方向は、ＳＤＲＡＭからホストメモリである。このビットはＡＲＭ７により記入されなければならない。
＜１＞：インタラプトイネーブル。このビットはメモリデータ移動器が、データトランスファーの最後にＡＲＭ７をインタラプトするか否かを決定する。このビットはＡＲＭ７により記入されなければならない。
＜２＞：ＤＭＡイネーブル。このビットはメモリデータ移動器が動作するようにイネーブルさせる。このビットはＡＲＭ７により記入されなければならない。
＜３＞：データトランスファーサイズ。このビットが“０”（省略時）の場合、各メモリのデータトランスファーサイズは３２バイトであり、“１”の場合は６４バイトである。このビットはＡＲＭ７により記入されなければならない。
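The register set of the memory data mover can be pictured with a small behavioral model. This is a sketch under stated assumptions: both current addresses advance by the transfer size per burst, and the model stops at the first end-of-process condition. The class, method, and constant names are hypothetical, and no data is actually copied.

```python
from dataclasses import dataclass

# Control register bits, as listed above.
CTRL_DIRECTION = 1 << 0   # 0: host memory -> SDRAM, 1: SDRAM -> host memory
CTRL_INT_ENABLE = 1 << 1  # interrupt the ARM7 at the end of the transfer
CTRL_DMA_ENABLE = 1 << 2  # enable the data mover
CTRL_SIZE_64 = 1 << 3     # 0: 32-byte transfers, 1: 64-byte transfers

# Status register bits, as listed above.
STATUS_MSP_EOP = 1 << 0
STATUS_HOST_EOP = 1 << 1


@dataclass
class DataMover:
    msp_current: int
    host_current: int
    msp_stop: int
    host_stop: int
    control: int = 0
    status: int = 0

    def step(self):
        """Advance both current addresses by one burst; return False if disabled."""
        if not self.control & CTRL_DMA_ENABLE:
            return False
        size = 64 if self.control & CTRL_SIZE_64 else 32
        self.msp_current += size
        self.host_current += size
        if self.msp_current == self.msp_stop:
            self.status |= STATUS_MSP_EOP
        if self.host_current == self.host_stop:
            self.status |= STATUS_HOST_EOP
        return True

    def run(self, max_bursts=1_000_000):
        """Step until an end-of-process bit is set; return the number of bursts."""
        bursts = 0
        while bursts < max_bursts and not self.status & (STATUS_MSP_EOP | STATUS_HOST_EOP):
            if not self.step():
                break
            bursts += 1
        return bursts
```

Moving 256 bytes takes eight bursts with the default 32-byte size and four with the 64-byte size. The `max_bursts` guard exists only because a stop address that is not a multiple of the transfer size away from the start would never match in this model.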
【００５３】
１．６．１５．４ＫＳ０１１９ビデオエンコーダーシリアルラインインターフェース
ＫＳ０１１９ビデオエンコーダーシリアルラインインターフェースは、下記のことを含む。
＊コーデックからの読出データを含むダブル−バッファ受信データバッファレジスタ
＊コーデックへの記入データを含むダブル−バッファ伝送データバッファレジスタ
＊シリアルラインに対する多様な制御＆ステータス情報を含む制御＆ステータスレジスタ
【００５４】
【表２０】
ＫＳ０１１９ビデオエンコーダーシリアルラインインターフェースレジスタ

【００５５】
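The bit encoding of the control & status register is as follows.
bit<0>: Receive data full. This bit is set when the serial line has received 8 bits of data from the KS0119 CODEC. If interrupt enable (bit<7>) is set, an interrupt request is also raised to the ARM7.
bit<1>: Transmit data buffer empty. This bit is set when the serial line is ready to send data to the KS0119. If interrupt enable (bit<7>) is set, an interrupt request is also raised to the ARM7.
bit<7>: Interrupt enable. This bit is used to enable interrupt requests to the ARM7.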
１．６．１５．５ＡＤ１８４３オーディオ＆テレコムシリアルラインインターフェース
ＡＤ１８４３シリアルラインインターフェースは下記のことを含む。
＊コーデックから読出されたデータを含む一セットのダブル−バッファリングされたレジスタ
＊コーデックに記入しようとするデータを含む一セットのダブル−バッファリングされたレジスタ
＊シリアルラインに対する多様な制御＆ステータス情報を含む制御＆ステータスレジスタ
より詳細なことは、ＡＤ１８４３コーデックインターフェース部分を参照する。
１．６．１６命令性能
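Table 21 shows instruction performance in vector processor cycle counts, where each cycle is 12.5 ns. The external memory bus is assumed to be 64 bits wide with a 40 MHz page-mode clock. All instruction performance figures are given for 32-byte vector mode. The conventions are as follows.
＊ras: the number of cycles required for the external memory to make the first access. This typically takes 75 ns, or 6 cycles.
＊latency: the number of cycles needed to execute the first instruction.
＊rate: the number of cycles between successive executions of similar instructions.
When the latency equals the rate, only one number is given.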
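One common way to read latency and rate figures is that the first instruction of a run costs the latency and each further similar instruction costs the rate. The sketch below applies that reading with the 12.5 ns cycle time. The formula is an assumption about how the table is meant to be used, not something the specification states, and the function names are hypothetical.

```python
CYCLE_NS = 12.5  # one vector processor cycle at 80 MHz


def total_cycles(count, latency, rate):
    """Cycles for `count` back-to-back similar instructions (assumed pipeline model)."""
    if count < 1:
        return 0
    return latency + (count - 1) * rate


def total_time_ns(count, latency, rate):
    """The same estimate expressed in nanoseconds."""
    return total_cycles(count, latency, rate) * CYCLE_NS
```

With a latency of 6 cycles and a rate of 2 cycles, ten instructions take 6 + 9 × 2 = 24 cycles, or 300 ns. The 75 ns first-access figure for `ras` is the same conversion: 6 cycles × 12.5 ns.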
【００５６】
命令実行性能
【表２１】

【表２２】

【表２３】

【００５７】
第２章ＤＳＰコア
本章は、ハードウェア及びソフトウェアデザイナーが示しているＤＳＰコアの仕様に関して記述している。
２．１概要
ＤＳＰコアは、ＭＳＰにおいて基礎的な要素であり、すべての演算に対して単独に責任を担う。このＤＳＰコアは次のように構成される。
＊４０ＭＨｚで動作し、実時間ＯＳ、インタラプト及び例外処理、入出力装置管理等のような、汎用データ処理用として使用する３２ビットＡＲＭ７ＲＩＳＣＣＰＵ。
＊８０ＭＨｚで動作し、離散余弦変換、ＦＩＲフィルタリング、くりこみ、ビデオのモーション推定等のようなデジタル信号処理用として使用されるベクトル処理器。このベクトル処理器はＡＲＭ７により初期化され、ＡＲＭ７と同時的に動作可能で、特殊な制御命令によりＡＲＭ７と同期される。
＊８０ＭＨｚで動作し、ＡＲＭ７のための１ＫＢの命令キャッシュと１ＫＢのデータキャッシュ、ベクトル処理器のための１ＫＢの命令キャッシュと４ＫＢのデータキャッシュ、ＡＲＭ７及びベクトル処理器のための共有の１６ＫＢの集積された命令＆データキャッシュＲＯＭから構成されるキャッシュサブシステム。ベクトル処理器用のデータキャッシュは、ハードウェアまたはソフトウェアによって制御され得る。キャッシュサブシステムは、３２ビットデータバスを通してＡＲＭ７とインターフェースし、１２８ビットデータバスを通してベクトル処理器とインターフェースする。
＊ビットストリーム処理器、インタラプトコントローラー、タイマー及びＵＡＲＴのような多様な内部周辺機器等とインターフェースする３２ビット、４０ＭＨｚの入力＆出力バス（ＩＯＢＵＳ）。
＊ＰＣＩバスコントローラー、メモリコントローラー、ＤＭＡコントローラー及びカストマＡＳＩＣロジックブロックとインターフェースする６４ビット、８０ＭＨｚの高速入／出力バス（ＦＢＵＳ）。
ＤＳＰコアのブロック図は、図１０の図示のとおりである。
【００５８】
２．２ＡＲＭ７ＲＩＳＣＣＰＵ
２．２．１概要
ＡＲＭ７ＲＩＳＣＣＰＵは、汎用の３２ビットＲＩＳＣプロセッサコアである。このＡＲＭ７ＲＩＳＣＣＰＵは、標準コプロセッサインターフェースを通しベクトル処理器とインターフェースし、実時間ＯＳ、ＩＯ装置インタラプト処理及びホストＣＰＵとの通信のように、大部の非演算的な集中機能を処理することに使用される。
ＡＲＭ７ＣＰＵは下記のような特性を有する。
＊電力敏感性応用に理想的な極めて静的な動作。
＊低電力消費：０．６ｍＡ／ＭＨｚ＠３Ｖ製作。
＊高性能：２５ＭＩＰｓ＠４０ＭＨｚ（４０ＭＩＰｓピーク）＠３Ｖ。
＊Big- and little-endian operating modes
＊実時間応用のための高速インタラプト応答（４０ＭＨｚで２２クロックサイクル）
＊簡単かつ強力な命令セット。
＊約６ｍｍ²の非常にコンパクトなレイアウト。
２．２．２レジスタ
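The ARM7 has 31 general-purpose registers and 6 status registers, 37 registers in total. At any time, 16 general-purpose registers and one or two status registers are visible to the programmer. R0 through R15 are directly accessible in all processor modes: User, Supervisor, IRQ, FIQ, Abort, and Undefined.
All registers except R15 are general purpose and can hold data or address values. R15 holds the program counter (PC). The status register CPSR (Current Program Status Register) holds the ALU flags and the current mode bits.
R14 is used as the subroutine link register and receives a copy of R15 when a branch-and-link instruction is executed. Otherwise, R14 can be used as a general-purpose register.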
【００５９】
汎用レジスタ及びプログラムカウンター
【表２４】

【００６０】
【表２５】
プログラムステータスレジスタ

【００６１】
２．２．３例外
例外は、命令処理途中で発生する非正常的な条件をいい、これは制御流れの変更を招来する。ＡＲＭ７例外動作の７タイプに関し、上位優先順位から下位優先順位に列挙すると下記のとおりである。
＊リセット(reset)（最上位優先順位）
＊取消し(abort)（データ）
＊ＦＩＱ
＊ＩＲＱ
＊取消し(abort)（プリフェッチ）
＊定義されていない命令トラップ、ソフトウェアインタラプト（最下位優先順位）
【００６２】
【表２６】
例外ベクトルテーブル

【００６３】
２．２．４命令セット
すべてのＡＲＭ７命令は条件的に実行されるが、これはＡＲＭ７命令がＣＰＳＲレジスタにあるＮ、Ｚ、Ｃ、Ｖフラグ値によって実行されるかもしくは実行されないことを意味する。
ＡＲＭ７命令は、下記のような多様なカテゴリーに分けられる。
＊ブランチ及びリンクされたブランチ（Ｂ、ＢＬ）
＊データプロセッシング（ＡＮＤ、ＥＯＲ、ＳＵＢ、ＲＳＢ、ＡＤＤ、ＡＤＣ、ＳＢＣ、ＲＳＣ、ＴＳＴ、ＴＥＱ、ＣＭＰ、ＣＭＮ、ＯＲＲ、ＭＯＶ、ＢＩＣ、ＭＶＮ）
＊ＰＳＲトランスファー（ＭＲＳ、ＭＳＲ）
＊掛け算及び掛け算−累算（ＭＵＬ、ＭＬＡ）
＊シングルデータトランスファー（ＬＤＲ、ＳＴＲ）
＊ブロックデータトランスファー（ＬＤＭ、ＳＴＭ）
＊シングルデータスワップ（ＳＷＰ）
＊ソフトウェアインタラプト（ＳＷＩ）
＊コプロセッサデータ動作（ＣＤＰ）（これは一グループの命令である。）
＊コプロセッサデータトランスファー（ＬＤＣ、ＳＴＣ）
＊コプロセッサレジスタトランスファー（ＭＲＣ、ＭＣＲ）
２．３ベクトル処理器
２．３．１概要
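The vector processor is a powerful digital signal processor that uses a single instruction multiple data (SIMD) architecture for maximum performance. It consists of a pipelined RISC engine that operates on multiple data elements in parallel to achieve very high performance. Multiple data elements are packed into a 576-bit vector, which can be computed at the following rates.
＊Thirty-two 8/9-bit fixed-point arithmetic operations per 12.5 ns cycle, or
＊Sixteen 16-bit fixed-point arithmetic operations per 12.5 ns cycle, or
＊Eight 32-bit fixed-point or floating-point arithmetic operations per 12.5 ns cycle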
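The per-cycle rates above convert directly into operations per second at the 80 MHz vector clock. The arithmetic is sketched below with illustrative names. Counting a multiply-accumulate as two operations is an assumption, made here because it reproduces the 2.56 Gops figure quoted for 16-bit multiply-accumulate in section 1.1.

```python
VECTOR_CLOCK_HZ = 80_000_000  # 80 MHz, i.e. one cycle every 12.5 ns


def cycle_time_ns(clock_hz=VECTOR_CLOCK_HZ):
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def ops_per_second(ops_per_cycle, clock_hz=VECTOR_CLOCK_HZ):
    """Peak rate when `ops_per_cycle` operations complete every cycle."""
    return ops_per_cycle * clock_hz


peak_8bit = ops_per_second(32)     # 8/9-bit operations
peak_16bit = ops_per_second(16)    # 16-bit operations
peak_32bit = ops_per_second(8)     # 32-bit fixed- or floating-point operations
peak_mac16 = ops_per_second(16 * 2)  # 16-bit multiply-accumulate, counted as 2 ops each
```

These come to 2.56 Gops for 8/9-bit data, 1.28 Gops for 16-bit data, and 640 M operations per second for 32-bit data, in line with the 2.56 Gops and 640 Mflops figures in the feature list.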
２．３．２実行パイプライン
ベクトル処理器は、命令を実行させるために図１１の図示のとおり、６段階のパイプラインを利用する。大部の３２ビットスカラー演算が、サイクル当り１つの命令比率でパイプラインされる一方、大部の５７６ビットベクトル演算は、２個のサイクル毎に１つの命令比率でパイプラインされる。すべてのロード＆貯蔵(Loads＆Stores)は算術演算と重なり、別途でロード＆貯蔵ハードウェアにより独立的に実行される。
設計の複雑度と性能を調和させるために、ベクトル処理器は資源及びデータ従属性をチェックするためのハードウェアインターロックを順序とは関係なく使用し、命令等を発生するか実行することができる。この特徴は、ロード及び貯蔵によって、データキャッシュが紛失される期間の性能を、特に大幅改善する。
【００６４】
２．３．３ハードウェアマイクロ構造
ベクトル処理器は、図１２の説明のとおり、４個の主機能ブロックから構成される。
＊命令語取出ユニット（ＩＦＵ）
＊命令語デコーダー＆発行器
＊命令語実行データ経路
＊ロード＆貯蔵ユニット（ＬＳＵ）
命令語取出ユニットは、命令語の先取り（ｐｒｅｆｅｔｃｈ）及び、ブランチとジャンプのような命令語のサブルーチンに対する流れを制御するプロセッシングを担当する。ＩＦＵは現在実行ストリームに対してプリフェッチされた命令語からなる１６個のエントリキューと、ブランチターゲットストリーム対してプリフェッチされた命令語からなる８個のエントリーキューを有する。ＩＦＵはサイクルごとに命令語キャッシュから８個の命令語が受信できる。
命令語デコーダー＆発行器は、すべての命令語に対するデコーディング及びスケージュリングを担当する。たとえ発行器は、実行資源とオペランドデータ有効性によって、非順次的な大部の命令語のスケジュールが可能であるが、デコーダーはサイクル当り１つの命令語を処理することができ、常にＩＦＵから順次的に到着する命令語を処理することができる。
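The vector processor achieves most of its performance through multiple 288-bit data paths (see FIG. 13) operating at 12.5 ns/cycle, which include the following.
＊A four-port register file that can support two reads and two writes per cycle
＊Eight 32×32 parallel multipliers that produce, every 12.5 ns, eight 32-bit multiplications (integer or floating-point format), sixteen 16-bit multiplications, or thirty-two 8-bit multiplications
＊Eight 36-bit ALUs that produce, every 12.5 ns, eight 36-bit ALU operations (integer or floating-point format), sixteen 16-bit ALU operations, or thirty-two 8-bit ALU operations
The load & store unit is designed to interface with the data cache through separate read and write data buses, each 288 bits wide, as illustrated in FIG. 14.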
【００６５】
２．３．４インタラプト＆例外
ベクトル処理器は、次の２つの特殊条件のみを認識する。
＊ＡＲＭ７プログラムによって実行されるＣＰＩＮＴ（コプロセッサインタラプト）命令語
＊ベクトル処理器プログラムによって実行される、サブルーチン命令語へのネストされたジャンプ(nested jump)＆掛け算の結果のハードウェアスタックオーバーフローベクトル処理器がこれらの２個の特殊条件を処理する、より詳細な方法に対しては、ベクトル処理器構造文書を参照すること。
ＭＣＰから発生されるその他のインタラプト及び例外条件は、ＡＲＭ７のみによって処理される。
【００６６】
２．４キャッシュサブシステム
２．４．１概要
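The cache control unit (CCU) interfaces with the ARM7 core, the vector execution units (LSU, IFU), memory (MCU, PCI, DMA, CODEC), and the IO devices (BP, UART, timers, interrupt controller). The CCU interfaces with the high-speed (80 MHz) FBUS and the low-speed (40 MHz) IOBUS. The CCU is in effect the central data transfer unit between all the internal CPU core units and the peripheral IO devices. For a detailed view of the CCU within the MSP chip, see the block diagram in the MSP-1E system specification (pp. 1-10).
To support a very high-performance cache subsystem, the CCU design uses a transaction-based protocol for all read and write operations. Any unit that needs to access memory can issue a request to the CCU control unit. An arbiter in the control unit grants requests on a fixed-priority basis and returns a 'transaction_id' to the requestor. The requestor stores this 'transaction_id' so that it can recognize the returned data when the data actually arrives. While the CCU control is processing a request from one unit (which can take many cycles if a cache miss occurs), a new request from another unit can be granted in the next cycle with a different 'transaction_id'. Because holding requests pending in this way does not block subsequent requests from other units, high performance can be achieved. Currently, the CCU can accept and grant one read request and one write request in the same cycle.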
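The transaction-based scheme can be illustrated with a toy arbiter. This is a simplified sketch: it grants one request per cycle instead of one read plus one write, and the unit names, priority order, and class interface are assumptions made for the example.

```python
import itertools


class Arbiter:
    """Fixed-priority arbiter that tags each granted request with a transaction id."""

    def __init__(self, priority):
        self.priority = list(priority)   # unit names, highest priority first
        self._ids = itertools.count()    # source of fresh transaction ids
        self.pending = {}                # transaction_id -> (unit, address)

    def grant(self, requests):
        """Grant the highest-priority request of this cycle.

        `requests` maps unit name -> address. Returns (unit, transaction_id),
        or None when no unit is requesting.
        """
        for unit in self.priority:
            if unit in requests:
                tid = next(self._ids)
                self.pending[tid] = (unit, requests[unit])
                return unit, tid
        return None

    def complete(self, tid, data):
        """Return data for an earlier grant; the id tells the requestor which one."""
        unit, address = self.pending.pop(tid)
        return unit, address, data
```

Because each grant only records the request and hands back an id, a slow request (for example a cache miss) stays in `pending` while later cycles keep granting requests from other units. Completions can then arrive in any order and still be matched to their requestor.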
【００６７】
メモリに対するインターフェースユニット（ＦＢＵＳ）は、４個エントリーのアドレスキューと、１個エントリーのライト−バック(write-back)ラッチからなる。最善の状態で、ＦＢＵＳはＡＲＭ命令語キャッシュからの１つのペンディングリフィール（読出）リクエスト、ＶＥＣ命令語キャッシュからの１つのペンディングリフィール（読出）リクエスト、ＶＥＣデータキャッシュからの１つの記入リクエストと、ダーティ（dirty)キャッシュラインにより、ＶＥＣデータキャッシュからの１つのライト−バックリクエストを支援することができる。
また、キャッシュメモリ自体は、高性能のために最適化される。ＭＳＰキャッシュシステムは、チップ上(on-chip)のキャッシュＳＲＡＭとキャッシュＲＯＭとを有する。キャッシュＳＲＡＭは、ＡＲＭＣＰＵとベクトルコアまたは命令語とデータ間のスラッシング(thrashing)を防止するため、４個の相互に異なるバンクからなる。キャッシュＲＯＭは、ＡＲＭ７とベクトルコアのために高速及び高密度のデータ貯蔵領域を提供する。例え、タグ(tag)がキャッシュＲＯＭに対して変更されることはないが、有効ビットの使用が不可能になり、データが外部メモリから返還される。要すれば、チップ上のキャッシュメモリは、次のようなブロックを含む。
＊１ＫＢの直接マッピングされた命令語キャッシュと、１ＫＢの直接マッピングされつつＡＲＭ７に対する３２ビットデータバスインターフェースを有するライト−バックデータキャッシュ
＊１ＫＢの直接マッピングされ、ベクトル命令語フェッチユニットに対する２５６ビットバスインターフェースを有する命令語キャッシュ
＊４ＫＢの直接マッピングされ、ベクトル実行ユニットに対する２５６ビットバスインターフェースを有するライト−バックデータキャッシュ。データキャッシュはデュアルポートからなり、８０ＭＨｚのサイクルごとに２５６ビットの読出データを提供し、２５６ビットの記入データを支援することができる。
＊４ＫＢＶＥＣデータキャッシュは、ソフトウェアの制御下で、スクラッチ−パッド(scratch-pad)演算により形成できる。
＊ＡＲＭ７及びベクトル処理器で使用するために共有または集積された命令語＆データＲＯＭキャッシュ。ＡＲＭ７に対するインターフェースは、その命令語キャッシュと同一な３２ビットバスを通して、ベクトル処理器に対するインターフェースは、その命令語キャッシュと同一な２５６ビットを通してなる。
＊５個のポート：
−ＡＲＭ７のための読出／記入ポート
−ベクトル処理器の命令語取出ユニットのための読出ポート
−ベクトル処理器のロード／貯蔵ユニットのための読出／記入ポート
−ベクトル処理器のＩＯＢＵＳのための読出／記入ポート
−ＦＢＵＳのための読出／記入ポート
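＊32×256-bit SRAM (~1 KB) for the ARM7 CPU instruction cache
＊32×256-bit SRAM (~1 KB) for the ARM7 CPU data cache
＊128×256-bit SRAM (~4 KB) for the vector processor data cache
＊32×256-bit SRAM (~1 KB) for the vector processor instruction cache
＊512×256-bit SRAM (~16 KB) for the data & instruction cache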
ベクトルデータキャッシュの制御は、ハードウェア制御またはソフトウェア制御によって遂行される。
【００６８】
２．４．２キャッシュサブシステム構造
図１５は、ＭＳＰキャッシュシステムのブロック図であり、次のブロックＩＤＣ（Instruction Data Cache)、キャッシュＲＯＭ、ＣＣＵ＿ＤＡＴＡ＿ＤＰ、ＣＣＵ＿ＡＤＲ＿ＤＰ、ＣＣＵ＿ＣＴＬ及びＣＣＵ＿ＳＭとから構成される。それぞれのサブブロックは、さらに詳細なことは後述する。
２．４．２．２ＩＤＣ
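The instruction and data cache (IDC; see FIG. 16) is on-chip SRAM memory used to provide instruction and data cache access. The cache consists of four banks in a single array: ARM_IC (1 KB), ARM_DC (1 KB), VEC_IC (1 KB), and VEC_DC (4 KB). In any cycle, the cache accepts one read request and one write request. The tag RAM has two read ports. The read port address and the write port address are compared with the internal cache tags for a hit or miss condition. The data RAM has only one read port, accessed by the read port address. In addition, the tag RAM and the data RAM are written using different sets of write addresses. Accessing the cache array therefore requires four sets of cache bank select signals and three sets of line indexes.
The IDC has the following characteristics.
＊Direct mapped with a write-back policy.
＊The cache line size is 64 B, but the data width is 32 B, which corresponds to the vector data width of the MSP chip.
＊Each line has two valid bits, one for the high vector and one for the low vector. The data cache also has two dirty bits, one for each vector.
＊The tag size for ARM_IC, ARM_DC, and VEC_IC is 22 bits (address bits 10 to 31), and the tag size for VEC_DC is 20 bits (address bits 12 to 31).
＊The line index for ARM_IC, ARM_DC, and VEC_IC is 5 bits (address bits 5 to 9), and the line index for VEC_DC is 7 bits (address bits 5 to 11).
＊VEC_DC (4 KB) can be reconfigured as a scratch pad under software control.
＊The V_CLEAR signal is used to reset all the cache line valid bits globally at once. In the future, V_CLEAR may selectively reset individual banks only.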
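The tag and line index bit ranges listed above imply a simple split of a 32-bit address. The sketch below assumes address bits 0 to 4 select a byte within a 32-byte vector, which is inferred from the index starting at bit 5. The names are illustrative.

```python
OFFSET_BITS = 5  # bits 0..4: byte within a 32-byte vector (inferred from the bit ranges)

# Line index widths per bank, from the list above.
INDEX_BITS = {"ARM_IC": 5, "ARM_DC": 5, "VEC_IC": 5, "VEC_DC": 7}


def split_address(addr, bank):
    """Split a 32-bit address into (tag, line_index, byte_offset) for a cache bank."""
    index_bits = INDEX_BITS[bank]
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << index_bits) - 1)
    tag = addr >> (OFFSET_BITS + index_bits)
    return tag, index, offset


def bank_size_bytes(bank):
    """Capacity implied by the index width: number of index entries times 32 bytes."""
    return (1 << INDEX_BITS[bank]) * (1 << OFFSET_BITS)
```

The index widths are consistent with the stated bank sizes: 2^5 entries × 32 B = 1 KB for the three small banks and 2^7 entries × 32 B = 4 KB for VEC_DC, leaving 22 and 20 tag bits respectively.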
【００６９】
２．４．２．３データ経路パイプライン
図１７参照。
２．４．２．４アドレス経路パイプライン
アドレス処理パイプラインに対するデータ経路は、図１８の図示のとおりである。
ＣＣＵＡＤＤＲＥＳＳＤＰ
２．４．３インターフェース
２．４．３．１データタイプ
ＣＣＵはテーブル１５に説明されている多数個のリクエスティングユニットからの相異なるデータタイプを処理する。
【００７０】
相異なるデータタイプを処理する場合のＣＣＵ動作
【表２７】

【００７１】
２．４．３．２ＡＲＭインターフェース
ＡＲＭ７ＣＰＵコアが、ＭＳＰチップの周波数の１／２（４０ＭＨｚ）で動作する反面、ＣＣＵはＭＳＰチップの周波数（８０ＭＨｚ）で動作する。この２個クロック間の同期化は設計時に重要である。一般的に、クロック発生器ユニットは、ＣＬＫ１の上昇エッジでＭＣＬＫを切換する。また、ＡＲＭ７に連結された全体的なリセット信号は、ＣＬＫ１とＭＣＬＫがローの場合に解除(de-assert)される。このような方法によって、２個のユニットは適切に同期化される。ＡＲＭ７は命令語とデータ用として１つの入力バス（ＡＲＭ＿ＤＡＴＡ＜３１：０＞のみを有するが、ＭＳＰチップは専用の命令語キャッシュ（ＡＲＭ＿ＩＣ、１ＫＢ）とデータキャッシュ（ＡＲＭ＿ＤＣ、１ＫＢ）とを備える。ＣＣＵは、ＡＲＭ＿ＮＯＰＣを使用し、この二種類のリクエストを区別することができる。
性能を更に向上させるために、ＣＣＵはメインキャッシュとＡＲＭ７コアとの間に位置するマイクロ命令語キャッシュ（ＵＩ＿ＣＡＣＨＥ、３２Ｂ）とマイクロデータキャッシュ（ＵＤ＿ＣＡＣＨＥ、３２Ｂ）を付加する。このキャッシュは、それぞれ連続的なコードとデータとからなっている８ワードを有する。これらのマイクロキャッシュは、その自体のタグ（２７ビット）、タグ比較器と有効ビットを有する。有効ビットはシステムリセット期間の間すべてがクリアされる。
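The ARM7 micro-caches act as prefetch buffers rather than true caches. During an ARM7 read, the address (ARM_A<31:0>) is always compared with the tags. On a hit, the instruction or data is read back through ARM_DATA<31:0>. On a miss, the micro-cache sends a request to the CCU together with the address, data type, and other control information. The CCU arbiter logic grants read requests from among the requests of all units. At present the ARM7 has the highest priority over the other blocks when obtaining a grant, because the ARM7 rarely makes a request unless one of its micro-caches misses. The CCU may, however, insert internal hold cycles to service multi-cycle requests or an address-queue-full condition; during such a period no external request is granted.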
ＡＲＭ７からの記入は、アドレスがＵＤ＿ＴＡＧをヒットする場合、常にＵＤ＿ＣＡＣＨＥを無効化させる。ライト−スルー(write-through)またはライト−バック(write-back)キャッシュとしてＵＤ＿ＣＡＣＨＥを設計することにおいて、何等の試みも行っていない。ＵＤ＿ＣＡＣＨＥ記入ヒット時に無効化させることにより、ＡＲＭ＿ＤＣとＵＤ＿ＣＡＣＨＥ間のデータを一致させることができる。
ＣＣＵはＡＲＭ＿ＩＣまたはＡＲＭ＿ＤＣに読出または記入リクエストを送る間にａｒｍ＿ｎｗａｉｔを制御する。一般的に、ＣＣＵは記入期間の間には、ａｒｍ＿ｎｗａｉｔをホールドさせない。一応、記入リクエストがｃｃｕ＿ｗｒｉｔｅ＿ｈｏｌｄ２を見ないで承認されると、ＡＲＭ７はただ次のサイクルからＡＲＭ＿ＤＡＴＡ＜３１：０＞にあるデータを持ってくる。ＣＣＵはデータを貯蔵するために、内部の記入バッファを有する。ＡＲＭ７は命令語を実行し続けることができる。しかし、ＣＣＵはたとえデータがメインキャッシュにあるとしても、常に１つのサイクルに対してａｒｍ＿ｎｗａｉｔをホールドさせる。もし、読出リクエストがメインキャッシュをミスした場合、データが外部のメインメモリから返還されるまで、更に多いサイクルがホールドされる。図１９に図示したＡＲＭ＿ＣＣＵインターフェース状態のマシンは、ＣＣＵがａｒｍ＿ｎｗａｉｔを制御する条件を説明する。
【００７２】
図１９において：
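START: start state of the state machine, entered when there is no request, when read data has been returned, or when a write request has been issued without a hold.
HOLD: the CCU has granted an ARM7 read or write request and then cancelled the grant with a hold signal.
TAG: the CCU checks the tags against the read address.
MISS: the read address has missed, and the CCU sends a refill request to the external DRAM.
DATA: the read data has been returned, and the CCU passes the returned data to the micro data cache.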
２．４．３．３ＦＢＵＳインターフェース
ＣＣＵ＿ＦＢＵＳインターフェース状態マシン（Ｆ＿ＳＭ）は、図２０の図示のとおりである。図２０において：
ＩＤＬＥ：アイドル状態
ＲＥＱ：読出または記入リクエストをＦＢＵＳアービタに送る。
ＧＲＴ１：承認サイズが８Ｂより大きい。
ＧＲＴ２：承認サイズが１６Ｂより大きい。
ＧＲＴ３：承認サイズが２４Ｂより大きい。
ＧＲＴ４：最後のサイクルに対する駆動データ
データ受信状態マシン（Ｄ＿ＳＭ）は、図２１の図示の通りである。図２１において、
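IDLE: idle state.
ONE: receive the first 8 B of data from Fdata<63:0>.
TWO: receive the second 8 B of data from Fdata<63:0>.
THREE: receive the third 8 B of data from Fdata<63:0>.
FOUR: receive the fourth 8 B of data from Fdata<63:0>.
REFILL: refill the IDC before returning the data to the requester.
RDY: ready to return the data to the requester.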
【００７３】
２．４．４読出及び記入動作
読出及び記入状態マシンは、図２２の図示のとおりである。
２．４．４．１読出動作
ＭＳＰでＩＤＣ(Instruction and Data Cache)は３段のパイプラインサイクル：リクエストサイクル、タグサイクル及びデータサイクルで動作する。キャッシュヒット状況で、ＩＤＣは毎サイクルで命令語またはデータの返還が可能である。
キャッシュコントローラーユニット（ＣＣＵ）は、キャッシュＳＲＡＭアクセスのためにＡＲＭ７、ベクトル処理器ユニット、ＦＢＵＳとＩＯＢＵＳ間の仲裁を担当する。ＣＣＵはこの４個のマスタからのバスリクエストを監視し、特定のＩＤ番号を有する勝者にバスを承認する。ＣＣＵはまたキャッシュをアクセスしタグを比較するために、キャッシュアドレスバスと読出／記入制御信号を発生する。
キャッシュヒットがある場合、仲裁から勝ったバスマスタは、読出／記入動作のためにキャッシュをアクセスすることができる。キャッシュミスがある場合、ＣＣＵはメインメモリから返還される紛失データを待たずリクエストを発生させてから、バスマスタを助けてやる。それで、キャッシュミスを有するバスマスタは、ＩＤ番号を維持すべきである。以降、リクエストされたデータがキャッシュにあると、ＣＣＵはＧＲＡＮＴ信号を同一なＩＤ番号を有するデータを紛失したバスマスタに送る。このバスマスタはデータをアクセプトするかまたは無視する。
キャッシュミスが発生した場合、メインメモリからデータを受けるために、ラインフェッチが遂行される。ラインサイズは６４バイトに定義され、従ってＣＣＵはメインメモリからキャッシュにデータを供給するために、８回の連続的なメモリアクセス（毎回６４ビット）を実行する。
＊リクエストサイクル：
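The CCU accepts read requests from multiple units (ARM, IFU, LSU, IO) in CLK1. A requester asserts its request signal (lsu_req) and read/write signal (lsu_rw) early in CLK1. At the end of CLK1 the CCU grants one of these read requests by driving ccu_grant_id[9:0]. The request is granted when ccu_grant_id[9:6] matches the requester's unit_id. The requester must latch ccu_grant_id[5:0], because it is the transaction_id associated with the request.
Once the request is granted, the requester sends the address (lsu_adr[31:0]) and other control information, such as the cache-off indication (lsu_ccu_off) and the data type (lsu_vec_type[1:0], lsu_data_type[2:0]), to the CCU in CLK2.
If ccu_rd_hold_2 is not asserted at the end of CLK2, the request has been fully delivered to the CCU and the requested data is returned some time later. If ccu_rd_hold_2 is asserted, the request granted in CLK1 is cancelled and the requester keeps driving the address and control information. Because all the earlier grant_id information is still valid, the same read request need not be issued again in the next cycle. ccu_rd_hold_2 remains asserted through CLK1 until the CCU de-asserts it in CLK2.
ccu_rd_hold_2 is a timing-critical signal that tells the requester that the CCU is busy with other work in the current cycle and that the granted request has not yet been processed.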
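The grant check in the request cycle can be sketched as follows. The helper names are hypothetical; the only facts taken from the text are that ccu_grant_id[9:6] carries the unit_id and ccu_grant_id[5:0] carries the transaction_id that the requester must latch.

```python
def decode_grant_id(ccu_grant_id: int) -> tuple:
    """Return (unit_id, transaction_id) from a 10-bit ccu_grant_id value."""
    unit_id = (ccu_grant_id >> 6) & 0xF       # ccu_grant_id[9:6]
    transaction_id = ccu_grant_id & 0x3F      # ccu_grant_id[5:0]
    return unit_id, transaction_id


def is_granted(ccu_grant_id: int, my_unit_id: int) -> bool:
    """A requester is granted when the unit_id field matches its own id."""
    return decode_grant_id(ccu_grant_id)[0] == my_unit_id
```

A requester would call `is_granted` at the end of CLK1 and, if it returns true, keep the transaction_id so that returned data can later be matched against ccu_data_id.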
＊タグサイクル
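If the request was granted and not subsequently cancelled in the request cycle, it enters the tag-compare stage of the cache access. The CCU uses the address (lsu_adr[11:5]) and the bank select signal (per requester) to select the line for the tag read. The tag hit signal (ccu_lsu_hit_2) is known at the end of CLK2. On a hit, the data is returned in the next cycle. The read-port tag is output and latched by CLK.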
また、アドレスキューステータスは、このサイクルで評価される。タグミスと‘almost_full_address_queue’は、‘ccu_rd_hold_2'信号を表示する。ＣＣＵ状態マシンは、或る新しい読出リクエストも処理しないが、中止されたタグ比較を更に試みる。
それぞれのキャッシュライン（６４Ｂ）は２つのベクトルを含むので、タグヒットを得るためにアクセスされたベクトルの有効ビットが有効でなければならない。２倍のベクトル（６４Ｂ）データの読出のためには、タグヒットを得るために、２つの有効ビットが有効でなければならない。cc_off動作は常にタグミスを誘発させ、リクエストはアドレスキューに掲示される。
＊データサイクル
これはＣＣＵがデータをリクエスターに復帰するサイクルである。データはＣＬＫ１で駆動される下位１６Ｂと、ＣＬＫ２で駆動される上位１６Ｂと共に、ccu_dout[127:0]上に乗せられる。６４Ｂデータリクエストの場合、伝送を終結させるために、１つの付加的なサイクルが使用される。
ＣＣＵはデータが次のＣＬＫ１で復帰されることをリクエスターに知らせるために、常にccu_data_id[9:0]をＣＬＫ２の初期の１／２サイクルで駆動する。リクエスターは適切な戻りデータのために、常にccu_data_id[9:0]を比較する。また、戻りデータの指示子としてタグヒットが使用される。
もし、タグサイクルでタグミスがあり、アドレスキューが充満でなければ、ＣＣＵはＣＬＫ１で紛失されたアドレス、ｉｄ情報及び他の制御情報を、４個エントリーアドレスキューに掲示しつつ、キャッシュラインフェッチを始める。現在、それぞれのアドレスキューは、大略６９ビットの情報を含む。ＣＬＫ２でメモリアドレスラッチがロードされ、ＦＢＵＳリクエストが次のＣＬＫ１で発生される。
【００７４】
２．４．４．２記入動作
ＩＤＣで記入動作は、３段のパイプラインサイクル：リクエストサイクル、タグサイクル及びデータ記入サイクルで動作する。記入アドレスヒット状況で、ＩＤＣは毎サイクルでキャッシュデータアレーにデータを記入することができる。
＊リクエストサイクル：
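The CCU accepts write requests from multiple units (ARM, LSU, IO) in CLK1. A requester asserts its request signal (lsu_req), read/write signal (lsu_rw), and vector type (lsu_vec_type[1:0]) early in CLK1. At the end of CLK1 the CCU grants one of these write requests. Write grants to the different units are signalled by asserting a grant signal (ccu_lsu_wr_grant) directly to the requesting unit. Because no data is returned, the requesting unit does not need to receive a transaction_id from the CCU. In CLK2 the requester must supply the address (lsu_adr[31:0]), the cc_off signal (lsu_ccu_off), and the data type (lsu_data_type[2:0]). As with reads, the CCU asserts ccu_wr_hold_2 near the end of CLK2 to tell the requester that the request was granted but not processed in the current cycle. The requester keeps driving the address, cc_off signal, and data type information until ccu_wr_hold_2 is de-asserted. In the following cycle the requester then supplies the write data on ccu_dout[127:0].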
＊タグサイクル
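If the request was granted and not subsequently cancelled in the request cycle, it enters the tag-compare stage of the cache access. This cycle compares the write-port address tag. The CCU uses the address (lsu_adr[11:5]) and the bank select signal (per requester) to select the cache line. The tag hit signal (ccu_lsu_hit_2) is known at the end of CLK2. A cc_off write always forces a tag miss, and the write data is placed on the FBUS for an external write.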
リクエスタはＣＬＫ１の下位１６ＢとＣＬＫ２の上位１６Ｂにより、ccu_din[143:0]にデータを駆動し始める。６４Ｂデータ転送の場合、データを駆動するためにリクエスタは、１つの付加的なサイクルを取る。ＣＣＵはこのデータをホールドするために、内部の記入データラッチを有する。この記入がキャッシュをヒットさせるか（実際にデータをキャッシュに記入するために、１つまたは２つのサイクルが使用される）、キャッシュをミスさせる場合（データを記入するために、最も少ないサイクルが使用される）、リクエスタは記入が完了されたことと見なす。
＊データ記入サイクル
This is the cycle in which, on a cache hit, the CCU writes the actual data into the cache. If there was a tag miss in the tag cycle, the CCU handles it differently depending on the data type.
データタイプが３２Ｂで、ラインがクリーン(clean)の場合（２つのベクトルもクリーン）、ＣＣＵはただ現在のラインを、新しいタグと新しいデータをオーバーライトする。また、アクセス中のベクトルを有効及びダーティなものと表示する反面、同一なラインの他のベクトルは無効なものに置いておく。
データタイプが３２Ｂより少ない場合、このサイクルは部分的にデータ記入が行われる。この部分データは、一時的なレジスタに貯蔵される。ＣＣＵは紛失された半ライン（３２Ｂ)をメモリからフェッチしてからロードし、キャッシュに戻す。その後、部分データは適切なバイトイネーブル信号と共にキャッシュラインに記入される。
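For every write miss to a dirty cache line, the CCU first copies out the dirty line. Because the dirty data is not yet available, the CCU asserts a hold to the grant logic so that no new read or write request is granted. An internal read using the dirty line address is then started to fetch the dirty cache line data. Finally, the write-back address and data are supplied to memory.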
【００７５】
２．４．５プログラミングモデル
キャッシュサブシステムのすべては、ロード＆貯蔵命令語を使用したハードウェアで制御されるので、ソフトウェア−可視(software-visible)レジスタを必要としない。
２．４．６ＩＤＣ及びＲＯＭアドレスフォーマットは図２３の図示のとおりである。
【００７６】
第３章ＩＯＢＵＳ説明
本章は、ハードウェアデザイナーが示すＩＯＢＵＳの仕様に関して記述したものである。
３．１概要
ＩＯＢＵＳは、システムで使用される低速の標準的な周辺装置のために設計されたものである。このバスは、ＭＳＰキャッシュ制御ユニット（ＣＣＵ）、ビットストリーム処理器（ＢＳＰ）とタイマー／インタラプトコントローラーと、ＵＡＲＴのようなすべての他のＩＯ周辺装置等間のメインインターフェースの役割を果たす。バスのフォーマットは、インテル社のＩＯバスと非常に類似している。バスアービタ制御ロジックは、リクエストに対しバスを常に監視し、ラウンド−ロビン(round-robin)システムを用い、適切なリクエスト−承認を発生させる。潜在的なバスマスタは、常にバス−リクエストを表示し、バスを占有する前にバス−承認が表示されることを待つ。バスマスタは、常にプロトコルによる期間の間、アドレスと制御ラインとを駆動する。
【００７７】
ＩＯＢＵＳは全体的に４０ＭＨｚで動作する同期バスである。ＭＳＰＩＯＢＵＳ上でのすべての承認は、リクエストがアクチブにサンプリングされてから第１番目のサイクルで発生する。このバスは４個のサイクル（４個のバースト）に対し、１６バイト伝送まで処理可能である。これはバスマスタによりリクエストされた伝送サイズをバスアービタに知らせる２個のサイズビットを使用することによって実現される。
ＩＯＢＵＳは３２ビットアドレスとデータマルチプレクサーを有する。アドレスは常にデータの以前に現れる。ＩＯＢ＿ＡＬＥ（アドレスラッチイネーブル）信号はアドレスをラッチするために、受信装置により使用される。たとえ、８ビットデバイスがバスに連結されても、すべてのバスアクセシングは３２ビット伝送を仮定する。正常的な規則によると、８ビットデバイスは、バスの下位８ビット［７：０］を使用し、１６ビットデバイスはバスの下位１６ビット［１５：０］を使用する。もし１６ビットデバイスが８ビットデバイスとの通信を願うと、８ビットデバイスがデータを探してラッチできるように、バスの下位８ビットに正確なデータを載置すべきである。同一期間に多数個のリクエストがある場合、承認されないリクエスタは、ＩＯＢＵＳアービタが承認するまで、常にそのリクエストをホールドさせなければならない。このようなシステムにおいて許容されたリクエストに対し多い“バス−アクセシングサイクル”すなわち、４＊３２ビット伝送（最大１６バイト）がある。ブロック伝送は、常にそれぞれ多数個の３２ビット伝送に分けられる。
すべてのバス承認は、ＩＯＢＵＳアービタにより発生される。しかし、常にアドレス（有効時）を監視し、目的地に適切なチップ選択（次のクロックサイクルに対して）を発生させる並列デコーディングロジックがある。チップ選択は、常にただ１つサイクルに対して有効であり、以降アドレスがすべての読出及び記入リクエストのために表示される。それぞれのＩＯＢＵＳノードは入力として専用のチップ選択を有する。ピン説明及びタイミング図を参照すること。
２ビットサイズ情報は、バスアービタから承認されてからマスタによって発生され、以降２個のバスサイクルに対して有効である。ＣＳがバス伝送サイクルを決定するために表示されると、選択されたスレーブはサイズ情報を獲得しなければならない。また、読出または記入時、ＩＯＢＵＳアービタは新しいリクエストを探し始める前、バスサイクルが終了されることを判断するための伝送サイズのトラックを維持する。バースト−バス伝送時（読出または記入時）データ間には差異が全然ない。
データ読出伝送において、データが有効な時点をリクエスタに知らせ、このデータラッチを始めるためにＲＥＡＤＹ信号が使用される。このＲＥＡＤＹ信号は、バスマスタとスレーブにより発生される。
このプロトコルを満足させるためには、すべてのＩＯＢＵＳノードは、リクエストを処理する前、ＩＯＢＵＳインターフェースを設計する必要がある。このインターフェースは次のスぺックを満足させなければならない。
【００７８】

【００７９】
３．２ピン説明
以下、バスマスタ側から見たシステムＩＯＢＵＳのためのアドレス、データ及び制御信号の定義を説明する。ＩＯＢＵＳ構造定義を示している図２４を参照すること。上述のごとく、ＩＯＢＵＳは多重化されたアドレス／データバスである。
“ｘｘｘ”はリクエスタ名称（ｃｃｕ、ｂｓｐ、ｕｒｔ、ｔｍｒ、ｉｎｔ）を示す３個の文字コードである。
＊システムＩＯＢＵＳ信号定義
【００８０】
【表２８】

【００８１】
３．３ロジック定義
ＩＯＢＵＳ仲裁制御ユニットは、図２５の図示のとおりである。
３．４ＩＯＢＵＳタイミング
ＩＯＢＵＳ読出タイミング（伝送サイズ＝１ワード（４バイト））は、図２６の図示のとおりであり、ＩＯＢＵＳ記入タイミング（伝送サイズ＝１ワード（４バイト））は、図２７の図示のとおりであり、ＩＯＢＵＳ読出タイミング（伝送サイズ＝４ワード（１６バイト））は、図２８の図示のとおりであり、ＩＯＢＵＳ記入タイミング（伝送サイズ＝４ワード（１６バイト））は、図２９の図示のとおりである。
【００８２】
第４章ＦＢＵＳ説明
本章は、ハードウェアデザイナー側面で、ＦＢＵＳのスぺックを記述したことである。
４．１概要
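The memory controller, PCI, custom ASIC, and cache subsystem interface with the system bus, "FBUS", through non-multiplexed address and data bus lines. A single central FBUS arbitration control logic monitors requests and issues grants using a priority scheme. A bus master (the address and data source) always asserts a bus request and waits for a grant. Under normal conditions, a grant is issued in the same cycle as the request, provided the bus is not being used by another master/slave (all grants are generated combinationally). Once a master receives the bus grant, it drives the address/data/control lines in the next cycle. A "data ready" signal always accompanies the actual data to tell the receiver to latch it in the next cycle.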
バス帯域幅を最大に使用するために、４個の連続的なリクエストはパイプライン折り返し(back to back)方式で受信／伝送され、４個のリクエストを提供するために“リクエストＦＩＦＯ”を必要とする。メモリコントローラーは、４個のディープ(deep)リクエストＦＩＦＯと、２個のディープデータＦＩＦＯを有する。このようなプロトコル特性によって、“ＡＦ＿ＦＵＬＬ”と“ＤＦ＿ＦＵＬＬ”信号を必要とする。これらはそれぞれアドレスＦＩＦＯフールとデータＦＩＦＯフールを示す。ＦＢＵＳは承認カウント及びリクエストサイズバスを使用し、８、１６及び３２バイトのデータ伝送を支援する。
【００８３】
それぞれのＦＢＵＳユニットは、バスをリクエストするための制御ロジックを有する。このロジックは応用（メモリ／ＰＣＩ／キャッシュ等）によって、ユニット毎に異なる。しかし、実際のバス仲裁ユニットは各ユニットに対し同一であり、すべてのサブモジュールで重複される。このユニットは、外部バスマスタ／スレーブと内部ユニットロジック間の媒体として作用する。例えば、メモリコントローラーの場合、一応ＣＡＳが活性化されると、メモリコントローラーは、ＦＢＵＳを使用する必要があることを表わす内部信号を通して、内部リクエストをＦＢＵＳ仲裁ロジックに表示する。このリクエストに応信し、ＦＢＵＳコントローラーは、メモリコントローラーに対して外部のシステムにリクエストを表示し、承認を待つ。一応承認が受信されると、アドレス／データ制御は、応信の第１番目のエントリーとメモリコントローラーのデータＦＩＦＯから伝送される。
【００８４】
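System requests to the memory controller can range in size from 1 byte up to 32 bytes. For requests larger than 32 bytes, the source/requester uses the FBUS size bits and issues multiple requests. This is due to a limitation of the SDRAM memory bus (one or two Samsung 1M×16 SDRAMs). The SDRAM is programmed for a wrap length of 8 in order to deliver the full 32 bytes required by the rest of the system. For requests smaller than 32 bytes, all 32 bytes are fetched from the SDRAM, but only the desired number of bytes is transferred to the destination.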
また、１０個のビットリクエスタＩＤバスは、“チップ選択”信号（アドレス／データと同一なサイクル）で有効化される。
すべてのＦＢＵＳノードは、３ビットの“目的地ＩＤ”をＦＢＵＳアービタに発生する。この３ビットはリクエストと共に有効化され、リクエストの目的地を表わす。目的地ＩＤビット［１：０］は、下記のように入力されるリクエスタＩＤからデコーディングされる。
【００８５】
Requester ID [9:6]   Source     Destination ID [1:0]
0000                 Reserved   N/A
0001                 ARM7       N/A
0010                 FU         N/A
0011                 LSU        N/A
0100                 CCU        00
0101                 ASIC       11
0110                 MEM        01
0111                 PCI        10
1xxx                 Reserved
目的地ＩＤビット［２］は、読出／記入リクエストステータスを表わすことに使用される。これはＦＢＵＳがアドレスリクエスト（読出）と、アドレス／データリクエスト（記入）間を区別することを助けてやる。
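The decoding of destination ID bits [1:0] from the requester ID can be sketched as a lookup. The names `DEST_ID` and `destination_id` are hypothetical; the code values are those in the table above. Bit [2] (read/write status) is left out because the text does not state its polarity.

```python
# Requester ID[9:6] -> destination ID[1:0], per the table above.
DEST_ID = {
    0b0100: 0b00,  # CCU
    0b0101: 0b11,  # ASIC
    0b0110: 0b01,  # MEM
    0b0111: 0b10,  # PCI
}


def destination_id(requester_id: int):
    """Return destination ID[1:0] for a 10-bit requester ID, or None if N/A."""
    unit = (requester_id >> 6) & 0xF
    return DEST_ID.get(unit)
```

Entries listed as N/A or reserved in the table return `None` here, which is a choice made for the sketch rather than hardware behavior described in the text.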
正常状態で、承認カウントビット“ｇｒＣＮＴ［１：０］”は、リクエスターがバスを必要とするＦＢＵＳサイクル数を示す。折り返しリクエストに対し、リクエストはバスマスタにリクエストの長さを知らせる。ＦＢＵＳマスタコントローラーは２個の承認カウントビットによって承認を表示する。
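FBUS is a split-transaction bus that supports posted reads. A requester requests the bus and, once granted, drives the address and ends the transaction. Some time later, the slave/data source returns the data by using the destination ID and handing the same request ID back to the requester. This greatly improves bus bandwidth and lets other masters use FBUS sooner. See the timing diagrams for details.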
４．２ピン説明
以下、システムＦＢＵＳのアドレス、データ及び制御信号を説明する。上述のごとく、ＦＢＵＳは非多重化されたアドレス／データバスである。
“ｘｘｘ”はリクエスタ名称（ｍｅｍ、ｐｃｉ、ａｓｃ、ｃｃｕ）を表わす３個の文字コードである。
【００８６】
システムＦＢＵＳ信号定義
【表２９】

【表３０】

【００８７】
図３０は、メモリ読出リクエストＦＢＵＳフローを示したものであり、図３１はメモリ記入リクエストＦＢＵＳフローを示しているし、図３２はマスタ／スレーブ“非メモリ”リクエストＦＢＵＳフローを示したものであり、図３３は中央のＦＢＵＳ仲裁制御ユニットを示したものである。
図３４〜図３６はＦＢＵＳタイミング図であり、図３４はメモリリクエストＦＢＵＳタイミングを示す（８バイトデータ伝送を示しており、１６／３２／６４／１２８バイトの複数個のデータサイクルが使用される）。図３５はメモリ読出リクエストＦＢＵＳタイミングを示し（伝送サイズ＝８バイト）、図３６はメモリ折り返し記入リクエスト（伝送サイズ＝３２バイト）を示したものである。
【００８８】
第５章ＰＣＩバス
本章は、ＰＣＩコアと、内部ＦＢＵＳとインターフェースするＰＣＩグルー(glue)ロジックスぺックを説明したものである。
５．１概要
ＭＳＰ＿１ＥＰＣＩコントローラーは、ＰＣＩバススぺック改正版２．１を満足させるために設計されたものである。より詳細なことはこの標準スぺックを参照すること。
ＰＣＩユニットは、２個のメインセクション：ＰＣＩコアとＦＢＵＳ‘グルー’ロジックを含む。ＰＣＩコアは、主に３３ＭＨｚのＰＣＩバス速度で動作する外部のＰＣＩデバイスとインターフェースする。ＦＢＵＳ‘グルー’ロジックは、８０ＭＨｚで動作する三星ＦＢＵＳとインターフェースする。この‘グルー’ロジックは、ＰＣＩコアとＦＢＵＳ間をインターフェースする。速度同期化は、サブブロックの２個のエンドでＦＩＦＯを利用して実現できる。
三星のＰＣＩコアは、また仮想的なフレームバッファロジックと、ＦＢＵＳを通してＡＲＭ７とインターフェースすることに必要なすべてのＶＦＢレジスタを含む。
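A feature unique to this PCI unit is its interrupt handling, both from the host CPU to the MSP chip and from the MSP chip to the host CPU. This is described in more detail below.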
【００８９】
５．１．１三星ＰＣＩコアブロック図は図３７の図示のとおりである。
５．２ＰＣＩＦＢＵＳインターフェースロジック（図３８参照）
ＰＣＩコアのサブブロックは、ＭＳＰ内部ＦＢＵＳとＳＡＮＤマイクロのＰＣＩコアとインターフェースする。アドレスとデータは、２個のエンドでＦＩＦＯに貯蔵される。このサブブロックはまた、ＰＣＩ信号とＦＢＵＳクロックを同期化させる役割をする。
ＰＣＩコアロジックは、ＦＢＵＳマスタ及びスレーブデバイスであることもある。大部のアクセスは、６４ビットＦＢＵＳを通してローカルＳＤＲＡＭメモリに向かう。ＦＢＵＳプロトコルの説明のためには、ＦＢＵＳ章を参照すること。
ＰＣＩＦＢＵＳ制御ロジックは、また仮想フレームバッファレジスタと制御とを含む。このレジスタはＦＢＵＳを通してＡＲＭ７によりプログラムされる。ブロック図を参照すること。
５．３ＰＣＩＶＦＢロジック
図３９はＶＦＢブロック図であり、図４０はＶＦＢレジスタである。
５．４ＰＣＩコアロジック
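The MSP PCI core fully complies with the PCI 2.1 specification. The additions are a number of registers for interrupts and for software MSP reset. Software running on the ARM7 can interrupt the host CPU by setting the MSP-to-PCI-host interrupt request (bit<3>) in the MSP control register. This causes the PCI core logic to interrupt the host CPU by asserting the interrupt pin (INTA#) on the PCI bus. The host CPU then acknowledges the interrupt through the PCI host interrupt acknowledge (bit<4>) in the MSP control register, which returns the interrupt line to the inactive state.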
ＭＳＰＰＣＩコアはまた基本的に、ＡＲＭ７に対するインタラプトであるホストＣＰＵからのインタラプトを受け取ることができる。ＰＣＩスぺックが任意のインタラプト入力ピンを支援しないので、ＭＳＰ制御レジスタにある、ホストからのＭＳＰインタラプトリクエスト（ｂｉｔ＜２＞）がこの機能を提供することに使用される。ホストＣＰＵは、ＡＲＭ７に対するインタラプトを表わすためにこのビットを設定することができる。次に、一応ホストインタラプトを認知すると、ＡＲＭ７はこのレジスタをクリアーさせる。図４１のブロック図を参照すること。図４１に対して、ＰＣＩ空間でないＭＳＰ領域にマッピングされた３個のレジスタが必要である。
実質的なＰＣＩコアに対する細部的な情報のためには、ＰＣＩ２．１スぺックを参照すること。
【００９０】
第６章メモリコントローラー
６．１本章は、ハードウェアとソフトウェアデザイナー側面で、メモリコントローラーの仕様を説明したものである。
６．２概要
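The MSP memory controller has several features, with a level of programmability that allows cost to be traded off against performance. The memory controller interfaces the main system bus "FBUS", running at 80 MHz, to the DRAM chips. To achieve the 80 MHz clock frequency, synchronous DRAM is used in the initial design stage.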
結局、メモリサブシステムは、標準高速ページＤＲＡＭ、拡張されたデータ出力（ＥＤＯ）ＤＲＡＭと同期ＤＲＡＭとを支援する。メモリバンクサイズは、インターリーブ可能な２個の外部バンクに制限される。
初期の同期ＤＲＡＭメモリコントローラーは、ＤＲＡＭを動作させることに必要な最小限の特徴を有する。次は基本的な第１パスメモリコントローラーの特徴を示している。
＊三星の同期ＤＲＡＭ支援
＊２個のＳＤＲＡＭチップを使用した１つのメモリバンク（１Ｍ×１６）
＊Ｃａｓ−Ｂｅｆｏｒｅ−Ｒａｓ（ＣＢＲ）リフレッシュ支援
＊読出ー修正ー記入（Ｒｅａｄ−Ｍｏｄｉｆｙ−Ｗｒｉｔｅ）動作を初期化する部分的な記入支援
＊内部のバンクインターリーブ支援（ＭＡ［１１］を通したピンポン）
＊８０ＭＨｚメモリとプロセッサバス（１：１）周波数マッチ
＊プログラマブルリフレッシュ率
＊システムバスを効率的に使用するためのアドレスとデータキューイング
＊マニュアル“２個バンクプリチャージ”支援
ＭＳＰメモリコントローラーは、２個のメインサブ構成要素：データコントローラーとアドレスコントローラーとを有する。データコントローラーは、ＤＲＡＭから読み出されたデータを貯蔵し、プロセッサバスからデータを記入するための読出及び記入データキューを有する。データコントローラーはまた、バイト記入のためのＲＭＷロジックを含む。データコントローラーに対するすべての制御は、アドレスコントローラーから発生する。
アドレスコントローラーは、リクエストキュー、応信ＩＤキュー、メモリアクセスデコーディングロジック、ページ比較器ロジック、ＲＡＳ／ＣＡＳ状態マシン、リフレッシュ状態マシンと、データコントローラーにより使用される必要なすべての制御信号を有する。
ＳＤＲＡＭメモリクロックは、システムクロックと同一である。ＳＤＲＡＭは前記１セットの各制御信号を受信する。
【００９１】
６．２．１メモリコントローラーブロック図は図４２の図示のとおりである。
６．２．２メモリコントローラーフローは、図４３の図示のとおりである。
６．３アドレスコントローラー（ＡＣ）
メモリコントローラーでアドレスコントローラーセクションは、データコントローラーを管理することだけでなく、すべてのＤＲＡＭ制御を発生させる役割をする。ＭＳＰメモリコントローラーのこのセクションは、またＦＢＵＳインターフェースのアドレスと制御経路を担当する。次のブロック図は、アドレスコントローラーユニットの多数個のサブ−セクションを示す。
６．３．１アドレスコントローラーブロック図は図４４の図示のとおりである。
６．３．２メモリコントローラーリクエストＦＩＦＯ
ＭＳＰメモリコントローラーは、実質的なメモリコントローラー状態マシンへのディスパッチ(dispatch)のための、ＦＢＵＳアドレスと制御情報とを貯蔵する４個のディープリクエストＦＩＦＯを有する。リクエストＦＩＦＯのそれぞれのエントリーは、特定エントリーが有効であることを表わす“有効”ビットを有する。メモリコントローラー状態マシンは、常にＥＮＴＲＹ＿０のＦＩＦＯにある最下位エントリーを支援する。一応リクエストが提供され、列アドレスストローブ（ＣＡＳ）が活性化されると、メモリコントローラーはこのエントリーをクリアーさせるために、クリアー信号を表示する。ＦＩＦＯＦＵＬＬ／ＥＭＰＴＹステータスによって、バレルシフトが有効な内容をエントリー０にシフトさせるために初期化される。
ＭＳＰメモリコントローラーリクエストＦＩＦＯフォーマットは、図４５の図示のとおりである。
【００９２】
６．３．３メモリコントローラーアドレスデコード／マップ
アドレスデコーディングロジックは、主に１１ビットのＳＤＲＡＭ行アドレスＭＡ［１０：０］と８ビットの列アドレスＭＡ［７：０］とを発生させる役割をする。このアドレスラインは、ＳＤＲＡＭアドレス入力［１１：０］へ直接駆動される。メモリアドレスビット［１１］は、性能のために内部ＳＤＲＡＭバンクと、改善されたメモリバス使用の間をトグルすることに使用される。
このメモリアドレスは、次のことを表わすレジスタを通して与えられるプログラマブルマルチプレクサーを使用して発生される。
−現在システムキャッシュラインサイズ
−内部バンクの数
−内部バンクインターリービング
システムキャッシュラインオフセットは、３２バイトキャッシュラインに対して５ビットである。図４６は、１６ＭＢＤＲＡＭのためのＦＢＵＳシステムアドレスから発生される提案されたメモリアドレスフォーマットを示している。この多重化されたメモリアドレスは、メモリコントローラー状態マシンによって指示されるＲＡＳとＣＡＳストローブとを有する、１つのサイクルに対して有効である。
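The MCU can perform 8-byte writes without invoking a read-modify-write operation. However, bit[2] of the FBUS address is always zero, so only even starting addresses occur. This bit maps to bit[0] of the SDRAM address, which is one of the three bits that give the starting address, as follows.
Faddr[4:2]   Write sequence (WRAP = 8)
000          0-1-2-3-4-5-6-7
010          2-3-4-5-6-7-0-1
100          4-5-6-7-0-1-2-3
110          6-7-0-1-2-3-4-5
These are all even starting addresses, and they are the sequences supported by the MCU.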
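The write sequences tabulated above follow a simple wrap-around pattern, which the sketch below reproduces. The function name `wrap_sequence` is hypothetical, and a sequential (non-interleaved) burst order is assumed, in line with the sequential burst type mentioned for the burst type register.

```python
def wrap_sequence(faddr: int, wrap: int = 8) -> list:
    """Return the burst order for a wrap-length burst starting at Faddr[4:2]."""
    start = (faddr >> 2) & (wrap - 1)  # Faddr[4:2] when wrap == 8
    return [(start + i) % wrap for i in range(wrap)]
```

For example, an FBUS address whose bits [4:2] are 010 gives the order 2-3-4-5-6-7-0-1, as in the second row of the table.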
All read operations assume 32 bytes, and the start address is (000) = ma[2:0] = Faddr[4:2].
【００９３】
６．３．４メモリコントローラー状態マシン
ＭＳＰメモリコントローラーは、１つのマスタコントローラー状態マシンを有する。この状態マシンは、ＳＤＲＡＭ制御信号のためのすべてのタイミング（ＲＡＳ／ＣＡＳ／ＷＥ／ＣＳ／ＤＱＭ）を発生させる役割を担う。状態マシンは、常にエントリー０にある有効エントリーのために、リクエストＦＩＦＯをモニタする。一応、有効ビットが検出されると、状態マシンはＳＤＲＡＭシーケンス開始をキックオフする。また、ＲＡＳプリチャージが必要であるか否かを判断するために、ページ比較器からＰａｇｅ＿ｈｉｔ信号をモニタする。
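RAS precharge is performed on the currently active/open bank. The manual precharge sequence consists of asserting CS, RAS, WE, and MA[11] (active at the zero state). The internal bank select bit MA[11] selects the bank to precharge. For a read, the precharge command is asserted after the data has been received from the SDRAM, to avoid data contention. For a write, the precharge is issued after the last data has been written to memory. Once the precharge command completes, that bank is idle and ready for the next memory operation. According to the SDRAM specification, the precharge command can be issued at any time after tRAS(min) (60 ns here) has been satisfied. With the current wrap length of 4, however, the memory controller state machine issues the precharge command after the data has been read from or written to memory.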
次は、ＭＳＰメモリコントローラーと共に使用されるＳＤＲＡＭパラメータを示す。
【００９４】
【表３１】
ＳＤＲＡＭパラメータ

【００９５】
＊ｔＲＡＳは、同期ＤＲＡＭの６０ｎｓ列アクセスタイムを実現させるために、５サイクルで使用され得る。メモリコントローラータイミング図を参照すること。
６．３．４．１状態マシンダイアグラム
図４７はＳＤＲＡＭメモリコントローラーＲＡＳ／ＣＡＳ状態マシンダイアグラムを示す。
６．４メモリコントローラーリフレッシュ
同期ＤＲＡＭは、それぞれの貯蔵セルにあるデータを維持するために毎３２ｍｓ（１５．６ｕｓ）毎にリフレッシュされる必要がある。同期ＤＲＡＭはまた、２個モードのリフレッシュ：自動リフレッシュとセルフリフレッシュとを支援する。
【００９６】
６．４．１ＳＤＲＡＭ自動リフレッシュ
標準の自動リフレッシュを使用し、２個の内部バンクが内部カウンタにより交番的にリフレッシュされる。行(row)の数が４０９６であるので、自動リフレッシュはＤＲＡＭ全体をリフレッシュするために、２０４８自動リフレッシュサイクルを必要とする。
自動リフレッシュコマンドは、ＣＫＥとＷＥがハイで、ＣＳ、ＲＡＳ＆ＣＡＳがローであることを表示することにより発生される。このコマンドは、２個のバンクがアイドル状態にある場合のみに表示される。
自動リフレッシュを終了することに必要な時間は、
ｔＲＣ(min)／サイクル時間＝100ｎｓ(spec)/12.5ns＝８サイクル(80MHz)
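The cycle counts quoted in this chapter can be checked with integer arithmetic. This is only a worked calculation; the helper name `cycles_for` is hypothetical, and rounding up to a whole clock is assumed.

```python
def cycles_for(time_ns: int, clock_hz: int = 80_000_000) -> int:
    """Number of whole clock cycles needed to cover time_ns (rounded up)."""
    # ceil(time_ns * clock_hz / 1e9) without floating point
    return -(-time_ns * clock_hz // 1_000_000_000)


t_rc_cycles = cycles_for(100)   # tRC(min) = 100 ns
t_ras_cycles = cycles_for(60)   # tRAS(min) = 60 ns
```

At 80 MHz (12.5 ns per cycle) this gives 8 cycles for tRC and 5 cycles for tRAS, matching the values stated in the text.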
６．４．２ＳＤＲＡＭセルフリフレッシュ
セルフリフレッシュは、三星のＳＤＲＡＭに使用されるさらに別のモードである。これは一般的にデータ維持及び低電力動作のために好ましいリフレッシュモードである。ここでＳＤＲＡＭは、内部クロックとＣＫＥを除いたすべての入力バッファをディスエーブルさせる。
ＣＳ、ＲＡＳ、ＣＡＳとＣＫＥがローでＷＥがハイの場合、セルフリフレッシュモードに入る。セルフリフレッシュモードは、ＳＤＲＡＭクロックのシュッティングとＣＫＥ信号を用いた再試みを要求するので、ＭＳＰメモリコントローラーは、このリフレッシュモードを使用しない。
６．４．３マニュアルリフレッシュ
このリフレッシュモードは、状態マシン／カウンタ設計を要求する。カウンタは１５．６ｕｓ毎にタイムアウトされ、メモリコントローラーロジックでリフレッシュストローブを表示する。次に、メモリコントローラーは、現在のリフレッシュを終了し、すぐＳＤＲＡＭリフレッシュサイクルを初期化させる。このサイクルは、アイドル状態における制限を持たず、全く自動リフレッシュサイクルと類似している。
【００９７】
６．５データコントローラー（ＤＣ）
メモリコントローラーでデータコントローラーセクションは、主にプロセッサからデータを記入するか、またはＳＤＲＡＭからデータを読み出すためのデータキューとして提供される。このコントローラーはまた、すべての部分的な記入時（バイト記入）のための記入併合ロジックを有する。部分的な記入はまず、ＤＲＡＭ読出をキックオフした後データを併合し、最後に完全に修正されたワードをメモリに更に記入する。従って、部分的な記入シーケンス次の任意のリクエストは性能ヒットを取らなければならない。
６．５．１データコントローラーブロック図は図４８の図示のとおりである。
【００９８】
６．６ピン説明
このコントローラーは、次のパッケージピンを提供する。
＊ＲＡＳ＿Ｉ：出力ピン（アクチブロー）。これはＭＡ［１１：０］からの行アドレスを、選択されたＤＲＡＭバンクの内部行アドレスバッファにラッチするための行アドレスストローブである。
＊ＣＡＳ＿Ｉ：出力ピン（アクチブロー）。これはＭＡ［１１：０］からの列アドレスを、選択されたＤＲＡＭバンクの内部列アドレスバッファにラッチするための、列アドレスストローブである。
＊ＷＥ＿Ｉ：出力ピン（記入時アクチブロー）。ＤＲＡＭの記入イネーブル入力ピンを駆動するためのものである。
＊ＭＡ［１１：０］：出力ピン。ＤＲＡＭに対する多重化された行及び列アドレス信号。
＊ＤＱＭ：出力ピン。クロック及び出力をマスクした後、ＳＤＲＡＭデータ出力をハイインピーダンスにする。（このピンは同期ＤＲＡＭインターフェースに対してのみ使用する。）
＊ＣＳ＿Ｉ：出力ピン（アクチブロー）。選択されたＳＤＲＡＭ動作のためにディスエーブルまたはイネーブルされる。（このピンは同期ＤＲＡＭインターフェースに対してのみ使用する。）
＊ＣＬＫ：出力ピン。これは同期ＤＲＡＭに対するクロック出力ピンであって、ＳＤＲＡＭのみで使用され、ＭＳＰのシステムクロックと同じ位相を有する。
【００９９】
６．７メモリコントローラータイミング図は、図４９から図５１の図示のとおりである。図４９に関連した事項は下記のとおりである。
−三星のＳＤＲＡＭに仮定
−８０ＭＨｚで動作するメモリとシステム。
−１個または２個の外部ＳＤＲＡＭ（１Ｍ×１６）。
−メモリからラインをフェッチするための４／８プログラマブルラップの長さ。
−ｔＲＣＤ＝３。
−ｔＣＡＳ＝３。
−内部遅延＝２クロック。
−メモリ待ち時間＝８サイクル（８×１２．５＝１００ｎｓ）．
−メモリからのシステムデータは、仲裁（読出データ）のために２個サイクルほど遅延する。
【０１００】
６．８プログラマブルモデル
プログラマー側面で、メモリコントローラーに関連した制御レジスタは、下記のとおりである。
６．８．１ＳＤＲＡＭリセットレジスタ（Ｒ／Ｗ）
このレジスタは、それぞれのシステムリセット後でリセットされる。これはＳＤＲＡＭパワーオンシーケンスを始めるｒｅｓｅｔ＿ｓｄｒａｍ信号を伝達する１ビットレジスタである。システムリセット時にこのレジスタは１に設定される。ＳＤＲＡＭを動作させるために、ソフトウェアによりこのレジスタをクリアーさせなければならない。
ｂｉｔ０はシステムリセットで設定され、ＳＤＲＡＭを動作させるためにクリアーされる。
プログラミングアドレス：
Ｆａｄｄｒ［３１：２０］＝１２’ｈ０１０
Ｆａｄｄｒ［３：０］＝４’ｂ１０１１
６．８．２ＳＤＲＡＭバーストタイプレジスタ（Ｒ／Ｗ）
このレジスタは、ＳＤＲＡＭバーストタイプをプログラムする。これは順次的なバーストタイプに対しゼロにプログラムされる１ビットレジスタである。
プログラミングアドレス：
Ｆａｄｄｒ［３１：２０］＝１２’ｈ０１０
Ｆａｄｄｒ［３：０］＝４’ｂ１０１０
ｂｉｔ０はシステムリセットとともにリセットされ、ＳＤＲＡＭを動作させるためにクリアーされる。
６．８．３ＳＤＲＡＭリフレッシュレジスタ（Ｒ／Ｗ）
このレジスタは、ＳＤＲＡＭリフレッシュ値をプログラムする。これはＦＢＵＳを通してプログラムされる１２ビットレジスタである。
プログラミングアドレス：
Ｆａｄｄｒ［３１：２０］＝１２’ｈ０１０
Ｆａｄｄｒ［３：０］＝４’ｂ１００１
ｂｉｔ１１−０はシステムリセットとともにリセットされ、４Ｅ０のリフレッシュ値にプログラムされる。
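The value 4E0 is consistent with the 15.6 µs refresh interval if the register counts 80 MHz system clock cycles. That unit is an assumption, since the text does not state it; the helper name `refresh_count` is hypothetical.

```python
def refresh_count(interval_ns: int = 15_600, clock_hz: int = 80_000_000) -> int:
    """Clock cycles between refresh strobes for the given interval."""
    return interval_ns * clock_hz // 1_000_000_000


count = refresh_count()          # 1248 cycles at 80 MHz
fits_12_bits = count < (1 << 12)  # the register is 12 bits wide
```

Here `count` equals 0x4E0, and it fits in the 12-bit register.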
6.8.4 SDRAM RAS Precharge (tRP) Register (R/W)
This register programs the SDRAM RAS precharge value. It is a 3-bit register programmed through the FBUS.
プログラミングアドレス：
Ｆａｄｄｒ［３１：２０］＝１２’ｈ０１０
Ｆａｄｄｒ［３：０］＝４’ｂ１０００
ｂｉｔ２−０はシステムリセットとともにリセットされ、１または２または３にプログラムされる。
６．８．５ＳＤＲＡＭＣＡＳ待ち時間（ｔＣＡＣ）レジスタ（Ｒ／Ｗ）
このレジスタはＳＤＲＡＭＣＡＳ待ち時間をプログラムする。これはＦＢＵＳを通してプログラムされる３ビットレジスタである。
プログラミングアドレス：
Ｆａｄｄｒ［３１：２０］＝１２’ｈ０１０
Ｆａｄｄｒ［３：０］＝４’ｂ００１１
ｂｉｔ２−０は、システムリセットとともにリセットされ、１または２または３にプログラムされる。
６．８．６ＳＤＲＡＭＲＡＳＣＡＳ待ち時間（ｔＲＣＤ）レジスタ（Ｒ／Ｗ）
このレジスタはＳＤＲＡＭＲＣＤ待ち時間をプログラムする。これはＦＢＵＳを通してプログラムされる３ビットレジスタである。
プログラミングアドレス：
Ｆａｄｄｒ［３１：２０］＝１２’ｈ０１０
Ｆａｄｄｒ［３：０］＝４’ｂ００１０
ｂｉｔ２−０は、システムリセットとともにリセットされ、１または２または３にプログラムされる。
６．８．７ＳＤＲＡＭＷＲＡＰＬＥＮＧＴＨレジスタ（Ｒ／Ｗ）
このレジスタはデータに対するＳＤＲＡＭのラップ長さをプログラムする。これはＦＢＵＳを通してプログラムされる３ビットレジスタである。
プログラミングアドレス：
Ｆａｄｄｒ［３１：２０］＝１２’ｈ０１０
Ｆａｄｄｒ［３：０］＝４’ｂ０００１
ｂｉｔ２−０は、システムリセットとともにリセットされ、１、２、４、または８の中にいずれかにプログラムされる。
６．８．８ＳＤＲＡＭＮＯＰＴＩＭＥレジスタ（Ｒ／Ｗ）
このレジスタはパワーオンシーケンスのためのＳＤＲＡＭのＮＯＰ時間をプログラムする。これはＦＢＵＳを通してプログラムされる１６ビットレジスタである。
プログラミングアドレス：
Ｆａｄｄｒ［３１：２０］＝１２’ｈ０１０
Ｆａｄｄｒ［３：０］＝４’ｂ００００
ｂｉｔ１５−０はシステムリセットとともにリセットされ、クロック周波数によって２００ｕｓにプログラムされる。
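The programming addresses in section 6.8 share the same Faddr[31:20] value and differ only in Faddr[3:0], so they can be summarized in a small table. The names below are hypothetical, and Faddr[19:4] is assumed to be zero because the text does not specify those bits. The NOP count assumes the register is expressed in 80 MHz clock cycles.

```python
MC_REG_BASE = 0x010 << 20  # Faddr[31:20] = 12'h010; Faddr[19:4] assumed zero

# Faddr[3:0] codes from sections 6.8.1 to 6.8.8
MC_REGS = {
    "NOP_TIME": 0b0000,
    "WRAP_LENGTH": 0b0001,
    "RCD_LATENCY": 0b0010,
    "CAS_LATENCY": 0b0011,
    "RAS_PRECHARGE": 0b1000,
    "REFRESH": 0b1001,
    "BURST_TYPE": 0b1010,
    "RESET": 0b1011,
}


def mc_reg_address(name: str) -> int:
    """Build the FBUS address of a memory controller register."""
    return MC_REG_BASE | MC_REGS[name]


def nop_time_count(clock_hz: int = 80_000_000, nop_us: int = 200) -> int:
    """Clock cycles needed for the power-on NOP period."""
    return nop_us * clock_hz // 1_000_000
```

Under these assumptions the 200 µs NOP period corresponds to 16000 cycles, which fits in the 16-bit NOP TIME register.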
【０１０１】
第７章ＡＳＩＣインターフェース
本章ではＡＳＩＣインターフェースユニットの仕様を説明した。
７．１概要
ＡＳＩＣインターフェースユニット（図５２参照）は、１つのプログラマブル３２ビットＤＭＡ、多数個のＦＩＦＯと制御ブロックを有する。ＡＳＩＣインターフェースブロックは、８０ＭＨｚで動作するメインシステムバス（ＦＢＵＳ）と、ＭＳＰ、ＡＤ１８４３（オーディオ、電話）、ＫＳ０１２２（ビデオキャプチャ）、ＫＳ０１１９とＶＧＡをインターフェースするＣＯＤＥＣインターフェースブロックをインターフェースする。現在の仮定は、任意の同期化問題を避けるために、すべてのＣＯＤＥＣインターフェースとＤＭＡコントローラーを完全なＦＢＵＳ速度で動作させることである。
カストマＡＳＩＣブロックは、３個の主要セクション：ＦＢＵＳマスタ／スレーブインターフェース、ＭＳＰ８−チャンネルＤＭＡコントローラーと実際のＣＯＤＥＣとを有する。データはＦＢＵＳからＣＯＤＥＣに伝達されるか、またはＣＯＤＥＣからＦＢＵＳに伝達される。しかし、アドレスはただＤＭＡコントローラーのみから発生される。そうすると、このアドレスはＦＢＵＳインターフェースロジックでマッピングされたＦＢＵＳである可能性もある。他のＦＢＵＳノードからのすべての記入は、ただＣＯＤＥＣセクションにあるレジスタのみをプログラムする。他のすべてのトラヒックでは、サイズ及びＩＤ情報を有する応信を読み出さなければならない。ＦＢＵＳ仕様を参照すること。
次は、ＡＳＩＣインターフェースユニットに対する特徴である。
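* Supports basic 32-bit DMA functions (8 channels, one channel per codec).
* Two 4-deep × 64-bit data FIFOs.
* One 1-deep × 52-bit request FIFO.
* One 2-deep × 52-bit response FIFO.
* Master/slave support for the FBUS and the CODEC interface block.
* Operating frequency: up to 80 MHz.
* Supports IO-to-memory and memory-to-IO accesses.
* Highest-priority support for channel 0, which is used for the KS0119.
* Special address bus support to achieve high performance for the KS0119.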
このカストマインターフェースロジックは、３個の相互に異なるＣＯＤＥＣを支援する。
＊オーディオ及び電話ＣＯＤＥＣ（ＡＤ１８４３）。このＣＯＤＥＣはＤＭＡコントローラーと通信する両方向６４ビットデータバスを有する。（チャンネル４→ＤＡＣ１、チャンネル５→ＤＡＣ２、チャンネル６→ＡＤＣ左側、チャンネル７→ＡＤＣ右側）
＊ビデオキャプチャＣＯＤＥＣ（ＫＳ０１２２）。このＣＯＤＥＣは両方向６４ビットデータバスを有し、ＤＭＡ（チャンネル２）に対してＭ→ＩＯ、ＩＯ→Ｍリクエストを初期化することができる。
＊ビデオバックエンド(backend)ＣＯＤＥＣ（ＫＳ０１１９）。このＣＯＤＥＣはメモリコントローラーからデータを直接に受信する（チャンネル０）。
【０１０２】
ＡＳＩＣインターフェースブロック
７．２直接メモリアクセス（ＤＭＡ）コントローラー
ＤＭＡコントローラーは、アドレス発生及び解釈のために使用されるレジスタを有する。このＤＭＡコントローラーは、８個の独立的なチャンネルを有する。各チャンネルは、現在のアドレスレジスタと停止アドレスレジスタとを有する。開始及び停止アドレスレジスタは、構成ブロックを通して先にプログラムされる。現在のアドレスレジスタは、８個のＣＯＤＥＣの中のいずれか１つからＤＭＡリクエストが発生する時ごとにロードされる。一応、ＦＢＵＳがアクセスを承認すると、このＤＭＡアドレスは、現在アドレスが停止アドレスレジスタとマッチされるまで、サイクルごとに増加する。その時点で、ＤＭＡコントローラーが信号“ＥＯＰ(End Of Process)”を発生する。この信号はプロセスでインタラプトを誘発する。８個のすべてのＤＭＡチャンネルは、マルチプレクサーとアドレス比較ブロックを制御する共通の仲裁ユニットを有する。
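The current/stop address behavior described above can be sketched as a generator. The name `dma_addresses` is hypothetical. An 8-byte step is assumed from the 8-byte alignment of the address registers, and the sketch assumes that no transfer is made at the stop address itself; the text does not settle either point.

```python
def dma_addresses(start: int, stop: int, step: int = 8):
    """Yield the addresses driven by one channel until current == stop (EOP)."""
    if stop < start or (stop - start) % step:
        raise ValueError("stop must be reachable from start in whole steps")
    current = start
    while current != stop:
        yield current
        current += step
    # current == stop here: the controller would raise EOP at this point
```

For instance, `list(dma_addresses(0x1000, 0x1020))` produces four addresses, after which the channel would signal EOP.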
このＤＭＡコントローラーは、ＩＯメモリ、メモリとＩＯ、メモリとメモリ間のアクセスを支援する。ＣＯＤＥＣがＤＭＡと通信しようとする時ごとに、ＣＯＤＥＣはＤＭＡ＿ＲＥＱ信号を表示し、ＤＭＡからＤＭＡ認知信号の“ＤＡＣＫ”を待つ。一応、認知されるとＣＯＤＥＣはＭ−ＩＯ信号とデータとを駆動する。ＤＭＡコントローラーは承認されたＤＡＣＫによって、適切なチャンネルを選択する。ブロック図を参照すること。
【０１０３】
７．３ＤＭＡレジスタ説明
７．３．１現在アドレスレジスタ
各チャンネルは、すべてのアドレスが８バイトに配列されることを要求する、２９ビット現在アドレスレジスタ（ｂｉｔｓ＜３１：３＞）を有する。事実上、このレジスタは２９ビットカウンタである。このレジスタはＡＲＭ７によって読み出され、初期値はＦＢＵＳを通してＡＲＭ７からロードされる。このアドレスはデータ伝送サイズに基づいて増加される。現在アドレスレジスタにあるアドレスは、マルチプレクサーを通してＦＢＵＳ上のアドレスをロードするために、アドレス発生ブロックに伝送される。現在アドレスレジスタは、アイドル状態ではアドレス値をホールドさせる。
７．３．２停止アドレスレジスタ
各チャンネルは、すべてのアドレスは８バイトに配列されることを要求する、２９ビット停止アドレスレジスタ（ｂｉｔｓ＜３１：３＞）を有する。このレジスタには、ＦＢＵＳを通してＡＲＭ７により記入される。この値は、比較ブロックで現在アドレスと比較されることに使用される。もし現在アドレスが停止アドレスと一致すると、ＤＭＡコントローラーは、各チャンネルに対して“ＥＯＰ”信号を発生させる。
７．３．３ステータスレジスタ
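This register stores information indicating whether each channel has reached its stop address. Bits<7:0> indicate which channels have reached the stop address, and each is reset when the ARM7 initializes the current address register through the CCU.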
このレジスタはＡＲＭ７により読み出され、ＡＲＭ７はこのレジスタを記入することができない。
７．３．４制御レジスタ
このレジスタは、ＤＭＡコントローラーの動作に対する情報を貯蔵する。Ｂｉｔｓ＜７：０＞は、どのＤＭＡチャンネルが動作のためにイネーブルされたかを規定する。このビットは当該チャンネルが停止アドレスに到達する時ごとにリセットされ、ＡＲＭ７は動作を再開始するためにこのビットを設定する。任意のチャンネルイネーブルビットが“０”の場合、ＤＭＡは例えＣＯＤＥＣがＤＭＡにＤＭＡ＿ＲＥＱを送るとしても、当該ＣＯＤＥＣにＤＭＡ＿ＡＣＫを送らない。Ｂｉｔｓ＜１９：１６＞は、ＤＭＡチャンネルの中のいずれか１対が、ダブル−バッファとして動作するために共に連結されているかを規定する。例えば、チャンネル０とチャンネル１がダブル−バッファとして連結されている場合、チャンネル０の現在アドレスが停止アドレスに到達した場合、ＤＭＡコントローラーは自動的にチャンネル１を切り換え、チャンネル１の現在アドレスが停止アドレスに到達した場合、ＤＭＡコントローラーは自動的にチャンネル０を切り換える。Ｂｉｔ＜２８：２１＞は、各チャンネルの読出／記入モードに関連した情報を貯蔵する。もしこのビットの中の任意のビットがＡＲＭ７によって“１”に設定されると、該当チャンネルは読出動作のために使用され、残りのチャンネルは記入動作のために使用される。Ｂｉｔ＜３１＞は、ＤＭＡがＥＯＰ信号をインタラプトコントローラーに送ったか否かを規定する。もしこのビットが“０”であれば、ＤＭＡは任意のチャンネルが停止アドレスに到達しても、ＥＯＰを送らない。
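The control register fields described above can be unpacked as follows. The function name and dictionary keys are hypothetical. The sketch assumes that bit k of each field corresponds to channel k (or pair k for the double-buffer field); the exact encoding is given in a table that is not reproduced in the text.

```python
def decode_dma_control(reg: int) -> dict:
    """Unpack the DMA control register into its documented fields."""
    return {
        # Bits<7:0>: channel enable
        "enabled": [bool((reg >> ch) & 1) for ch in range(8)],
        # Bits<19:16>: channel pairs linked as double buffers
        "double_buffer_pairs": [bool((reg >> (16 + p)) & 1) for p in range(4)],
        # Bits<28:21>: 1 = channel used for reads, 0 = writes
        "read_mode": [bool((reg >> (21 + ch)) & 1) for ch in range(8)],
        # Bit<31>: send EOP to the interrupt controller
        "eop_enable": bool((reg >> 31) & 1),
    }
```

A value such as `(1 << 31) | (1 << 21) | (1 << 16) | 0b101` would enable channels 0 and 2, link the first pair as a double buffer, mark channel 0 as a read channel, and enable EOP.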
【０１０４】
７．３．５マスクレジスタ
制御レジスタにある各ビットは、マスクレジスタにあるマスクビットと連結されている。マスクビットが“０”であれば、制御レジスタにある該当ビットがアップデートされることを防止する。初期には、このレジスタ＜３１：０＞は、ＦＦＦＦＦＦＦＦ(hex)に設定される。
７．３．６プログラミング
開始及び停止アドレスは、ＦＢＵＳを通してＡＲＭ７によりプログラムされる。
ＦＢＵＳマッピング値は下記のとおりである。
ＣＣＵ→００４０＿００００−００７Ｆ＿ＦＦＦＦ、
ＭＣＵ→００８０＿００００−０４７Ｆ＿ＦＦＦＦ、
ＰＣＩ→０８００＿００００−ＦＦＦＦ＿ＦＦＦＦ。
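The FBUS mapping above can be expressed as a range lookup. The names `FBUS_MAP` and `fbus_target` are hypothetical; the ranges are those listed. Addresses outside the three listed ranges return `None` in this sketch.

```python
# (target, first address, last address), per the FBUS mapping list
FBUS_MAP = [
    ("CCU", 0x0040_0000, 0x007F_FFFF),
    ("MCU", 0x0080_0000, 0x047F_FFFF),
    ("PCI", 0x0800_0000, 0xFFFF_FFFF),
]


def fbus_target(addr: int):
    """Return the name of the unit whose range contains addr, else None."""
    for name, first, last in FBUS_MAP:
        if first <= addr <= last:
            return name
    return None
```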
アドレスプログラミングにおいて、Ａｄｄｒｅｓｓ［２６：０］は表に基づいて設定される。
【０１０５】
ＤＭＡレジスタアドレスマップ
【表３２】

【０１０６】
【表３３】
Status register encoding

【０１０７】
制御レジスタのエンコーディング
【表３４】

【０１０８】
７．４ＣＯＤＥＣ初期化
カストマＡＳＩＣユニットは、各ＣＯＤＥＣの初期化を支援する。実質的には、ＡＲＭ７がカストマＡＳＩＣユニットを通してＣＯＤＥＣ初期化を担当する。カストマＡＳＩＣユニットは、各ＣＯＤＥＣに対するリクエスト信号を発生させるためのアドレスデコーダーを有している。カストマＡＳＩＣユニットは、任意のＣＯＤＥＣと通信しようとする時ごとに、ＣＯＤＥＣにリクエスト信号を送り、ＣＯＤＥＣからの認知信号を待つ。認知信号を受信してから、カストマＡＳＩＣユニットは、データとアドレスとをＣＯＤＥＣに送る。
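When the ARM7 wants to read configuration data in any CODEC through the CCU, the custom ASIC unit sends the address to the CODEC. When the custom ASIC unit receives the data from the CODEC, it returns the transaction ID to the CCU. At that point the configuration data is transferred to the ARM7 through the CCU.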
【０１０９】
【表３５】
ＣＯＤＥＣ構成レジスタＦＢＵＳアドレスマップ

【０１１０】
図５３は、カストマＡＳＩＣ回路網を示したものである。
４．Ｉ／Ｏピン定義
【０１１１】
カストマＡＳＩＣユニットに対するＩ／Ｏピン定義
【表３６】

【０１１２】
第８章ＡＤ１８４３ＣＯＤＥＣインターフェース
８．１
本章はＡＤ１８４３ＣＯＤＥＣインターフェースに関する説明である。
８．２概要
ＡＤ１８４３ＣＯＤＥＣインターフェースブロックは、ＡＤ１８４３シリアルバーストＭＳＰＤＭＡモジュール間のインターフェースのためのものである。ＡＤ１８４３はシリアルポートを通してデータ及び制御／ステータス情報を送信及び受信する。ＡＤ１８４３は、シリアルインターフェースを担当する４個のピン：ＳＤＩ、ＳＤＯ、ＳＣＬＫ、ＳＤＦＳを有する。ＳＤＩピンは、ＡＤ１８４３に対するシリアルデータ入力のためのものであり、ＳＤＯピンはＡＤ１８４３からのシリアルデータ出力のためのものである。ＳＣＬＫピンはシリアルインターフェースクロックのためのものである。
ＡＤ１８４３内部と外部の通信において、データビットはＳＣＬＫの上昇エッジ以降に伝送され、ＳＣＬＫの下降エッジでサンプリングされることを要求する。ＳＤＦＳピンはシリアルインターフェースフレーム同期のためのものである。ＡＤ１８４３ＣＯＤＥＣインターフェースは、マスタモードに基づいたものであって、これはＳＣＬＫとＳＤＦＳ信号がＡＤ１８４３によって発生されることを意味する。省略時(default)ＳＣＬＫ周波数は、１２．２８８ＭＨｚであり、１つのフレームサイクルは４８ＫＨｚである。
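The basic architecture of the CODEC interface is DMA-based. The AD1843 interface is assigned four different DMA channels: channel 4 for DAC1, channel 5 for DAC2, channel 6 for ADC left, and channel 7 for ADC right. Each channel transfer to or from the DMA is 64 bits. DMA channels 4 and 5 therefore each carry two different 32-bit data items (16 bits for left and 16 bits for right), while DMA channels 6 and 7 each send four different 16-bit data items at a time from the CODEC interface to the SDRAM.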
ＤＡＣ１とＤＡＣ２インターフェースは、各チャンネルのフラグビットが設定された時、データが有効であることを認識する。ＤＡＣ１とＤＡＣ２インターフェースは、フラグビットをチェッキングしてからＤＭＡを要請する。フラグビットがリセットされると、ＤＡＣ１とＤＡＣ２インターフェースは、ＤＭＡリクエストを発生させない。フラグビットの実際の動作は、ＤＭＡクロックによって制御される。ＤＭＡブロックはフラグビットがリセットされると、ＤＭＡ認知信号を発生させない。ＡＤＣ左側及び右側のＦＩＦＯが満ちていなければ、ＤＭＡリクエストは発生されない。ソフトウェアはＡＤＣフラグレジスタをチェックし、データバスを通して残っているデータを読み出さなければならない。データバスを通してこれらのデータを読出してから、ＦＩＦＯは空くようになり、ＦＩＦＯが満ちるとＤＭＡリクエストを発生させる。
ＡＤ１８４３制御レジスタは、制御ワード入力の制御レジスタアドレスと共に読出／記入リクエストを伝送することによって、読出及び記入される。読出が要請されると、アドレシングされた制御レジスタの内容は、次のフレームの間に伝送され、記入が要請されると、記入されるデータはＡＤ１８４３スロット１に伝送されなければならない。ＭＳＰの性能を向上させるために、プログラマーはＣＯＤＥＣの制御レジスタを読出または記入する前に、制御フラグレジスタをチェックしなければならない。制御フラグレジスタのフラグビットが設定されると、ＣＯＤＥＣレジスタの読出及び記入動作が可能である。
【０１１３】
８．３ＤＭＡチャンネル指定
ＤＭＡチャンネル４ＤＡＣ１左側、右側
ＤＭＡチャンネル５ＤＡＣ２左側、右側
ＤＭＡチャンネル６ＡＤＣ左側
ＤＭＡチャンネル７ＡＤＣ右側
８．４ＤＭＡに対するデータフォーマット
データサイズは６４ビットであり、下記のように構成される。
【０１１４】

【０１１５】
８．５基本アドレス
０４Ｃ０＿４０００ＤＡＣ１ＢＡＳＥ
０４Ｃ０＿５０００ＤＡＣ２ＢＡＳＥ
０４Ｃ０＿６０００ＡＤＣＬＢＡＳＥ（左側チャンネル）
０４Ｃ０＿７０００ＡＤＣＲＢＡＳＥ（右側チャンネル）
８．６レジスタマップ
【０１１６】
【表３７】

【０１１７】
８．７レジスタ定義
８．７．１制御レジスタ記入データ入力

最上位ビット（ＭＳＢ）は、伝送された最初のデータ入力ビットである。
８．７．２制御ワード入力

ｒ／ｗ読出／記入リクエスト。制御レジスタからの読出または制御レジスタへの記入がフレームごとに発生される。“１”に設定したことは、制御レジスタ読出を示す反面、このビットを“０”にリセットさせることは、制御レジスタ記入を示す。
ｉａ４：０読出または記入のための制御アドレスレジスタ
８．７．３制御レジスタデータ出力

Contents of the control register addressed in the previous frame.
８．７．４ＡＤＣフラグレジスタ

r4v-r1v   Valid ADC right data is in the buffer. Indicates which data items in the buffer are valid.
l4v-l1v   Valid ADC left data is in the buffer. Indicates which data items in the buffer are valid.
８．７．５ＡＤＣ左側の第１番目のデータ

バッファにあるＡＤＣ左側の第１番目のデータ
８．７．６ＡＤＣ左側の第２番目のデータ

バッファにあるＡＤＣ左側の第２番目のデータ
８．７．７ＡＤＣ左側の第３番目のデータ

バッファにあるＡＤＣ左側の第３番目のデータ
８．７．８ＡＤＣ左側の第４番目のデータ

バッファにあるＡＤＣ左側の第４番目のデータ
８．７．９制御フラグレジスタ

ｗｆ１制御レジスタ記入フラグ。設定されるとＣＯＤＥＣは制御レジスタデータを受信する準備をする。
ｒｆ１制御レジスタ読出フラグ。設定されるとＣＯＤＥＣは制御レジスタデータを伝送する準備をする。
【０１１８】
第９章ビデオコーデック
９．１概要
ビデオコーデックロジックは、評価(evaluation)ボード上のＫＳ０１１９とＫＳ０１２２チップに対しインターフェースし、ＭＳＰチップにあるＤＭＡモジュールに対してインターフェースする。ＫＳ０１１９ＣＯＤＥＣはまたスクリーンリフレッシュ動作を提供する。この動作のために、ＭＣＵモジュールに対する直接的なデータ経路は、図５４のように具現される。
９．２上位モジュール定義
上位のモジュールは、図５５でのような３個のサブモジュールを有する。
−ＫＳ０１１９スクリーンリフレッシュモジュール
−ＫＳ０１２２ビデオデータキャプチャモジュール
−ＫＳ０１１９及びＫＳ０１２２チップ構成レジスタをアクセスする、３−ワイヤーシリアルホストインターフェースとモジュール。
９．３ＤＭＡチャンネル指定
DMA CH0   KS0119 CODEC
DMA CH1   Reserved
DMA CH2   KS0122 CODEC
DMA CH3   Reserved
DMA CH4   AD1843 audio CODEC
DMA CH5   AD1843 audio CODEC
DMA CH6   AD1843 audio CODEC
DMA CH7   AD1843 audio CODEC
DMA CH8   Reserved
DMA CH9   Reserved
9.4 3-Wire Host Interface Module
このモジュールは、チップ内部のすべてのレジスタが、シリアルインターフェースを通してアクセスされるＫＳ０１１９とＫＳ０１２２チップに対してインターフェースする。３ワイヤーシリアルインターフェースモジュールは、これらのチップに通信プロトコルの機能を支援し、ＫＳ０１１９とＫＳ０１２２インターフェースロジックのためのレジスタを含む。図３を参照すること。
９．５ＥＰＲＯＭインターフェース
ＫＳ０１１９ＩＯピンは、システムがリセットされた後、直ちにプログラムデータをロードすることに使用され、ＭＳＰ−１ＥＸブート初期化の一部の、外部ＥＰＲＯＭに対するインターフェースとして使用される。より詳細なことはピン指定を参照すること。
ＥＰＲＯＭはＣ００００ＨからＤＦＦＦＦＨまでのアドレスでマッピングされたメモリである。
９．６ＫＳ０１１９レジスタ説明
ＫＳ０１１９は０４Ｂ０００００と同一な基本アドレスＣＯＤＥＣ＿ＲＥＱ０を有し、これは０４ＢＦＦＦＦＦまで拡張される。
９．６．１ＫＳ０１１９レジスタアドレスマップ
ＫＳ０１１９レジスタアドレスマップ
【０１１９】
【表３８】

【０１２０】
９．６．２フレームサイズレジスタ
このレジスタは、図５７の図示のとおり、ＣＯＤＥＣチップに伝送されるフレームサイズを制御し、最小フレームの長さは、３バイトである。
９．６．３チップＩＤレジスタ
このレジスタはＣＯＤＥＣチップＩＤ値を貯蔵するが、ＫＳ０１１９記入に対しては０３Ｈ、ＫＳ０１１９読出のためには８３Ｈを貯蔵する。
９．６．４制御／データレジスタ
このレジスタは、次に伝送されるバイトが、レジスタインデックスまたはデータバイトであるという事実を、ＣＯＤＥＣチップＫＳ０１１９に伝える。ＫＳ０１１９に対し、０８Ｈは次のバイトがインデックスであることを、０９Ｈは次のバイトがデータであることを表わす。
９．６．５インデックス／データ０レジスタ
このレジスタは、その以前のバイトに伝送された値によって、ＣＯＤＥＣチップ構成レジスタに対するインデックス値または、データ０バイトを貯蔵する。プログラミング参照部の通信プロトコルを参照すること。
９．６．６データ１レジスタ
このレジスタは、ＣＯＤＥＣレジスタＩｎｄｅｘ＋１に記入されるデータを貯蔵する。
９．６．７データ２レジスタ
このレジスタは、ＣＯＤＥＣレジスタＩｎｄｅｘ＋２に記入されるデータを貯蔵する。
９．６．８データ３レジスタ
このレジスタは、ＣＯＤＥＣレジスタＩｎｄｅｘ＋３に記入されるデータを貯蔵する。
９．６．９ＫＳ０１１９ロジック制御レジスタ
ＫＳ０１１９制御レジスタに対するビット指定は、図５８の図示のとおりである。
９．６．１０ＨＳ及びＶＳ極性
このレジスタは、水平同期と垂直同期信号の極性を定義する。０値はアクチブローに定義される一方、１値はアクチブハイに定義される。ビット指定は下記のとおりである。
Ｂｉｔ＜０＞：ＶＳ極性
Ｂｉｔ＜１＞：ＨＳ極性
９．６．１１ＨＳオフセット
アクチブ信号は、このオフセット値以降に発生され、このオフセット値は００Ｈに定義される。
９．６．１２ＶＳオフセット
アクチブ信号は、このオフセット値以降に発生され、このオフセット値は００Ｈに定義される。
９．６．１３ステータスレジスタは図５９の図示のとおりである。
９．６．１４読出データシリアルインターフェースレジスタ
このレジスタは、読出フラグがビジー(busy)状態から準備(ready)状態への遷移を表してから、シリアルポートからの有効データを貯蔵する。
９．６．１５読出ＰＲＯＭデータレジスタ
このレジスタはＰＲＯＭフラグが準備状態の場合、有効データを貯蔵する。
9.6.16 Programming Reference
９．６．１６．１構成及び初期化
ビデオディスプレーハードウェアは、２種類のモードすなわち、ＶＧＡオーバーレーモードとＶＧＡエミュレーションモードに動作するように製作される。このモード動作は、ロジック制御レジスタにあるビットを設定することにより制御される。
ＭＳＳＥＬ：ＶＧＡオーバーレーモードの場合０、
ＶＧＡエミュレーションモードの場合１。
ＶＧＡオーバーレーモードでは、ＰＣシステム上にＶＧＡカードの存在が要求される。
−モニタケーブルは、ＭＳＰカードに連結される。
−支援されるＶＧＡ解像度は８００×６００までである。
ディスプレーバッファは、ＶＧＡセッティングと同じサイズであることが要求される。
【０１２１】
ソフトウェアによりＶＧＡフレームバッファで、カラーキー四角領域が満たされるビデオウインドーを設定するために、ビデオデータはＭＳＰＳＤＲＡＭでＶＧＡフレームバッファにある四角領域と同一サイズと位置の四角領域に記入されなければならない。図６０を参照すること。
ＫＳ０１１９チップはカラーキーを認識し、ＶＧＡ入力ポートをビデオ入力ポートに切り換える。ソフトウェアによりＤＭＡチャンネル０スタートアドレスを、ＳＤＲＡＭビデオ出力バッファ上位の左側に設定し、ＤＭＡレコードの長さは、ＶＧＡカードに設定された解像度とビデオデータで使用された画素当りのビット（４：２：２＝画素当り１６ビット）によって設定される。
９．６．１６．２ＫＳ０１１９に対するシリアルプロトコル３−ワイヤーインターフェース
ＫＳ０１１９チップにある構成レジスタを設定する場合、プロトコルは下記のとおりである。
−周辺チップに伝送されるためには、最少に２個のフレームが必要である。
−第１番目のフレームは、構成レジスタのインデックスを設定するためのものである。
−第２番目のフレームは、データ（レジスタの内容）の読出または記入のためのものである。
ソフトウェアによりフレームサイズレジスタを適切な長さで設定し、シリアルアクセスビットを１に設定する。そうすると、フレームサイズレジスタを変更する前、フレームに必要なすべてのバイトをソフトウェアによりロードし、ＣＯＤＥＣインターフェースロジックは、フレームシリアル化が開始される前、すべてのバイトがロードされる時まで待つ。
第１番目に伝送されるフレームは、インデックスを設定するためのもので、フレームサイズは３である。図６１を参照すること。
【０１２２】
第２番目のフレームは、レジスタを設定するためのもので、フレームサイズは３である。
各データバイトの以降、チップはインデックスを１ずつ自動に増加させ、これは複数バイトのデータを４個データバイトまで支援するＣＯＤＥＣインターフェースロジックに伝送することによって、連続的なレジスタを設定することを可能にする。
読出または記入動作が遂行された場合、ソフトウェアにより読出動作時に、有効データのためのステータスレジスタの読出及び記入フラグをチェックするか、次のフレームを伝送する前、記入フラグ＝準備(ready)であるかをチェックする。
次の例は、ＫＳ０１１９データシートを設定する段階を示している。
２個のレジスタが連続的なインデックスを有するので、この二バイトは単一フレームにロードされ得る。まず、インデックスは下記のとおり設定されなければならない。
−８３Ｈ値（フレームサイズ＝３、シリアルアクセスビット設定）を有するロードフレームサイズレジスタ（Address=04B0_0000H)
−０３値を有するロードＩＤレジスタ（Address=04B0_0001H)
−ロードデータ／制御バイト：ＫＳ０１１９に、次のバイトがインデックスであることを知らせる０８Ｈ値（Address=04B0_0002H)
−６Ｈ値を有するロードインデックスレジスタ（Address=04B0_0003H)
シリアルインターフェースは、フレームサイズレジスタにある内容の一致可否を検出しフレーム伝送を開始し、ステータスレジスタにある記入フラグは、ビジー(busy)状態に設定される。次のフレームを伝送する前、ソフトウェアによりステータスレジスタにあるフラグをチェックする。フラグが準備状態であれば、ソフトウェアにより次のフレームのための値をロードすることができる。
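The register loads for the index frame in the KS0119 example can be written out as a list of (address, value) pairs. The function name is hypothetical. The position of the serial access bit (bit 7) is an assumption inferred from the value 83H being described as frame size 3 with the serial access bit set; the figure that defines the register layout is not reproduced here.

```python
KS0119_BASE = 0x04B0_0000
SERIAL_ACCESS = 0x80  # assumed bit position, inferred from 83H = 0x80 | 3


def ks0119_index_frame(index: int) -> list:
    """Register writes that send an index frame to the KS0119."""
    return [
        (KS0119_BASE + 0, SERIAL_ACCESS | 3),  # frame size register: 3 bytes
        (KS0119_BASE + 1, 0x03),               # chip ID for a KS0119 write
        (KS0119_BASE + 2, 0x08),               # control: next byte is an index
        (KS0119_BASE + 3, index & 0xFF),       # index register
    ]
```

Calling `ks0119_index_frame(0x06)` reproduces the four loads listed in the example. Software would then poll the write flag in the status register before loading the next frame.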
９．７ＫＳ０１２２レジスタ説明
The KS0122 has a base address of 04C02000, and its range extends to 04C02FFF.
９．７．１ＫＳ０１２２レジスタアドレスマップ
【０１２３】
【表３９】

【０１２４】
９．７．２フレームサイズレジスタ
このレジスタは、図６２に定義されたように、ＣＯＤＥＣチップに伝送されるフレームサイズを制御し、最小フレームの長さは３バイトである。
９．７．３チップＩＤレジスタ
このレジスタはＣＯＤＥＣチップＩＤ値を貯蔵するが、ＫＳ０１２２記入に対しては０４Ｈ、ＫＳ０１２２読出のためには８４Ｈを貯蔵する。
９．７．４制御／データレジスタ
このレジスタは、次に伝送されるバイトが、レジスタインデックスまたはデータバイトであるとの事実をＣＯＤＥＣチップＫＳ０１２２に伝える。ＫＳ０１２２に対し、００Ｈは次のバイトがインデックスであることを、０１Ｈは次のバイトがデータであることを表わす。
９．７．５インデックス／データ０レジスタ
このレジスタは、その以前のバイトに伝送された値によって、ＣＯＤＥＣチップ構成レジスタに対するインデックス値或いはデータ０バイトを貯蔵する。プログラミング参照部の通信プロトコルを参照すること。
９．７．６データ１レジスタ
このレジスタは、ＣＯＤＥＣレジスタＩｎｄｅｘ＋１に記入されるデータを貯蔵する。
９．７．７データ２レジスタ
このレジスタは、ＣＯＤＥＣレジスタＩｎｄｅｘ＋２に記入されるデータを貯蔵する。
９．７．８データ３レジスタ
このレジスタは、ＣＯＤＥＣレジスタＩｎｄｅｘ＋３に記入されるデータを貯蔵する。
【０１２５】
９．７．９ＫＳ０１２２ロジック制御レジスタ
ＫＳ０１２２制御レジスタに対するビット指定は、下記のとおりである。
ｂｉｔｓ＜１：０＞
００４：２：２フォーマット
０１４：１：１フォーマット
１０ＣＣＩＲ６５６フォーマット
９．７．１０ステータスレジスタ
Bit<0>: field status
0: even field
1: odd field
Bit<1>: VS status
0: VS transition from 1 to 0
1: VS transition from 0 to 1
９．７．１１読出データシリアルインターフェースレジスタ
このレジスタは、読出フラグがビジー(busy)状態から準備(ready)状態への遷移を表してから、シリアルポートからの有効データを貯蔵する。
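9.7.12 Serial Protocol for the KS0122: 3-Wire Interface
When setting the configuration registers in the KS0122 chip, the protocol is as follows.
− At least two frames are required for a transfer to the peripheral chip.
− The first frame sets the index of the configuration register.
− The second frame reads or writes the data (the register contents).
【０１２６】
Software sets the frame size register to the appropriate length and sets the serial access bit to 1. Software then loads all bytes required for the frame before changing the frame size register again, and the CODEC interface logic waits until all bytes have been loaded before starting frame serialization.
The first frame transmitted sets the index; its frame size is 3. See FIG. 63.
The second frame sets the register; its frame size is 3.
After each data byte, the chip automatically increments the index by 1. This makes it possible to set consecutive registers by sending multiple data bytes to the CODEC interface logic, which supports up to four data bytes.
When a read or write operation has been performed, software either checks the read and write flags in the status register for valid data (on a read), or checks that the write flag = ready before transmitting the next frame.
The following example shows the steps for programming the KS0122.
To set the values for chroma key byte 0 and byte 1, the indices for these registers are 6AH for byte 0 and 6BH for byte 1. See the KS0122 data sheet.
Because the two registers have consecutive indices, both bytes can be loaded in a single frame. First, the index must be set as follows.
【０１２７】
− Load the frame size register with the value 83H (frame size = 3, serial access bit set) (Address=04B0_0000H)
− Load the ID register with the value 03 (Address=04B0_0001H)
− Load the data/control byte with the value 08H, which tells the KS0122 that the next byte is an index (Address=04B0_0002H)
− Load the index register with the value 6H (Address=04B0_0003H)
The serial interface detects a match with the contents of the frame size register and starts frame transmission, and the write flag in the status register is set to busy. Before transmitting the next frame, software checks the flag in the status register. If the flag is ready, software can load the values for the next frame.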
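The index-setting frame above can be sketched as a list of register writes. This is only an illustration: the four addresses and the values 83H, 03, 08H, and 6H are taken from the example, while the helper name `build_index_frame` and the assumption that the serial access bit is bit 7 (inferred from 83H = frame size 3 plus the access bit) are hypothetical.

```python
# Addresses quoted in the example above.
FRAME_SIZE_REG = 0x04B0_0000
ID_REG = 0x04B0_0001
CONTROL_REG = 0x04B0_0002
INDEX_REG = 0x04B0_0003

# Assumption: the serial access bit is bit 7, since 83H = 0x80 | 3.
SERIAL_ACCESS_BIT = 0x80


def build_index_frame(chip_id, control_byte, index):
    """Return the (address, value) writes for a 3-byte index-setting frame."""
    frame_size = 3  # ID byte + control byte + index byte
    return [
        (FRAME_SIZE_REG, SERIAL_ACCESS_BIT | frame_size),
        (ID_REG, chip_id),
        (CONTROL_REG, control_byte),
        (INDEX_REG, index),
    ]


# Values exactly as quoted in the example.
writes = build_index_frame(chip_id=0x03, control_byte=0x08, index=0x06)
for address, value in writes:
    print(f"{address:08X}H <- {value:02X}H")
```

After issuing these writes, software would poll the write flag in the status register until it reads ready before loading the second frame.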
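【０１２８】
Chapter 10 Bitstream Processor
10.1
This chapter describes the functional requirements for designing the bitstream processor (BP), one of the main MSP processing engines for video data compression and decompression applications.
10.2 Abbreviations
A/V: audio and video
BP: bitstream processor (MSP block)
CCU: cache control unit (MSP block)
CIF: Common Intermediate Format, with a luminance sample resolution of 352×288 at 29.97 Hz
DCT: discrete cosine transform
DMA: direct memory access
DSM: digital storage media
FBUS: fast bus (MSP internal data bus)
GOB: group of blocks
GSTN: General Switched Telephone Network (formerly known as PSTN)
HDD: hard disk drive
I/F: interface
IOBUS: input/output bus (MSP internal peripheral bus)
ITU-T-601: standard for the digital coding of color television signals, with sample resolutions of 720×480 at 29.97 Hz and 720×576 at 25 Hz (formerly called CCIR 601). The display resolution may, however, be 720×480 or 704×480.
LSB: least significant bit
LUT: lookup table
MPEG: Moving Picture Experts Group
MSB: most significant bit
MSP: Samsung multimedia signal processor
QCIF: Quarter CIF, with a luminance resolution of 176×144 at 29.97 Hz
RLC: run-length and level code
SDRAM: synchronous dynamic random access memory
SIF: Source Input Format for the MPEG-1 video standard, with a luminance resolution of 352×240 at 29.97 Hz for NTSC and 352×288 at 25 Hz for PAL
TBD: to be defined
VLC: variable-length code
VP: vector processor (MSP block)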
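10.3 Main Features
Supports MPEG-1, MPEG-2, H.261, and H.263 encoding and decoding applications, with syntax parsing at and below the slice (or GOB) layer.
＊Performs RLC processing in real time
＊Performs Huffman code processing in real time using all Huffman tables in the MPEG-1, MPEG-2, H.261, and H.263 video standards.
＊Supports two forward/inverse zigzag scan conversion schemes.
＊IOBUS interface with a maximum transfer rate of 731.4 Mbits/sec (32-bit @ 40 MHz)
＊Maximum operating clock frequency of 40 MHz
＊Includes a 9.2 Kbit ROM for the Huffman codec lookup tables.
＊Includes 320 bytes of internal SRAM.
＊Supports pre-emptive and cooperative context switching modes
＊Target gate count for the control path is 6 Kgates, plus RAM and ROM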
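【０１２９】
10.4 Overview
The bitstream processor (BP) is one of the four MSP internal peripheral devices. It is a hardware block that supports several bitstreams in video compression and decompression.
This device was designed specifically for bit-level processing, because the VP and the ARM7 inside the MSP do not have architectures that handle such bit manipulation efficiently.
The BP sends and receives data over a 32-bit bus called the IOBUS, which has a maximum transfer rate of 731.4 Mbits/sec.
The BP operates as an independent processing unit and is controlled by ARM7 or VP software.
In particular, the BP encodes and decodes the slice or GOB and all information below it, and transfers data to and from the CCU. The BP also performs forward and inverse zigzag conversion and encodes and decodes differential DC coefficients.
In addition, in decoding the BP reconstructs motion vectors from differential motion vectors, and in encoding it performs the reverse operation, except in two special modes: dual-prime mode in MPEG-2 encoding, and prediction mode in H.263 encoding and decoding.
In simple mode, once the BP starts processing a slice or GOB, it is interrupted only after processing of that slice or GOB is complete. Full-duplex operation is achieved in this mode by interleaving the encoding and decoding of slices or GOBs.
If the ARM7 wants to switch the BP to another task immediately, the BP supports a preemptive context switching mode, in which BP processing stops before the current slice or GOB is complete.
【０１３０】
FIG. 3 is a block diagram of the BP.
As shown in FIG. 3, the BP comprises five blocks: an IOBUS interface unit, a VLC FIFO unit, a VLC LUT ROM, a control state machine, and a BP core unit. Input and output data are handled by the IOBUS interface unit, which contains a 16×32-bit RAM. This unit supports all data movement and interrupt requests. The VLC FIFO unit prepares the next data word for decoding operations and performs output data packing for encoding operations.
The VLC lookup table ROM has a size of 768×12 bits and stores all the information needed to process every Huffman code. The control state machine controls all encoding and decoding. The BP core unit is a small processor containing an adder, a comparator, a barrel shifter, a register file, and a 128×16-bit RAM. Bit manipulation is performed in this core.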
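As a quick consistency check, the memory sizes quoted in this overview can be compared with the figures in the feature list (a 9.2 Kbit ROM and 320 bytes of internal memory). The reading that the 320-byte figure is the sum of the two RAMs is an inference, not something the text states.

```python
# Sizes quoted in the block description above.
ROM_WORDS, ROM_WIDTH_BITS = 768, 12          # VLC LUT ROM
IOBUS_RAM_WORDS, IOBUS_RAM_WIDTH = 16, 32    # RAM in the IOBUS interface unit
CORE_RAM_WORDS, CORE_RAM_WIDTH = 128, 16     # RAM in the BP core unit

rom_bits = ROM_WORDS * ROM_WIDTH_BITS
iobus_ram_bytes = IOBUS_RAM_WORDS * IOBUS_RAM_WIDTH // 8
core_ram_bytes = CORE_RAM_WORDS * CORE_RAM_WIDTH // 8

# Assumption: the feature list's internal-memory figure is the sum of both RAMs.
total_ram_bytes = iobus_ram_bytes + core_ram_bytes

print(rom_bits)          # 9216 bits, i.e. about 9.2 Kbit
print(total_ram_bytes)   # 64 + 256 = 320 bytes
```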
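【０１３１】
10.5 Signal Definitions
The signals required for the BP external interface are shown in Table 45. A signal whose name ends in the letter "l" is active low.
In the "Direction" column of Table 1, "B", "I", and "O" denote a bidirectional signal, an input signal, and an output signal, respectively.
【０１３２】
BP signal definitions
【Table 40】

【０１３３】
10.6 Data Flow for Encoding/Decoding
This section covers the data flow of typical video encoding and decoding applications. The audio data flow is not described in detail here.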
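10.6.1 Encoding
Step E1: Raw A/V data input
Ordinarily, the input video and audio signals are sampled and digitized by an external codec and supplied to the custom ASIC. In a multimedia PC environment, however, some VGA controller boards also include a frame grabber and sound capture. Raw A/V data therefore arrives from either the custom ASIC or the PCI bus interface. The custom ASIC or PCI bus includes a small 32-byte buffer. Data in this buffer is transferred to the external SDRAM over the FBUS using DMA operations. This data movement is initiated by the ARM7 after power-on reset.
Step E2: Pre-filtering by the VP
First, the VP fetches the image data stored in SDRAM into the VP data cache (generally the scratchpad area). The VP then filters these pixels temporally and scales them spatially. After pre-filtering, the picture resolution is normally converted from ITU-T-601 size to CIF or QCIF size. The VP writes the pre-filtered result to the external SDRAM.
【０１３４】
Step E3: Data compression by the VP
The VP again fetches the pre-filtered data from SDRAM into the VP data cache and compresses it according to the rules of the applicable standard. Normally the VP performs forward DCT, forward adaptive quantization, motion estimation, macroblock type decision, and so on.
After this processing, the VP must write the results, with the appropriate header information, back to the VP data cache. In practice, this VP data cache area serves as the BP input buffer. Flag signals are used to check the state of the buffer.
Step E4: BP initialization by the ARM7
Before the BP actually operates, the ARM7 must initialize the BP's internal registers.
This initialization must not be performed during the 128 cycles after the power-on reset signal is applied. In particular, the ARM7 must initialize the input/output buffer addresses and the BP command register, and must specify the number of macroblocks to be encoded in the slice or GOB.
After initializing these registers, the ARM7 must set the BP enable flag so that the BP starts processing.
Step E5: Bitstream processing by the BP
When either of the two input buffers is full, the BP begins reading data through the IOBUS; that is, the BP can read data only from a full buffer. The BP then converts the 8×8 block data into zigzag order, and the result is RLC and Huffman encoded directly.
The Huffman-encoded result can be sent to either the ARM7 data cache or the SDRAM. To avoid overflow, the BP should write to an output buffer only when that buffer is empty. At the end of this process, when the number of processed macroblocks equals the number specified by the ARM7, the BP interrupts the ARM7 with the byte and bit position of the last data, and ends processing of the current slice or GOB.
【０１３５】
Step E6: Bitstream formation and A/V multiplexing by the ARM7
The ARM7 combines the Huffman-encoded data with the syntax parameters to form the final bitstream, and repeats the process.
The ARM7 can also handle the layers above the slice or GOB and multiplex the audio and video bitstreams. The result is written to SDRAM by the ARM7.
Step E7: Network interface by the VP (optional, for video conferencing)
For video phone or video conferencing applications, after the steps up to E6 the VP performs network interface functions such as a V.34 modem for an H.324 GSTN video phone or an I.400-series interface for an H.320 ISDN video conferencing terminal.
Step E8: Final bitstream output
The final bitstream stored in SDRAM is sent to either the custom ASIC or the PCI bus. Normally the custom ASIC block is used for the network interface, and the PCI bus interface is used for data storage on a recording device (for example, an HDD).
This data movement uses DMA transfers initiated by the ARM7.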
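【０１３６】
10.6.2 Decoding
Step D1: Bitstream fetch
In a multimedia environment, the compressed bitstream is supplied from a CD-ROM drive, an HDD, or a network interface.
The bitstream therefore arrives through either the custom ASIC or the PCI bus. Data held in the 32-byte buffer of the custom ASIC or PCI bus is transferred to SDRAM using DMA.
Step D2: Network interface by the VP (optional, for video conferencing)
In video conferencing, the VP first runs the V.34 or I.400-series network interface routine on the data. The VP writes the result to SDRAM.
Step D3: A/V demultiplexing and header parsing by the ARM7
The ARM7 moves the data in SDRAM to the ARM7 data cache and performs A/V bitstream demultiplexing. For the video bitstream, the ARM7 also searches for all start codes and parses the headers until a slice or GOB is detected. The ARM7 stores the decoded bitstream syntax parameters in a special area of SDRAM. The demultiplexed audio and video bitstreams are each sent to a rate buffer in SDRAM. The rate buffer size may differ from one application to another. For example, the recommended video rate buffer size is 370 Kbits for MPEG-1 and 1.835 Mbits for MPEG-2.
【０１３７】
Step D4: BP initialization by the ARM7
This step is the same as step E4 in the previous subsection, except that the register for the number of macroblocks to be encoded need not be initialized. As before, initialization must not be performed during the 128 cycles after the power-on reset signal is applied.
Step D5: Bitstream processing by the BP
After the BP has been initialized for a particular slice or GOB, the decompressed data is sent to the two buffers.
The BP checks the state of the full flags and reads data through the IOBUS. If the input data contains a header word, the BP parses the syntax parameters.
When the BP recognizes the following bits as a Huffman code, it performs Huffman decoding within at most four cycles per Huffman code. If the Huffman-decoded item is a DCT AC coefficient, the result is RLC decoded into the 64 pixel elements.
The reconstructed elements are then inverse zigzag converted and finally sent to the two output buffers so that the VP can perform inverse quantization. The BP continues this process until it detects a start code that is not a slice or GOB start code. When such a code is detected, the BP interrupts the ARM7 with the byte and bit position of the last data used. The ARM7 then searches for the next slice or GOB start code and repeats the process.
【０１３８】
Step D6: Data decompression by the VP
Using the results of step D5, the VP performs inverse quantization, inverse DCT, and picture reconstruction using the motion vectors. After completing the decoding process, the VP stores the results in SDRAM.
Step D7: Post-processing by the VP
Before the video and audio data is sent to the digital-to-analog converter, the VP post-processes the pixels to obtain the desired output resolution and picture quality.
These results are also stored in SDRAM.
Step D8: Raw A/V data output
Finally, the reconstructed audio and video data in SDRAM is output using DMA. Again, this data movement is initiated by the ARM7. Current video overlay technology allows the PCI bus to carry data to the video source, so the data is finally sent to either the custom ASIC or the PCI bus.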
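【０１３９】
10.7 Programming Model
10.7.1 BP Base Device Address
The BP has the following 32-bit base device address.
<MSP_BASE><BP_BASE><Address_Offset>
Here, <MSP_BASE> is 5 bits defined by the MSP base PCI device address, <BP_BASE> is 7 bits equal to 7'b 1111100, and <Address_Offset> is 20 bits assigned to the BP internal registers.
Therefore, in the overall MSP I/O device address map, the address range assigned to the BP is 27'h 7C0_0000 to 27'h 7CF_FFFF.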
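The address composition above can be illustrated with a short sketch. The field widths (5, 7, and 20 bits) and the BP_BASE value come from the text; the function name `bp_address` is hypothetical.

```python
BP_BASE = 0b1111100          # 7-bit field from the text (7'b 1111100)
OFFSET_BITS = 20             # width of <Address_Offset>
BP_BASE_BITS = 7             # width of <BP_BASE>


def bp_address(msp_base, offset):
    """Compose <MSP_BASE><BP_BASE><Address_Offset> into a 32-bit address."""
    if not 0 <= msp_base < (1 << 5):
        raise ValueError("MSP_BASE is a 5-bit field")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("Address_Offset is a 20-bit field")
    return (msp_base << (BP_BASE_BITS + OFFSET_BITS)) | (BP_BASE << OFFSET_BITS) | offset


# With MSP_BASE = 0 the result is the 27-bit address used in the text.
first = bp_address(0, 0x00000)
last = bp_address(0, 0xFFFFF)
print(f"{first:07X} .. {last:07X}")   # 7C00000 .. 7CFFFFF
```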
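10.7.2 Internal Register Description
The internal register set is shown in the table; every register in the table can be written or read by the ARM7 or the VP.
【０１４０】
BP internal registers
【Table 41】

【０１４１】
＊BP_MODE[31:0] (read-only, no default value) - This register defines the video standard type and various picture-level information; details are given in subsection 10.8.1.
＊BP_CONTROL[31:0] (read/write, default value "32'h 0000_0000") - This register contains various control parameters for BP operation. The ARM7 or the VP sets the flags in this register, and some flags are set by the BP. The bit specification is given in subsection 10.8.2.
＊IBUF0_START[31:0] (read/write, no default value) - This register defines the start address of input buffer 0 of the BP input double buffer and is initialized by the ARM7. The initial value of IBUF0_START must always be less than IBUF0_END, and IBUF0_START[3:0] must equal 4'b0000.
＊IBUF0_END[31:0] (read-only, no default value) - This register defines the last address of input buffer 0 of the BP input double buffer. See Section 10.11.
＊IBUF1_START[31:0] (read/write, no default value) - This register defines the start address of input buffer 1 of the BP input double buffer and is initialized by the ARM7. The initial value of IBUF1_START must always be less than IBUF1_END, and IBUF1_START[3:0] must equal 4'b0000. See Section 10.11.
＊IBUF1_END[31:0] (read-only, no default value) - This register defines the last address of input buffer 1 of the BP input double buffer. See Section 10.11.
＊OBUF0_START[31:0] (read/write, no default value) - This register defines the start address of output buffer 0 of the BP output double buffer and is initialized by the ARM7. The initial value of OBUF0_START must always be less than OBUF0_END, and OBUF0_START[3:0] must equal 4'b0000. See Section 10.11.
＊OBUF0_END[31:0] (read-only, no default value) - This register defines the last address of output buffer 0 of the BP output double buffer. See Section 10.11.
＊OBUF1_START[31:0] (read/write, no default value) - This register defines the start address of output buffer 1 of the BP output double buffer and is initialized by the ARM7. The initial value of OBUF1_START must always be less than OBUF1_END, and OBUF1_START[3:0] must equal 4'b0000. See Section 10.11.
＊OBUF1_END[31:0] (read-only, no default value) - This register defines the last address of output buffer 1 of the BP output double buffer. See Section 10.11.
＊SAVE_ADR[31:0] (read-only, no default value) - This register defines the start address in SDRAM at which the BP internal context is saved when preemptive context switching mode is requested. See subsection 10.12.1.
【０１４２】
＊VALID_BYTE_ADR[31:0] (read/write, no default value) - This register indicates the position of the last valid data byte in the input double buffer (decoding) or the output double buffer (encoding). Its purpose is handshaking between the ARM7 and the BP. In general, additional information is needed for the valid bit position within the valid data byte; this is held in the BP_CONTROL[31:0] register. Details are given in Section 10.13.
＊BP_STATUS[31:0] (read/write, default value "32'h 0000_0000") - This register indicates various internal states of the BP. Every bit position in the lower two bytes (that is, BP_STATUS[15:0]) is an interrupt condition that can set ARM7_IRQ to "1". The register can be accessed in two ways. The ARM7 or the VP can read or write the full 32-bit register at address 27'h7C0_0050. In general, however, it is preferable for the ARM7 and the VP to write (or reset) the contents of BP_STATUS bit by bit. The BP supports this by assigning an address in the range 27'h7C0_0030 to 27'h7C0_004F to each bit of BP_STATUS. The bit contents are described in subsection 10.8.3.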
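The per-bit address range quoted for BP_STATUS spans exactly 32 addresses, one for each bit. The sketch below assumes the simplest mapping, with bit 0 at the lowest address and one address per bit in ascending order; the text gives only the range, so this ordering and the function name `status_bit_address` are assumptions.

```python
BP_STATUS_WORD_ADDR = 0x7C0_0050   # full 32-bit register access (from the text)
BP_STATUS_BIT_BASE = 0x7C0_0030    # first per-bit address (from the text)
BP_STATUS_BIT_LAST = 0x7C0_004F    # last per-bit address (from the text)


def status_bit_address(bit):
    """Return the per-bit address of BP_STATUS[bit].

    Assumption: bit n is mapped to BP_STATUS_BIT_BASE + n.
    """
    if not 0 <= bit <= 31:
        raise ValueError("BP_STATUS has 32 bits")
    return BP_STATUS_BIT_BASE + bit


# The quoted range holds exactly one address per bit.
print(BP_STATUS_BIT_LAST - BP_STATUS_BIT_BASE + 1)   # 32
print(f"{status_bit_address(5):07X}")                 # 7C00035 under the assumed mapping
```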
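＊BP_INT_MASK[15:0] (read-only, default value "16'hFFFF") - Each bit of this register corresponds to an interrupt condition in BP_STATUS[15:0] and is logically AND-ed with that condition before it is recorded in BP_STATUS[15:0]. If a mask bit is set to "0", the corresponding interrupt condition is unconditionally forced to "0" (that is, disabled). The interrupts are described in detail in Section 10.9.
＊V_MB_SIZE[7:0] (read-only, no default value) - This register indicates the vertical size of the picture to be encoded or decoded, expressed as a number of macroblocks. For example, if the vertical size is 288 pixels, V_MB_SIZE[7:0] = 288/16 = 18. The ARM7 must always set this register before starting a BP encoding or decoding operation.
【０１４３】
＊H_MB_SIZE[7:0] (read-only, no default value) - This register indicates the horizontal size of the picture to be encoded or decoded, expressed as a number of macroblocks. For example, if the horizontal size is 352 pixels, H_MB_SIZE[7:0] = 352/16 = 22. The ARM7 must always set this register before starting a BP encoding or decoding operation.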
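The two size registers follow directly from the picture dimensions and the 16-pixel macroblock size. The sketch below reproduces the two worked examples; rounding up for dimensions that are not a multiple of 16 is an assumption, since the text only shows exact divisions.

```python
MACROBLOCK_PIXELS = 16


def mb_size(pixels):
    """Convert a picture dimension in pixels to a macroblock count.

    Assumption: dimensions that are not a multiple of 16 round up.
    """
    count = (pixels + MACROBLOCK_PIXELS - 1) // MACROBLOCK_PIXELS
    if not 0 < count <= 0xFF:
        raise ValueError("value does not fit the 8-bit size register")
    return count


# CIF picture, as in the examples above.
v_mb_size = mb_size(288)
h_mb_size = mb_size(352)
print(v_mb_size, h_mb_size)   # 18 22
```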
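＊ARM7_IRQ[0] (read-only, default value "0") - This register is a 1-bit flag that requests an interrupt to the ARM7 and is connected directly to the ARM7_IRQ output port. The flag is set if any bit of BP_STATUS[15:0] is set to "1". The ARM7 then resets this flag.
10.8 BP I/O Data Word Formats
This section covers the command data and macroblock data word formats for BP input and output.
10.8.1 BP_MODE Register Format
The 32-bit BP_MODE register at address 27'h7C0_0000 has the format given in Table 25; that is, BP_MODE[31] = PARAM_SET2[7] and BP_MODE[0] = SF[0].
【０１４４】
【Table 42】
BP_MODE register format

【０１４５】
＊standard_format[SF] - The video standard in use is defined in Table 26. SF must always be defined by the ARM7 before the BP is enabled for any video encoding or decoding application.
【０１４６】
【Table 43】
SF definition

【０１４７】
＊picture_type (PT) - The picture coding type is defined in Table 27.
The PT value 00 is a special case for MPEG-1, MPEG-2, and H.263 applications. In particular, the D-picture is assigned a picture type for MPEG-2 even though it is not used in MPEG-2, because the MPEG-1 bitstream is a subset of the MPEG-2 bitstream.
【０１４８】
【Table 44】
PT definition

【０１４９】
＊picture_structure (PS) - The picture structure information is defined in Table 43. The PS value 00 is invalid and causes an error.
【０１５０】
【Table 45】
PS definition

【０１５１】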
＊parameter_set0, 1, and 2 (PARAM_SET0, PARAM_SET1, PARAM_SET2) - These three bytes are defined as various parameters used in MPEG-1, MPEG-2, and H.263. The definition of each parameter set is given in the tables.
【０１５２】
【Table 46】
PARAM_SET0 definition

【０１５３】
＊intra_dc_precision (IDP) - The 2-bit intra dc precision parameter defined in MPEG-2; it must be set to 00 in MPEG-1 applications.
＊top_field_first (TFF) - An MPEG-2 flag used in motion vector encoding and decoding.
＊frame_pred_frame_dct (FPFD) - An MPEG-2 flag indicating that frame DCT and frame prediction are used.
＊concealment_motion_vectors (CMV) or advanced_prediction_mode (AP) - In MPEG-2, this flag indicates that motion vectors are used in intra macroblocks. In H.263, this flag is set to 1 if advanced prediction mode is ON and to 0 otherwise.
For other standards, this flag must be set to 0.
＊intra_vlc_format (IVF) - An MPEG-2 flag that determines the type of VLC table used for intra macroblocks.
＊alternate_scan (AS) - An MPEG-2 flag that determines the order of the coefficients being encoded and decoded.
＊vertical_size_flag (VSF) or continuous_presence_multipoint (CPM) - In MPEG-1 and MPEG-2, this flag must be set to 1 if the vertical size of the picture exceeds 2800 lines and to 0 otherwise. In H.263, this flag is set to 1 if continuous presence multipoint mode is used and to 0 otherwise.
【０１５４】
【Table 47】
PARAM_SET1 and PARAM_SET2 definition

【０１５５】
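10.8.2 BP_CONTROL Register Format
The bit specification for the BP_CONTROL[31:0] register (address 27'h7C0_0004) is shown in Table 47.
【０１５６】
BP_CONTROL register format
【Table 48】

【０１５７】
＊BP_enable (BP_EN) - When this flag is set to 1 by the ARM7 or the VP, the BP starts processing. All other register configuration must therefore be complete before this flag is set. When the BP finishes processing, the flag is cleared by the BP.
＊software_reset (SOFT_RESET) - When this flag is set by the ARM7 or the VP, the BP aborts its current processing, returns all internal registers to their default state, and enters the idle state. The ARM7 can then restart BP processing by setting the BP_EN flag. The BP hardware reset signal is active low.
＊pause (PAUSE) - When this flag is set to 1 by the ARM7 or the VP, the BP suspends its current processing. The user resumes the suspended operation by setting the BP_EN flag.
＊detect_start_code (DETECT_START_CODE) - When this flag is set to 1 by the ARM7 or the VP, the BP searches the data in IBUF0 for the next start code. The user must therefore set suitable addresses in IBUF0_START and IBUF0_END. This command works properly only if the BP is idle. If the BP is not idle, the ARM7 should send a software reset command to the BP before issuing this command.
＊step (STEP) - When this flag is set to 1 by the ARM7 or the VP, the BP executes one state of the current operation. This feature is very useful for debugging. The ARM7 should send the pause command first to enable step operation.
＊context_switching_mode (CTX_MODE) - When this flag is set to "1" by the ARM7 or the VP together with CTX_SWITCH set to "1", the BP performs preemptive context switching. If it is set to "0" with CTX_SWITCH set to "1", the BP performs cooperative context switching. Setting CTX_MODE without setting CTX_SWITCH to "1" has no effect on BP processing. See Section 10.12 for details of context switching.
＊context_reload_request (CTX_RELOAD) - When this flag is set to "1" by the ARM7 or the VP, the BP reloads the context previously saved in SDRAM, reading it from address SAVE_ADR[31:0]. See Section 10.12 for details of context switching.
＊error_handle_mode (ERR_HANDLE_MODE) - This flag selects the BP error recovery procedure used when an error occurs in the transmitted compressed bitstream.
When the input bitstream contains invalid data, the BP interrupts the ARM7 and checks this flag. If the flag is set to "1", the BP automatically searches for the next start code. If that start code is a slice or GOB start code, the BP resumes processing. If the flag is set to "0", the BP does not search for the next start code and enters the idle state. The handshaking between the BP and the ARM7 is described in Section 10.13.
＊number_of_macroblocks_to_be_encoded (NO_MBS[15:0]) - This register contains 16 bits giving the number of macroblocks to be encoded in the slice or GOB. With this 16-bit resolution, up to 65535 macroblocks can be encoded in a slice or GOB. The value "0" is not permitted as a macroblock count.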
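10.8.3 BP_STATUS Register Format
BP_STATUS[31:0] (address 27'h 7C0_0050) is shown in Table 54.
【０１５８】
BP_STATUS register format
【Table 49】

【０１５９】
＊input_buffer_0_done (IBUF0_DONE) - This flag indicates that all data in input buffer 0 has been consumed by the BP. It is set by the BP and cleared by the ARM7 or the VP. This flag represents an interrupt condition.
＊input_buffer_1_done (IBUF1_DONE) - This flag indicates that all data in input buffer 1 has been consumed by the BP. It is set by the BP and cleared by the ARM7 or the VP. This flag represents an interrupt condition.
＊output_buffer_0_full (OBUF0_FULL) - This flag indicates that output buffer 0 has been filled by the BP. It is set by the BP and cleared by the ARM7 or the VP. This flag represents an interrupt condition.
＊output_buffer_1_full (OBUF1_FULL) - This flag indicates that output buffer 1 has been filled by the BP. It is set by the BP and cleared by the ARM7 or the VP. This flag represents an interrupt condition.
＊BP_processing_done (BP_DONE) - This flag indicates that the BP has finished encoding a slice or GOB or, while decoding, has detected a start code that is not a slice or GOB start code. It is set by the BP and cleared by the ARM7 or the VP. This flag represents an interrupt condition.
＊context_switching_done (CTX_SW_DONE) - This flag indicates that, in context switching mode, the BP is ready to be switched to another task. It is set by the BP and cleared by the ARM7 or the VP. This flag represents an interrupt condition.
＊context_reload_done (CTX_RELOAD_DONE) - This flag indicates that the BP has finished reloading the context saved at address SAVE_ADR[31:0]. It is set by the BP and cleared by the ARM7 or the VP. This flag represents an interrupt condition.
【０１６０】
＊BP_error_flg (BP_ERR) - This flag indicates that an error occurred in the BP while it was processing data. It is set when BP_ERR_CODE[7:0] (BP_STATUS[31:24]) is non-zero. Details are given in subsection 10.9.2.
＊input_buffer_0_full (IBUF0_FULL) - This flag indicates that input buffer 0 has been filled by the ARM7 or the VP. It is set by the ARM7 or the VP and cleared by the BP.
＊input_buffer_1_full (IBUF1_FULL) - This flag indicates that input buffer 1 has been filled by the ARM7 or the VP. It is set by the ARM7 or the VP and cleared by the BP.
＊output_buffer_0_done (OBUF0_DONE) - This flag indicates that all data in output buffer 0 has been consumed by the ARM7 or the VP. It is set by the ARM7 or the VP and cleared by the BP.
＊output_buffer_1_done (OBUF1_DONE) - This flag indicates that all data in output buffer 1 has been consumed by the ARM7 or the VP. It is set by the ARM7 or the VP and cleared by the BP.
＊valid_bit_position (VALID_BIT_POS[2:0]) - This 3-bit field indicates the valid bit position within the data byte stored at VALID_BYTE_ADR[31:0], for use by the next processing stage. In video encoding, the BP sets the value and the ARM7 must continue processing from this bit position. In video decoding, the ARM7 sets the value and the BP must start processing from this bit position.
＊BP_error_code (BP_ERR_CODE[7:0]) - This 8-bit field indicates which error occurred in the BP. A value of zero indicates that no error occurred. Details are given in subsection 10.9.2.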
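【０１６１】
10.8.4 Input Data Format for Decoding and Output Data Format for Encoding
In this case, the data consists essentially of the compressed bitstream. It must contain start codes, header parameters, and data compressed according to the applicable standard. The bitstream is packed in bytes, although in some applications byte alignment is not required. The bitstream contains data for multiple slices or GOBs.
10.8.5 Input Data Format for Encoding and Output Data Format for Decoding
In this case, the data consists essentially of macroblock header information, motion data, and pixel coefficient data. The formats for these kinds of data are defined as follows.
10.8.5.1 Macroblock Header Word
The macroblock header always consists of 6 bytes and has the data format given in Table 33.
【０１６２】
Macroblock header word format
【Table 50】

【０１６３】
The parameters shown in the table are defined below.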
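＊vertical_macroblock_address (VMA) or group_number (GRNO) - This byte gives the vertical macroblock position, with a value from 1 to 255. The first vertical position is numbered 1, not 0. As an exception, in H.261 encoding this field carries the group_number information, which gives the position of the group of blocks.
＊horizontal_macroblock_address (HMA) or macroblock_position (MBPS) - This field gives the horizontal macroblock position, with a value from 1 to 255. The first horizontal position is numbered 1, not 0. As an exception, in H.261 encoding this field gives one of the 33 possible positions of the macroblock within the GOB.
＊macroblock_intra (I) - Set to 1 if the current macroblock is intra coded, and to 0 otherwise.
＊macroblock_pattern (P) - Set to 1 if the current macroblock contains coded blocks, and to 0 otherwise.
＊macroblock_quant (Q) - Set to 1 if the current macroblock has a new quantizer scale parameter, and to 0 otherwise.
＊macroblock_motion_forward (MF) - Set to 1 if the current macroblock is forward predicted, and to 0 otherwise.
＊macroblock_motion_backward (MB) - Set to 1 if the current macroblock is backward predicted or, in H.263, contains B-blocks, and to 0 otherwise.
＊dct_type (DT), loop_filter (LF), or advanced_prediction (M4) - Bit [5] of byte 2 has a different meaning for each standard. In MPEG-1 it is not used. In MPEG-2 it means dct_type: the flag is set to 1 if the macroblock is field DCT coded, and must be set to 0 if it is frame DCT coded. In H.261 the flag is set to 1 if the loop filter is used for the current macroblock, and to 0 otherwise. In H.263 it is set to 1 if the current macroblock uses advanced prediction mode, and to 0 otherwise.
＊motion_type (MT) - This 2-bit field gives the frame_motion_type or field_motion_type used in MPEG-2, with the meanings shown in Table 56 and Table 57.
【０１６４】
【Table 51】
Meaning of frame_motion_type

【０１６５】
【Table 52】
Meaning of field_motion_type

【０１６６】
＊quantizer_scale (Q_SCALE) - An unsigned integer in the range 1 to 31 used to scale the reconstruction level of the DCT coefficient levels. Every macroblock header must contain a suitable value for this parameter, even if the value is the same as that of the previous macroblock (that is, macroblock_quant is zero). In encoding, the user is responsible for writing a suitable value to this field. In decoding, the BP must write the Huffman-decoded quantizer_scale value to this field. If the current macroblock contains no Huffman code for this field, the BP must write the scale value of the previous macroblock.
＊coded_block_pattern_0 (CBP_0) - This 6-bit code indicates which blocks are coded in the current macroblock.
Here,
CBP_0[5] ==> luminance (Y) block 0
CBP_0[4] ==> luminance (Y) block 1
CBP_0[3] ==> luminance (Y) block 2
CBP_0[2] ==> luminance (Y) block 3
CBP_0[1] ==> chrominance blue (Cb) block
CBP_0[0] ==> chrominance red (Cr) block
＊coded_block_pattern_1 (CBP_1) - An additional coded_block_pattern for the B-blocks of a PB frame in H.263. Here,
CBP_1[5] ==> luminance (Y) block 0
CBP_1[4] ==> luminance (Y) block 1
CBP_1[3] ==> luminance (Y) block 2
CBP_1[2] ==> luminance (Y) block 3
CBP_1[1] ==> chrominance blue (Cb) block
CBP_1[0] ==> chrominance red (Cr) block
＊logical_channel_indicator (LCI) - 2-bit information for the GOB logical channel, used only for continuous presence multipoint in H.263.
＊frame_id (FID) - 2-bit GOB frame ID information for H.263.
＊macroblock_address_increment (MBA_INC) - 2-byte information giving the value by which the current macroblock address increases. This information is always provided by the BP as additional information, and the user does not need to set it in the input format. Any value specified in the input macroblock header word is ignored by the BP.
＊previous_dc_luminance (PRE_DC_Y) - 2-byte information giving the dc value of the luminance block in the previous macroblock. If the macroblock is skipped, the reset value is transmitted. This information is always provided by the BP as additional information, and the user does not need to set it in the input format. Any value specified in the input macroblock header word is ignored by the BP.
＊previous_dc_chrominance_blue (PRE_DC_Cb) - 2-byte information giving the dc value of the blue chrominance block in the previous macroblock. If the macroblock is skipped, the reset value is transmitted. This information is always provided by the BP as additional information, and the user does not need to set it in the input format. Any value specified in the input macroblock header word is ignored by the BP.
＊previous_dc_chrominance_red (PRE_DC_Cr) - 2-byte information giving the dc value of the red chrominance block in the previous macroblock. If the macroblock is skipped, the reset value is transmitted. This information is always provided by the BP as additional information, and the user does not need to set it in the input format. Any value specified in the input macroblock header word is ignored by the BP.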
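The bit-to-block mapping given for CBP_0 and CBP_1 in the list above can be expressed as a small decoding helper. The block labels and the function name `coded_blocks` are chosen here for illustration only.

```python
# Bit 5 down to bit 0, in the order listed for CBP_0 and CBP_1.
BLOCK_NAMES = ["Y0", "Y1", "Y2", "Y3", "Cb", "Cr"]


def coded_blocks(cbp):
    """Return the names of the blocks flagged as coded in a 6-bit pattern."""
    if not 0 <= cbp < 64:
        raise ValueError("coded_block_pattern is a 6-bit code")
    return [
        name
        for position, name in enumerate(BLOCK_NAMES)
        if cbp & (1 << (5 - position))
    ]


print(coded_blocks(0b100001))   # ['Y0', 'Cr']
print(coded_blocks(0b001110))   # ['Y2', 'Y3', 'Cb']
```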
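【０１６７】
10.8.5.2 Motion Data Word
Each macroblock header must be followed by additional header words if the macroblock contains motion vectors. Consider first the MPEG-1 and MPEG-2 cases. For these standards, the additional header word format shown in Table 59 is used for the motion vectors when either of the following conditions holds.
Condition 1) MF = 1, or (I = 1 and CMV = 1)
Condition 2) MB = 1
【０１６８】
General motion vector data format for MPEG-1 and MPEG-2
【Table 53】

【０１６９】
In Table 53, all component values are in half-pixel precision. FS0, FS1, FS2, and FS3 are 1-bit flags that identify the field selection for each motion vector. If no field is selected, the flag must be set to 0. Because MPEG-1 does not use field selection information, these flags are set to 0 for MPEG-1.
One exception occurs in MPEG-2 encoding of dual-prime motion vectors. In this case, the forward motion vector consists of 16 bytes (of which 8 bytes are actually used), and the format is as shown in Table 37. Normally, in video encoding applications the BP converts motion vector values into differential values. The motion vector components in Table 37, however, are already differential values that are fed directly to the Huffman encoder. In MPEG-2 decoding applications, dual-prime motion vectors are handled by the BP.
【０１７０】
【Table 54】
Motion vector data format in dual-prime mode for MPEG-2 encoding

【０１７１】
H.261 and H.263 have a somewhat different motion vector data format. In most cases, one byte is enough to represent a motion vector component value. Depending on the MF and M4 flags, a motion-compensated macroblock has at least 2 and at most 10 motion vector components. The data format of the motion vector data is shown in Table 60.
【０１７２】
Motion vector data format for H.261 and H.263
【Table 55】

【０１７３】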
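10.8.5.3 Pixel Coefficient Data Word
The four video compression standards have different maximum pixel bit lengths for the quantized levels. The comparison is shown in the table below.
【０１７４】
【Table 56】
Input/output pixel bit resolution

【０１７５】
Accordingly, the pixel data formats for the MPEG standards and the video conferencing standards differ as shown in Table 56.
【０１７６】
Pixel coefficient data format
【Table 57】

【０１７７】
10.9 Interrupt Conditions
The BP asserts the ARM7_IRQ flag and interrupts the ARM7 when an interrupt condition described in this section is met. The BP has two sets of interrupt conditions: default conditions and error conditions. These conditions are stored in BP_STATUS[15:0]. If any one of these bits is set by the BP, the ARM7_IRQ signal is activated. All of these conditions can be masked through the corresponding bits of the BP_INT_MASK[15:0] register.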
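The relationship between the raw conditions, the mask, and ARM7_IRQ can be sketched as follows. The AND-with-mask behavior and the convention that a mask bit of 0 disables a condition follow the BP_INT_MASK register description in subsection 10.7.2; the function name `evaluate_irq` and the idea of a separate "raw conditions" word are illustrative assumptions.

```python
def evaluate_irq(raw_conditions, int_mask=0xFFFF):
    """Return (BP_STATUS[15:0], ARM7_IRQ) for a set of raw interrupt conditions.

    A mask bit of 0 forces the corresponding condition to 0 (disabled);
    ARM7_IRQ is 1 if any surviving condition bit is set.
    """
    status_low = raw_conditions & int_mask & 0xFFFF
    irq = 1 if status_low else 0
    return status_low, irq


# Default condition 1 (bit 1) raised, with the default mask of 16'hFFFF.
print(evaluate_irq(0b10))             # (2, 1)
# Same condition with mask bit 1 cleared: no interrupt.
print(evaluate_irq(0b10, 0xFFFD))     # (0, 0)
```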
【０１７８】
10.9.1 Default Interrupt Conditions
＊Default condition 0 (BP_STATUS[0]) - When it finishes processing input buffer 0, the BP must set the IBUF0_DONE flag and assert ARM7_IRQ.
＊Default condition 1 (BP_STATUS[1]) - When it finishes processing input buffer 1, the BP must set the IBUF1_DONE flag and assert ARM7_IRQ.
＊Default condition 2 (BP_STATUS[2]) - When it has filled output buffer 0, the BP must set the OBUF0_FULL flag and assert ARM7_IRQ.
＊Default condition 3 (BP_STATUS[3]) - When it has filled output buffer 1, the BP must set the OBUF1_FULL flag and assert ARM7_IRQ.
＊Default condition 4 (BP_STATUS[4]) - In video encoding, when it finishes the slice or GOB specified by the ARM7, or in video decoding, when it reaches a start code that is not a slice or GOB start code, the BP must set the BP_DONE flag and assert ARM7_IRQ.
＊Default condition 5 (BP_STATUS[5]) - When it finishes the context save operation in preemptive context switching mode, or finishes the current slice or GOB in cooperative context switching mode, the BP must set the CTX_SW_DONE flag and assert ARM7_IRQ.
＊Default condition 6 (BP_STATUS[6]) - When the context reload is finished, the BP must set the CTX_RELOAD_DONE flag and assert ARM7_IRQ.
＊Default condition 7 (BP_STATUS[7]) - BP_STATUS[7] is currently reserved, so this bit should be set to "0".
Normally, masking these default interrupt conditions with BP_INT_MASK[7:0] is not recommended. In some applications, however, the user may wish to mask Default condition 1.
【０１７９】
10.9.2 Error Interrupt Conditions
If an error occurs in the BP, the BP sets the BP_ERR flag so that an ARM7 interrupt is requested. At the same time, the BP sets the appropriate non-zero value in the BP_ERR_CODE field of the BP_STATUS register. The 8-bit BP_ERR_CODE has the following meanings.
＊BP_ERR_CODE = 8'b0000_0000: no error occurred
＊BP_ERR_CODE = 8'b0000_0001: invalid setting in the BP_MODE register
＊BP_ERR_CODE = 8'b0000_0010: invalid horizontal macroblock position
＊BP_ERR_CODE = 8'b0000_0011: invalid vertical macroblock position
＊BP_ERR_CODE = 8'b0000_0100: invalid VLC for macroblock address increment
＊BP_ERR_CODE = 8'b0000_0101: invalid VLC for macroblock type
＊BP_ERR_CODE = 8'b0000_0110: invalid VLC for macroblock motion code
＊BP_ERR_CODE = 8'b0000_0111: invalid concealment motion vector marker bit
＊BP_ERR_CODE = 8'b0000_1000: invalid VLC for coded block pattern
＊BP_ERR_CODE = 8'b0000_1001: invalid VLC for block DCT dc size
＊BP_ERR_CODE = 8'b0000_1010: invalid DCT dc value
＊BP_ERR_CODE = 8'b0000_1011: invalid VLC for block DCT ac coefficient
＊BP_ERR_CODE = 8'b0000_1100: the # of blocks in one macroblock exceeds 64
＊BP_ERR_CODE = 8'b0000_1101: invalid f_code value (for example, the value is 0)
＊BP_ERR_CODE = 8'b0000_1110: invalid VLC for block DCT ac coefficient
＊BP_ERR_CODE = 8'b0000_1111: invalid IBUF and OBUF address settings
＊BP_ERR_CODE = 8'b0001_0000: the lower 4 bits of a BP input/output buffer start address are not zero
All other BP_ERR_CODE values are reserved.
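An interrupt handler could extract and interpret the error code along the following lines. The field position BP_STATUS[31:24] and the code meanings come from the text; the dictionary wording, the function name `decode_error`, and the treatment of unlisted codes as "reserved" strings are illustrative choices. Only a few codes are listed to keep the sketch short.

```python
# A subset of the BP_ERR_CODE values listed above.
BP_ERR_MESSAGES = {
    0x00: "no error",
    0x01: "invalid BP_MODE register setting",
    0x02: "invalid horizontal macroblock position",
    0x03: "invalid vertical macroblock position",
    0x0A: "invalid DCT dc value",
    0x0F: "invalid IBUF and OBUF address settings",
    0x10: "buffer start address low 4 bits not zero",
}


def decode_error(bp_status):
    """Return (code, message) from the BP_ERR_CODE field, BP_STATUS[31:24]."""
    code = (bp_status >> 24) & 0xFF
    return code, BP_ERR_MESSAGES.get(code, "reserved or not listed in this sketch")


print(decode_error(0x0A00_0000))   # (10, 'invalid DCT dc value')
print(decode_error(0x0000_0010))   # (0, 'no error')
```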
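【０１８０】
10.10 Detailed Functional Requirements
10.10.1 IOBUS Interface
All data movement between the BP and the CCU is performed over the IOBUS. The IOBUS is a 32-bit @ 40 MHz synchronous bus with multiplexed address and data. Because at least 7 cycles are required to transfer 16 bytes of data over the IOBUS, its maximum transfer rate is 91.4 Mbytes/sec (731.4 Mbits/sec).
The BP can be a master or a slave for all IOBUS read and write transfers. When the BP operates as a master, it must send a request signal to the IOBUS arbiter. If the IOBUS is idle, the arbiter grants the bus to the BP and sends a device select signal. Data carried over the IOBUS falls into one of three categories: 32-bit pixel data containing 2 or 4 pixel elements, 32-bit compressed bitstream words, and syntax/control parameters for encoding and decoding operations. For further information on the IOBUS interface, such as timing diagrams, the user is advised to consult the MSP IOBUS specification.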
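The quoted maximum transfer rate follows from the 40 MHz clock and the 7-cycle minimum for a 16-byte transfer. The short calculation below reproduces it, taking "M" as 10^6.

```python
CLOCK_HZ = 40_000_000      # IOBUS clock
BYTES_PER_BURST = 16       # data moved per transfer
CYCLES_PER_BURST = 7       # minimum cycles for that transfer

bytes_per_second = CLOCK_HZ * BYTES_PER_BURST / CYCLES_PER_BURST
mbytes_per_second = bytes_per_second / 1e6
mbits_per_second = mbytes_per_second * 8

print(round(mbytes_per_second, 1))   # 91.4
print(round(mbits_per_second, 1))    # 731.4
```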
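10.10.2 Block Layer Processing
10.10.2.1 Zigzag Scan Conversion
The BP supports the two zigzag scan conversion matrices specified in the MPEG video standards. The 8×8 block data transferred between the VP and the BP always contains all 64 elements.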
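For illustration, the conventional zigzag order can be generated by walking the anti-diagonals of the block in alternating directions, and the forward and inverse conversions are then simple permutations. This sketch covers only the conventional zigzag pattern; the alternate scan used by MPEG-2 is table-defined and is not reproduced here. The function names are hypothetical and say nothing about how the BP hardware implements the conversion.

```python
def zigzag_order(n=8):
    """Return raster indices of an n x n block in conventional zigzag order."""
    cells = [(row, col) for row in range(n) for col in range(n)]
    # Sort by anti-diagonal; odd diagonals run top-to-bottom, even ones bottom-to-top.
    cells.sort(key=lambda rc: (rc[0] + rc[1],
                               rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    return [row * n + col for row, col in cells]


def forward_zigzag(block):
    """Reorder a raster-ordered 64-element block into zigzag order."""
    return [block[index] for index in zigzag_order()]


def inverse_zigzag(scanned):
    """Restore raster order from a zigzag-ordered 64-element list."""
    block = [0] * 64
    for position, index in enumerate(zigzag_order()):
        block[index] = scanned[position]
    return block


order = zigzag_order()
print(order[:10])   # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```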
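10.10.2.2 RLC Code
For RLC decoding, the BP generates zeros and level data from the Huffman-decoded DCT ac coefficients. If an end_of_block signal is detected before all 64 elements of an 8×8 block have been generated, the RLC decoder generates the remaining data as zeros. For RLC encoding, the BP counts consecutive zeros and combines the count with the next non-zero value to generate a run-length and level code. If all the remaining data is zero, the BP generates end_of_block rather than an RLC for the remaining data. The number of operating cycles for an RLC code grows with the number of zeros generated.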
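The run-length and level scheme described above can be sketched as follows. This is a simplified illustration: it treats all 64 elements uniformly (ignoring the separate handling of the dc coefficient), represents end_of_block with a marker string, and always terminates a block with that marker. The names `rlc_encode`, `rlc_decode`, and `EOB` are hypothetical.

```python
EOB = "EOB"   # stand-in for the end_of_block code


def rlc_encode(coefficients):
    """Encode a zigzag-ordered block as (run, level) pairs followed by EOB."""
    pairs, run = [], 0
    for level in coefficients:
        if level == 0:
            run += 1                     # count consecutive zeros
        else:
            pairs.append((run, level))   # zeros preceding this non-zero level
            run = 0
    pairs.append(EOB)                    # trailing zeros are implied by EOB
    return pairs


def rlc_decode(pairs, size=64):
    """Expand (run, level) pairs; zero-fill after EOB up to the block size."""
    block = []
    for item in pairs:
        if item == EOB:
            break
        run, level = item
        block.extend([0] * run)
        block.append(level)
    block.extend([0] * (size - len(block)))
    return block


coefficients = [5, 0, 0, -3, 1] + [0] * 59
encoded = rlc_encode(coefficients)
print(encoded)   # [(0, 5), (2, -3), (0, 1), 'EOB']
```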
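【０１８１】
10.10.2.3 Huffman Code
The BP Huffman coder supports all the Huffman tables recommended in the MPEG-1, MPEG-2, H.261, and H.263 video standards. With every ROM word 12 bits wide, all the tables can be implemented as lookup tables. However, some Huffman tables that are either very simple or very complex may be implemented with hardwired logic. The decoder tables implemented with the lookup table ROM are summarized in the table.
【０１８２】
【Table 58】
ROM size required for the Huffman decoder lookup tables

【０１８３】
The encoder tables, which require a larger ROM size than the decoder tables, are summarized in the table.
【０１８４】
【Table 59】
ROM size required for the Huffman encoder lookup tables

【０１８５】
From the tables, the total ROM size required for the Huffman encoder and decoder is 768×12 bits. The tables do not include the stuffing codes, escape_code, the sign bits of the DCT coefficients, or the end_of_block code, which are handled by the state machine. The operating cycles for each Huffman code are given in the table.
【０１８６】
【Table 60】
Processing cycles for Huffman codes

【０１８７】
Finally, note that the JPEG decode tables cannot be implemented with the approach described above. The dc_coeff_next_0 table, however, can be used for JPEG encoding applications.
10.10.2.4 Differential dc Value
For intra blocks, the BP also computes the differential dc coefficient of the first element of the 8×8 block data, and reconstructs the dc value from a transmitted differential dc coefficient.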
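Differential dc coding can be illustrated as a running prediction from the previous block's dc value. The sketch below is generic: the predictor reset value is passed in as a parameter because the text does not state it, and the function names are hypothetical.

```python
def dc_to_differential(dc_values, reset_value):
    """Encoder side: replace each dc value by its difference from the predictor."""
    differentials, predictor = [], reset_value
    for dc in dc_values:
        differentials.append(dc - predictor)
        predictor = dc                      # next block predicts from this one
    return differentials


def differential_to_dc(differentials, reset_value):
    """Decoder side: rebuild dc values by accumulating the differences."""
    dc_values, predictor = [], reset_value
    for diff in differentials:
        predictor += diff
        dc_values.append(predictor)
    return dc_values


# Illustrative values only; 128 is an assumed reset value.
dc_values = [120, 118, 125]
diffs = dc_to_differential(dc_values, reset_value=128)
print(diffs)   # [-8, -2, 7]
```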
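10.10.2.5 Non-Coded Blocks
The BP does not process non-coded blocks; the VP and the ARM7 handle them. So that the VP and the ARM7 can process such blocks, the BP indicates the non-coded blocks in the coded_block_pattern carried in the macroblock header word.
10.10.2.6 Block Transfer Order
Within one macroblock transferred for encoding or decoding, the order of the blocks is as follows: luminance (Y) blocks 0, 1, 2, and 3, then the chrominance blue (Cb) block and the chrominance red (Cr) block.
10.10.3 Macroblock Layer Processing
10.10.3.1 Differential Motion Vectors
The BP computes differential motion vectors from the motion estimation results and reconstructs motion vectors from transmitted differential motion vectors, except in the following cases.
＊The first case is dual-prime mode in MPEG-2 video encoding. In this case, the motion vector sent to the BP is vector'[0][0][1:0], not vector'[r][0][1:0] (see section 7.6.3.6 of the MPEG-2 video standard).
＊The second case is the advanced prediction mode of H.263. In this case, there are four motion vectors, and their values must be transferred to and from the BP as differential values.
10.10.3.2 Skipped Macroblocks
The BP does not process skipped macroblocks; the VP and the ARM7 handle them. So that the VP and the ARM7 can process skipped macroblocks, the BP writes the horizontal and vertical macroblock addresses into the macroblock header word.
10.10.3.3 Macroblock Stuffing Codes
In MPEG-1, if a macroblock stuffing code occurs, the BP must discard it in one cycle. In MPEG-1 encoding, however, the BP does not let the user include macroblock stuffing codes in the macroblock layer header. In general, such stuffing codes are used to control the output video rate buffer. Instead of inserting macroblock stuffing codes, it is therefore recommended to insert zero stuffing bits between start codes.
For MPEG-1 and MPEG-2 applications, the bitstream output must be byte-aligned down to the slice layer. For H.263 applications, although the bitstream output is byte-aligned at the picture layer, byte alignment is applied down to the GOB layer. The output of an H.261 encoder, however, is not byte-aligned. The bitstream forming routine in the ARM7 must therefore be programmed with these differences in mind. In encoding, if the amount of data for the last transfer over the IOBUS is 16 bits or less, the BP automatically zero-fills the end of the slice.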
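【０１８８】
10.10.4.2 Extra Slice Information
In decoding, the BP discards any extra slice information contained in the slice header of an MPEG-1 or MPEG-2 bitstream. In encoding, the BP does not insert any extra slice information requested by the user. If the user still wants to include this information in an MPEG-1 or MPEG-2 bitstream, it can be inserted into the bitstream already encoded by the BP.
10.10.4.3 Intra Slice
In the MPEG-2 slice layer bitstream, the intra_slice parameter indicates that the current slice consists only of intra macroblocks. This information is not used in the decoding process; it is intended to help DSM applications when performing fast forward or fast reverse. The BP therefore discards this information in decoding applications, and in encoding applications inserts 0 for intra_slice in the slice layer header.
10.10.4.4 Slice or GOB Start Code
In MPEG-1, MPEG-2, and H.261, every picture has at least one slice or GOB start code. An H.263 picture, however, may have no GOB start code or header information. In particular, the first GOB in any H.263 picture has no start code or header information. Therefore, when the incoming bitstream is an H.263 bitstream, the BP state machine must proceed to the macroblock layer immediately. In addition, if a GOB start code is found while the bitstream is being decoded, the BP decodes the start code and continues processing without interrupting the ARM7.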
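【０１８９】
10.11 Input/Output Double Buffer Interface
10.11.1 General Description
The input and output buffers are implemented as double buffers. As shown in FIG. 64 and FIG. 65, four memory buffers are therefore used: IBUF0, IBUF1, OBUF0, and OBUF1.
As shown in FIG. 64 and FIG. 65, each buffer has a start address and an end address, and a full flag and a done flag. To determine the size of each buffer, the user must write suitable values to its start and end addresses.
Once the source processor for a buffer finishes writing to it, it sets the full flag and starts writing to the other bank. The sink processor for a bank reads the data once it finds that the bank it is accessing is full. When the bank is empty, the sink sets the done flag and checks the full flag of the other bank.
The four start addresses are updated by the BP, as described in subsection 10.7.2. Each start address register stores the last byte address accessed by the BP every time the BP accesses an input or output buffer. The ARM7 must therefore set the corresponding start address whenever any of the IBUF0_DONE, IBUF1_DONE, OBUF0_FULL, or OBUF1_FULL flags is set.
In addition, the lower four bits of a start address must always be set to zero by the ARM7, because of the internal data alignment structure among the FBUS, the CCU, and the IOBUS. Each end address must be set so that the total number of bytes in the buffer is a multiple of 16. Furthermore, a minimum buffer size of 64 bytes is recommended for MPEG-1 and MPEG-2, and 128 bytes for H.261 and H.263, to prevent performance degradation caused by frequent BP interrupts to the ARM7.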
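The buffer constraints above lend themselves to a simple configuration check. In this sketch the end address is assumed to be inclusive (the last byte of the buffer), which is one reading of "last address"; the function name `check_buffer` and the problem strings are hypothetical.

```python
# Recommended minimum buffer sizes in bytes, from the text.
MIN_BUFFER_BYTES = {"MPEG-1": 64, "MPEG-2": 64, "H.261": 128, "H.263": 128}


def check_buffer(start, end, standard):
    """Return a list of rule violations for one BP buffer (empty if none).

    Assumption: `end` is the inclusive last byte address of the buffer.
    """
    problems = []
    if start & 0xF:
        problems.append("start address low 4 bits not zero")
    if start >= end:
        problems.append("start address not less than end address")
    size = end - start + 1
    if size % 16:
        problems.append("size not a multiple of 16 bytes")
    if size < MIN_BUFFER_BYTES[standard]:
        problems.append("below recommended minimum size")
    return problems


print(check_buffer(0x1000, 0x103F, "MPEG-1"))   # []
print(check_buffer(0x1004, 0x1043, "H.263"))
```

The second call reports a misaligned start address and a buffer smaller than the recommended 128 bytes.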
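【０１９０】
10.11.2 Handling Abnormal Buffer Status
When both output buffers are full, the BP halts processing and enters the idle state, regardless of the status of the input double buffer. When the OBUF0_DONE or OBUF1_DONE flag is set, the BP automatically leaves this idle state.
When both input buffers are empty, the BP does not need to halt and continues until the data remaining internally has been processed. The BP does, however, interrupt the ARM7 as soon as both input buffers are empty. If the input buffers are still empty after the remaining data has been processed, the BP enters the idle state. When the IBUF0_FULL or IBUF1_FULL flag is set, the BP again leaves this state automatically.
The idle state described in this subsection differs from the other idle states described in this specification, because leaving the other idle states normally requires an ARM7 control command.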
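【０１９１】
10.11.3 Physical Implementation of the I/O Buffers: Example
In general, the location and size of the BP input and output buffers are left to the user. The user may implement the buffers in the VP data cache, the ARM7 data cache, or the scratchpad area of SDRAM. Although the implementation of the BP input and output double buffers is somewhat constrained, there are efficient ways to implement them.
Here is a specific example of implementing a rate buffer in a video decoding application. In this case, the user wants to implement the BP input buffer as a circular buffer. Assume that the complete rate buffer resides in SDRAM and is divided into four blocks, as shown in FIG. 66.
Initially, the user can assign Rate_Buffer_Block_0 and Rate_Buffer_Block_1 to IBUF0 and IBUF1, respectively. This is done with the following settings.
IBUF0_START = Rate_Buffer_Address_0;
IBUF0_END = Rate_Buffer_Address_1;
IBUF1_START = Rate_Buffer_Address_2;
IBUF1_END = Rate_Buffer_Address_3.
When all the data in IBUF0 (that is, the data in Rate_Buffer_Block_0) has been consumed by the BP, the BP interrupts the ARM7. The ARM7 then assigns Rate_Buffer_Block_2 to IBUF0 with the following settings.
IBUF0_START = Rate_Buffer_Address_4;
IBUF0_END = Rate_Buffer_Address_5.
When all the data in IBUF1 has been consumed by the BP, the BP interrupts the ARM7. The ARM7 then assigns Rate_Buffer_Block_3 to IBUF1 with the following settings.
IBUF1_START = Rate_Buffer_Address_6;
IBUF1_END = Rate_Buffer_Address_7.
When all the data in Rate_Buffer_Block_2 has been consumed by the BP, the ARM7 assigns Rate_Buffer_Block_0 to IBUF0 again by setting the addresses as in the first step.
A circular buffer can therefore be implemented simply by repeating this complete sequence. This example shows that the BP double buffers can be used very flexibly, according to the user's intent.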
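The rotation in this example can be summarized as: each time an input buffer is drained, it moves two blocks ahead in the four-block rate buffer, wrapping around. The sketch below models only that bookkeeping; the helper names are hypothetical, and the symbolic address names mirror those used in the example.

```python
NUM_BLOCKS = 4   # the rate buffer is divided into four blocks in the example


def block_addresses(block):
    """Return the symbolic (START, END) address names for a rate buffer block."""
    return (f"Rate_Buffer_Address_{2 * block}",
            f"Rate_Buffer_Address_{2 * block + 1}")


def next_block(current_block):
    """Block to assign to an input buffer once its current block is drained."""
    return (current_block + 2) % NUM_BLOCKS


# IBUF0 starts on block 0 and IBUF1 on block 1, as in the example.
ibuf0_sequence = [0]
ibuf1_sequence = [1]
for _ in range(3):
    ibuf0_sequence.append(next_block(ibuf0_sequence[-1]))
    ibuf1_sequence.append(next_block(ibuf1_sequence[-1]))

print(ibuf0_sequence)        # [0, 2, 0, 2]
print(ibuf1_sequence)        # [1, 3, 1, 3]
print(block_addresses(2))    # ('Rate_Buffer_Address_4', 'Rate_Buffer_Address_5')
```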
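【０１９２】
10.12 Context Switching
When more than one application is running on the MSP, the ARM7 operating system commands the BP to terminate its current task and switch to another. This process is commonly called "context switching". The BP supports the following two context switching modes.
10.12.1 Preemptive Context Switching
In preemptive context switching, the BP ends normal processing after it finishes the 8×8 pixel block currently being processed. The ARM7 commands preemptive context switching by setting the CTX_SWITCH and CTX_MODE flags in BP_CONTROL[6:5] to "11". When processing of the current block is complete, the BP saves its internal context to the external SDRAM for later processing.
When the BP has finished saving the context, it interrupts the ARM7 by setting the CTX_SW_DONE flag located at BP_STATUS[5]. The ARM7 then saves the entire contents of the BP input and output buffers and initializes the BP for the other task.
This mode lets the BP respond to an ARM7 context switching request as quickly as possible. In the worst case, the BP needs about 150 cycles (= 3.75 μsec) to finish the current block. In the normal case, it is reasonable to expect a few tens of cycles.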
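【０１９３】
10.12.2 Cooperative Context Switching
Cooperative context switching eliminates the context save step in the BP. This is possible because all BP internal state must be initialized at the start of every slice or GOB layer in any case. In this mode, the BP continues processing the current slice or GOB normally and then terminates.
The ARM7 commands cooperative context switching by setting the CTX_SWITCH and CTX_MODE flags in BP_CONTROL[6:5] to "10". When processing of the current slice or GOB is complete, the BP interrupts the ARM7 by setting the CTX_SW_DONE flag located at BP_STATUS[5]. The ARM7 then saves the entire contents of the BP input and output buffers and initializes the BP for the other task.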
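The two mode encodings ("11" for preemptive, "10" for cooperative) can be sketched as bit operations on BP_CONTROL. The assignment of CTX_SWITCH to bit 6 and CTX_MODE to bit 5 is an assumption inferred from the order in which the flags are named; the text only states that both flags occupy BP_CONTROL[6:5]. The sketch also reproduces the worst-case preemptive latency at the 40 MHz clock.

```python
CTX_SWITCH_BIT = 6   # assumption: first-named flag is the upper bit of [6:5]
CTX_MODE_BIT = 5     # assumption: second-named flag is the lower bit of [6:5]


def request_context_switch(bp_control, preemptive):
    """Return a BP_CONTROL value requesting a context switch in the given mode."""
    value = bp_control | (1 << CTX_SWITCH_BIT)
    if preemptive:
        value |= 1 << CTX_MODE_BIT
    else:
        value &= ~(1 << CTX_MODE_BIT)
    return value & 0xFFFF_FFFF


preemptive = request_context_switch(0, preemptive=True)
cooperative = request_context_switch(0, preemptive=False)
print(f"{(preemptive >> 5) & 0b11:02b}")    # 11
print(f"{(cooperative >> 5) & 0b11:02b}")   # 10

# Worst-case preemptive latency: about 150 cycles at 40 MHz.
worst_case_usec = 150 / 40   # cycles divided by cycles per microsecond
print(worst_case_usec)       # 3.75
```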
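【０１９４】
10.12.3 Context Reload
To switch back to the previous task, the BP reloads the context saved in SDRAM from address SAVE_ADR[31:0]. The BP must be idle before a context reload can be requested. The situations in which this request is possible are when BP_DONE is set, when CTX_SW_DONE is set, or when the ARM7 has reset the BP by software. When the ARM7 then sets the CTX_RELOAD flag in BP_CONTROL[7], the BP leaves the idle state and starts reading the saved context.
After completing the context reload operation, the BP sets the CTX_RELOAD_DONE flag and interrupts the ARM7. The ARM7 then initializes the BP internal registers and enables the BP to process the previous task.
10.13 Task Handshaking
This section covers the detailed procedure for the task handshake when the BP finishes processing. Here, "updating the pointer for the last data" means that the BP has written suitable values to VALID_BYTE_ADR[31:0] and VALID_BIT_POS[2:0], respectively.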
【０１９５】
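10.13.1 Encoding
In normal operation, the input data for encoding is supplied by the VP. When one of the input double buffers has been filled by the VP, the BP starts reading the data over the IOBUS. When processing ends (that is, when the number of macroblocks processed equals the number specified by the ARM7), the BP sets the BP_DONE flag, interrupts the ARM7, and goes idle.
The valid-data pointer marks the "end of the compressed bitstream" for the slice or GOB. VALID_BYTE_ADR[31:0] points to a location in one of the output double buffers.
The ARM7 combines this compressed bitstream with the upper-layer headers to form the final bitstream, and the process repeats. The ARM7 may restart the BP before the data in the output double buffers has been fully drained, provided that it drains at least one output buffer and leaves the last-data pointer untouched, because the BP updates the pointer when it is restarted.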
【０１９６】
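10.13.2 Decoding
First, the ARM7 searches for a slice or GOB start code (if present). When it finds one, the ARM7 initializes and enables the BP. The BP performs Huffman decoding, RLC decoding, and inverse zigzag scan conversion, and the data is then passed to the output buffer for VP processing. The BP continues this routine until it detects a start code that is not a slice or GOB start code. It then sets the last-data pointer to the "end of the non-slice or non-GOB start code" and interrupts the ARM7. Next, the ARM7 decodes the start code and parses headers until it finds the next slice or GOB start code.
10.13.3 Errors Found in the Compressed Bitstream
In videophone applications, where the data is carried over telephone lines and the public switched network, the incoming bitstream is very likely to contain some invalid data. In that case the BP must interrupt the ARM7 and check the ERR_HANDLE_MODE flag. It is safest for the user to choose the error-handling mode before the BP is enabled for a particular application.
When the ERR_HANDLE_MODE flag is set to "1", the BP automatically searches for the next start code. If that start code is for a slice or GOB, the BP continues normal processing. This mode is very efficient, because the BP can find a start code faster than the ARM7 can, and the ARM7 can run other routines while the BP searches. If, however, the BP finds a start code for a layer other than the slice or GOB layer, it sets the BP_DONE flag, interrupts the ARM7 again, and goes idle. In that case the last-data pointer must point to the end of that start code.
When the ERR_HANDLE_MODE flag is set to "0", the BP does not search for the next start code and goes idle. In that case the last-data pointer must point to the position where the error was found. This mode is useful when the user wants to debug a corrupted bitstream using ARM7 instructions.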
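The two error-handling modes reduce to a small decision, sketched below in C. The enum and function names are hypothetical and only restate the behaviour described in section 10.13.3; the hardware itself is a state machine, not software.

```c
/* Outcome of a bitstream error, as described in section 10.13.3. */
typedef enum {
    BP_CONTINUE,            /* resume normal slice/GOB processing            */
    BP_IDLE_AT_START_CODE,  /* idle; pointer at the end of the start code    */
    BP_IDLE_AT_ERROR        /* idle; pointer at the position of the error    */
} bp_error_action_t;

/* err_handle_mode: value of the ERR_HANDLE_MODE flag (0 or 1).
 * next_code_is_slice_or_gob: non-zero if the next start code found by the
 * BP belongs to the slice or GOB layer (only meaningful in mode 1). */
bp_error_action_t bp_on_bitstream_error(int err_handle_mode,
                                        int next_code_is_slice_or_gob)
{
    if (!err_handle_mode)
        return BP_IDLE_AT_ERROR;     /* mode 0: stop where the error is */
    return next_code_is_slice_or_gob ? BP_CONTINUE : BP_IDLE_AT_START_CODE;
}
```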
【０１９７】
【実施例２】
付録Ｂ
ＭＰＣビットストリーム処理器
ビットストリーム処理器（ＢＰ）は、ビデオデータエンコーディング及びデコーディング応用に重要なＭＳＰ処理コアの中の１つである。ＢＰはＭＰＥＧスライス層エンコーディング及びデコーディング、そしてＨ．２６１／Ｈ．２６３グループブロック（ＧＯＢ）層エンコーディング及びデコーディングを処理する。デコーディング応用において、ＢＰは各マクロブロックに含まれた全体情報を、ベクトル処理器及びＡＲＭ７コアに提供する。
ビットストリーム処理器ハードウェアは、４個の機能ブロックに分けられる。
＊Ｉ／Ｏ制御及びデコーディングユニットを含むＩＯＢＵＳポートインターフェース
＊ＢＰ制御ステートマシン
＊ＢＰレジスタマルチプレクサー、レジスタ、算術論理ユニット(ALU)及びマルチプレクサー、ＦＩＦＯ制御ユニットを含むコーデックコア
＊ＶＬＣＦＩＦＯユニット
＊コーデックアドレス発生器と共にルックアップを含むＶＬＣコーデック
ＶＬＣＬＵＴＲＯＭ（３４０、図３参照）についての説明は下記のとおりである。
１．０方法論
ルックアップユニットは、ハフマンエンコーディング及びデコーディングの核心である。このユニットは、ＭＰＥＧ−１、ＭＰＥＧ−２、Ｈ．２６１及びＨ．２６３仕様に含まれたＶＬＣテーブルを支援し、三星ＭＳＰにより支援される。このテーブルの大部分は、１２ビット幅を有するＲＯＭで具現される。しかし、ルックアップ処理があまり単純で、ＲＯＭテーブルのサイズに当たらない場合、特殊なエンコーディング及びデコーディングが適用される。このようなレイヤーの４つの仕様のすべては、多い可変長さコードを１７ビットまで含む。エンコーディング値またデコーディング以外に、コードサイズ及び有効コード指示者が、エンコーディング及びデコーディングのために提供され、正確に処理されるようにする。ＶＬＣテーブルをエンコーディングまたはデコーディングするために従来の方法を使用する場合、ＲＯＭテーブル及びアドレス発生器が更に大きくなる。
【０１９８】
１．１具現方法は下記のとおりである：
＊もしアドレス発生器の設計が難しくなければ、ＲＯＭテーブルをできるだけ多く共有する。
＊エンコーディングまたはデコーディングに基づいて、ＶＬＣテーブルを再配列する。
＊ハフマンコードに基づいて‘０’カウントと‘１’カウントをまずデコーディングする。
＊符号または偶数／奇数のような１ビットフラグを使用しテーブルサイズを減らす。
＊可能であれば、１つのＲＯＭ位置を‘上位’と‘下位’に分離する。
＊ＲＯＭテーブルアドレスを発生させるために、ＶＬＣの最下位ビット(LSB)を使用しアドレス発生器を簡素化する。
この方法は非常に効率的である。最終のＲＯＭテーブルサイズは、７６８＊１２ビットで、問題を伴うにはずっと小さい。ルックアップは、ＲＯＭテーブルアドレス発生器と、ＲＯＭテーブルルックアップ処理により遂行される。アドレス発生器はテーブル種類、モード及びＶＬＣ値のような入力信号をデコーディングし、ＲＯＭテーブルのアドレスを発生させる。以降、ＲＯＭテーブル値及び他の情報からエンコーディングまたはデコーディングデータが得られる。デコーディングテーブルは２つのフォーマットを有するが、一つはＶＬＣコード当りのＲＯＭ位置を有するＤＣＴ係数に適用され、もう一つはそれぞれのＲＯＭ位置が上位６ビットと下位６ビットに分割されている他のテーブルに適用される。従って、各位置は２つのＶＬＣコードを有する。エンコーディングテーブルは、２つのフォーマットを有するが、一つはＨ．２６３のＴＣＯＥＦに関するもので、もう一つは他のテーブルに対するものである。各ＲＯＭ位置はエンコーディング応用のために一つのハフマンコードを含む。ＲＯＭテーブルのサイズは７６８×１２ビットである。テーブルは下記のように示すことができる。
【０１９９】
ＶＬＣデコーディングＲＯＭテーブルマップ
【表６１】

【０２００】
ＶＬＣエンコーディングＲＯＭテーブルマップ
【表６２】

【０２０１】
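1.2 Decoding
All decoding tables are rearranged according to the '0' or '1' count. If the MSB of the VLC code is '0', the '0' count is applied; otherwise the '1' count is used. For example, code '00001xxx' has a '0' count of four, and code '1110xxx' has a '1' count of three. Decoding first determines the '0'/'1' count and passes it to the ROM table address generator. The address generator then decodes the rest of the code to generate an address. The address has two parts: an offset, and a so-called masked address obtained from the VLC table. The address is the logical OR of the two parts. The address generator also provides the following information.
* VLC code size
* Special flag: a 2-bit flag that indicates 'ESCAPE', 'END OF BLOCK', 'STUFFING', or 'START CODE' in H.261 to the decoding state machine.
* High data extract enable: the valid data is the upper 6 bits.
* Sign/even enable: indicates that decoding should extract the LSB of the VLC as a sign bit or an even bit, depending on the table.
* Valid VLC
* Mask shift bits and mask: both signals are used to generate the masked address.
In the ROM table, every location stores data in the upper/lower format, except for Tables 14 and 15 of MPEG-2 and Table 12 of H.263.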
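The '0'/'1' count that drives the decoding tables is the length of the leading run of identical bits in the VLC. A minimal software sketch, assuming the code is held right-aligned in an integer (the type and function names are hypothetical; the hardware does this with combinational logic):

```c
#include <stdint.h>

typedef struct {
    int      one_count;  /* 1 if the leading run is made of '1' bits */
    unsigned count;      /* length of the leading run                */
} vlc_run_t;

/* code holds `width` bits, MSB first, right-aligned in the integer. */
vlc_run_t vlc_leading_run(uint32_t code, unsigned width)
{
    vlc_run_t r = {0, 0};
    if (width == 0)
        return r;
    unsigned msb = (code >> (width - 1)) & 1u;   /* MSB selects '0' or '1' count */
    r.one_count = (int)msb;
    for (unsigned i = width; i > 0; --i) {
        if (((code >> (i - 1)) & 1u) != msb)
            break;                               /* run ends at first different bit */
        r.count++;
    }
    return r;
}
```

With the two codes from the text (taking the unspecified 'x' bits as 0), '00001000' gives a '0' count of 4 and '1110000' gives a '1' count of 3.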
【０２０２】
1.2.1 Table 14/MPEG-2
This table is identical to Table 2-B.5c/MPEG-1 and Table 5/H.261.
ROM table format: bits 10-6: run; bits 5-0: level
1.2.2 Table 15/MPEG-2
Most of this table is shared with Table 14/MPEG-2, because the two tables have the same runs, levels, and VLC codes.
ROM table format: bits 10-6: run; bits 5-0: level
1.2.3 Table 12/H.263
Compared with Tables 14 and 15 of MPEG-2, this table has one additional output value, 'LAST'.
ROM table format: bit 11: LAST; bits 10-4: run; bits 3-0: level
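The two DCT coefficient ROM formats given above can be unpacked as in the following sketch. The field positions are those stated in the text; the struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    unsigned last;   /* LAST flag (H.263 only, 0 otherwise) */
    unsigned run;
    unsigned level;
} dct_entry_t;

/* Tables 14 and 15 of MPEG-2: bits 10-6 = run, bits 5-0 = level. */
dct_entry_t unpack_mpeg_dct(uint16_t rom_word)
{
    dct_entry_t e;
    e.last  = 0;
    e.run   = (rom_word >> 6) & 0x1Fu;
    e.level = rom_word & 0x3Fu;
    return e;
}

/* Table 12 of H.263: bit 11 = LAST, bits 10-4 = run, bits 3-0 = level. */
dct_entry_t unpack_h263_tcoef(uint16_t rom_word)
{
    dct_entry_t e;
    e.last  = (rom_word >> 11) & 0x1u;
    e.run   = (rom_word >> 4) & 0x7Fu;
    e.level = rom_word & 0xFu;
    return e;
}
```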
1.2.4 Motion Code / Macroblock Increment
This section covers Table 1/MPEG-2, Table 10/MPEG-2, Table 2-B.1/MPEG-1, Table 2-B.4/MPEG-1, Table 1/H.261, Table 3/H.261, and Table 10/H.263.
【０２０３】
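For motion codes, the LSB is the sign bit, except when VLC = 1. For macroblock increments, the LSB is an even-value flag, except when VLC = 1. Only half of each table therefore needs to be decoded. If the trailing sign/even bit is ignored, the two kinds of table have the same VLC values and decoded values, except for the upper part of Table 10/H.263. Decoded values are at most 6 bits wide, which means that two table values can be placed in one location. Although the decoded values in the lower part of Table 10/H.263 differ from the others, the trailing binary values are the same because of the fixed-point representation. Sixteen and a half locations are thus used to handle all of these tables. One simple FSM is used to generate the ROM address. When a motion code is decoded, the ROM table supplies the absolute value. If the address generator enables the sign bit, the decoder extracts the LSB: '1' means negative (−) and '0' means positive (+). The algorithm is as follows.
    if (sign_enable == 1)
        /* sign is the extracted LSB: 1 = negative, 0 = positive */
        increment_value = sign ? -ROM_table_value : ROM_table_value;
    else
        increment_value = ROM_table_value;
When a macroblock address increment table is decoded, the result is obtained from the ROM table value and the even flag. For example, suppose the ROM table supplies the value 5. If the even flag is high, the result is 10; if the even flag is low, the result is 11. The algorithm is as follows.
    if (even_enable == 1)
        /* invert only the 1-bit even flag, not the whole word */
        increment_value = (ROM_table_value << 1) | (~even_bit & 1);
    else
        increment_value = ROM_table_value;
ROM table format: bits 11-6: upper data; bits 5-0: lower data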
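The two fragments above can be written as complete C functions. This is a sketch under the conventions stated in the text (LSB '1' means a negative motion code; even flag high gives an even increment); the function and parameter names are hypothetical.

```c
/* rom_value: absolute value read from the ROM table.
 * lsb:       the LSB extracted from the VLC (sign bit). */
int decode_motion_code(int rom_value, int sign_enable, int lsb)
{
    if (sign_enable && lsb)
        return -rom_value;          /* LSB '1' means negative */
    return rom_value;
}

/* rom_value: value read from the ROM table.
 * lsb:       the LSB extracted from the VLC (even flag). */
int decode_mb_increment(int rom_value, int even_enable, int lsb)
{
    if (even_enable)
        return (rom_value << 1) | (lsb ? 0 : 1);   /* even flag high: even result */
    return rom_value;
}
```

With the example from the text, a ROM value of 5 decodes to 10 when the even flag is high and to 11 when it is low.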
１．２．５マクロブロックパターン
この節はテーブル９／ＭＰＥＧ−２、テーブル２−Ｂ．３／ＭＰＥＧ−１、テーブル４／Ｈ．２６１（ＣＢＰ）を取扱う。
デコーティングされた値は６ビットまで発生するが、これは１個の位置に２個のデータを置くことができることを意味する。すなわち、このテーブルのすべてを取扱うためには、３２個の位置が使用される。
ＲＯＭテーブルフォーマット：ビット１１〜６：上位データ；ビット５〜０：下位データ
【０２０４】
１．２．６マクロブロックタイプ
この節は、テーブル２、３、４／ＭＰＥＧ−２、テーブル２−Ｂ．２／ＭＰＥＧ−１、テーブル２／Ｈ．２６１（ＭＴＹＰＥ）及びテーブル３、４／Ｈ．２６３（ＭＣＢＰＣ）を取扱う。
デコーティングされた値は５ビットまで発生する。ここでも上位／下位データ概念を使用する。ＲＯＭアドレスを発生させるために、１個の簡単なＦＳＭが使用される。
ＲＯＭテーブルフォーマット：ビット１１〜６：上位データ；ビット５〜０：下位データ
たとえ、いくつかのビットが各仕様によって相異なる意味を有するが、マクロブロックタイプのフォーマットは、ＭＰＥＧに基づいて各仕様に対して普遍的に定義されている。Ｈ．２６３は情報要求に基づいて２段階のデコーディングを必要とするが、これは下記のとおりである。
＊３ビットマクロブロックタイプを有するデコーディングＭＣＢＰＣ
＊マクロブロックタイプ、ＢＰフラグ及びピクチャータイプに基づいたマクロブロックタイプルックアップ
ＶＬＣテーブルでマクロブロックタイプのフォーマットは下記のとおりである。
【０２０５】
【表６３】
ＭＰＥＧのマクロブロックタイプフォーマット

【０２０６】
【表６４】
Ｈ．２６３のＭＣＢＰＣフォーマット

【０２０７】
【表６５】
Ｈ．２６１のマクロブロックタイプフォーマット

表から、３ビットマクロブロックタイプだけでなく、２ビットクロマパターンを得る。ここにおいて、マクロブロックタイプは、０〜４までの範囲を有する３ビット値である。上述のごとく、細部的なマクロブロックタイプの種類は、第２段階でデコーディングされる。デコーディングルックアップテーブルは下記のとおりである。
【０２０８】
【表６６】
Ｈ．２６３のマクロブロックタイプデコーディングルックアップテーブル

【０２０９】
１．２．７ＤＣＴＤＣサイズ
この節は、テーブル１２、１３／ＭＰＥＧ−２、テーブル２−Ｂ．５／ＭＰＥＧ−１を取扱う。ＶＬＣ構造によって‘１’カウントはここで‘０’カウントの代わりに使用される
ＲＯＭテーブルフォーマット：ビット１０〜６：上位データ：クロマ；ビット５〜０：下位データ：輝度。ビット１１とビット５は予約されている。
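1.2.8 CBPY
This section covers Table 9/H.263. The table contains two sets of data, one for inter pictures and one for intra pictures. Each set is the bitwise inverse of the other, so only one set needs to be stored in the ROM. Here, the intra data is stored in the ROM. A 4-bit value represents the CBPY value.
ROM table format: bits 9-6: upper data; bits 3-0: lower data. Bits 11-10 and bits 5-4 are reserved.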
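Because one CBPY set is the bitwise inverse of the other, a single stored set is enough. The sketch below assumes that the ROM holds the intra set (as the encoding section on CBPY suggests) and that the inter value is obtained by inverting the 4-bit entry; the function name and arguments are hypothetical.

```c
#include <stdint.h>

/* rom_word holds two 4-bit CBPY entries: bits 9-6 (upper) and bits 3-0 (lower).
 * Assumption: the stored entries are the intra-picture values. */
unsigned cbpy_decode(uint16_t rom_word, int use_upper, int is_inter)
{
    unsigned v = use_upper ? (rom_word >> 6) & 0xFu : rom_word & 0xFu;
    if (is_inter)
        v = ~v & 0xFu;       /* inter value is the inverted intra value */
    return v;
}
```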
１．２．９デュアルプライム（ｄｕａｌｐｒｉｍｅ）及びモード
この節はテーブル１１／ＭＰＥＧ−２及びテーブル７／Ｈ．２６３を取扱う。
この２個のテーブルは、非常に簡単で小さくて、直接デコーディングされ得る。
１．３エンコーディング
デコーディング節と同様に、エンコーディング過程は‘０’／‘１’カウント概念を使用する。ＲＯＭテーブルは、‘０’／‘１’カウント、‘０’または‘１’カウントに対して最初１に後続くコードのサイズ及び、最初／最後‘１’に後続くＶＬＣコードに対する情報を含む。このフォーマットによると、ＲＯＭテーブルのサイズは、テーブル１２／Ｈ．２６３において特殊エンコーディングで解決される４つを除いては、１２ビットに制限され得る。フォーマットは下記のとおりである。
【０２１０】
【表６７】
一般的なエンコーディングフォーマット

【０２１１】
【表６８】
テーブル／Ｈ．２６３エンコーディングフォーマット

【０２１２】
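In the tables above, the VLC code size is the size of the VLC code that follows the first/last '1', and the VLC code is the code that follows the first/last '1'. With the '0' count, the VLC code following the first '1' is extracted; otherwise the VLC code must be extracted from the bits following the last '1'. The '1' count is applied differently in encoding than in decoding: it is applied only when the '1' count flag is enabled by the address generator. Thus, if the MSB of the VLC is 1 but the '1' count flag is low, the '0'/'1' count field of the ROM table is 0, which means that the '0' count applies.
The following examples cover all the possible cases for encoding.
Example 1: VLC = 0000011001, one_count_enable = 0
Result for the general case: 0101 100 01001
Result for Table 12/H.263: 101 100 001001
Example 2: VLC = 11001, one_count_enable = 0
Result for the general case: 0000 100 01001
Result for Table 12/H.263: 000 100 001001
Example 3: VLC = 11001, one_count_enable = 1
Result for the general case: 0010 011 00001
Result for Table 12/H.263
A general address is generated by adding the offset and the input value.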
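The encoding examples can be reproduced with the sketch below. The field widths are not given in the text here; they are inferred from the examples (4-bit count, 3-bit size, 5-bit code for the general format; 3-bit count, 3-bit size, 6-bit code for Table 12/H.263) and should be read as assumptions, as should all names.

```c
typedef struct {
    unsigned count;  /* length of the leading '0' or '1' run            */
    unsigned size;   /* number of code bits that follow the run         */
    unsigned code;   /* those bits, right-aligned                       */
} vlc_fields_t;

/* vlc holds `len` bits, MSB first, right-aligned. */
vlc_fields_t split_vlc(unsigned vlc, unsigned len, int one_count_enable)
{
    vlc_fields_t f = {0, 0, 0};
    unsigned run_bit = one_count_enable ? 1u : 0u;
    unsigned pos = len;                       /* bits not yet consumed */

    while (pos > 0 && ((vlc >> (pos - 1)) & 1u) == run_bit) {
        f.count++;
        pos--;
    }
    if (!one_count_enable && pos > 0)
        pos--;                                /* '0' count: skip the first '1' */
    f.size = pos;
    f.code = vlc & ((1u << pos) - 1u);
    return f;
}

/* Assumed general layout: count[11:8], size[7:5], code[4:0]. */
unsigned pack_general(vlc_fields_t f)
{
    return (f.count << 8) | (f.size << 5) | f.code;
}

/* Assumed Table 12/H.263 layout: count[11:9], size[8:6], code[5:0]. */
unsigned pack_h263(vlc_fields_t f)
{
    return (f.count << 9) | (f.size << 6) | f.code;
}
```

For Example 1, `pack_general(split_vlc(0x019, 10, 0))` gives 0x589, which is the bit pattern 0101 100 01001.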
【０２１３】
１．３．１テーブル１４／ＭＰＥＧ−２
このテーブルはテーブル２−Ｂ．５ｃ／ＭＰＥＧ−１及びテーブル５／Ｈ．２６１と同一である。このエンコーディングは‘ＲＵＮ’、‘ＦＩＲＳＴＤＣ’、‘ＥＳＣＡＰＥ’、‘ＥＮＤＯＦＢＬＯＣＫ’入力を処理する。
エンコーディング結果：アドレスを発生するためにレベルまたはランと共に加算されるように印加されるオフセットアドレス
１．３．２テーブル１５／ＭＰＥＧ−２
二テーブルが同一なラン、レベル及びＶＬＣコードを有するので、このテーブルの大部分はテーブル１４／ＭＰＥＧ−２を共有する。いくつかの特殊な場合において、‘１’カウントが適用される。このエンコーディングは‘ＲＵＮ’、‘ＬＥＶＥＬ’、‘ＦＩＲＳＴＤＣ’、‘ＥＳＣＡＰＥ’、‘ＥＮＤＯＦＢＬＯＣＫ’入力を処理する。
エンコーディング結果：オフセットアドレス及び‘１’カウント指示者
１．３．３テーブル１２／Ｈ．２６３
上述のごとく、このテーブルは非常に特殊である。これを取扱うために他のフォーマットを使用する。不幸にも、ＶＬＣコードを示すことに１２ビットが使用できない幾つかの例外がある。その例外はテーブル９に示したとおりである。この例外はＲＯＭテーブルを使用せず、特殊にエンコーディングされ得る。
【０２１４】
【表６９】
１２／Ｈ．２６３でエンコーディングの例外

【０２１５】
エンコードは‘ＲＵＮ’及び‘ＥＳＣＡＰＥ’入力を処理する。
エンコード結果：アドレスを発生するために、レベルまたはランと共に加算されるように印加されるオフセットアドレス
１．３．４モーションコード／マクロブロック増加分
この節は、テーブル１／ＭＰＥＧ−２、テーブル１０／ＭＰＥＧ−２、テーブル２−Ｂ．１／ＭＰＥＧ−１、テーブル２−Ｂ．４／ＭＰＥＧ−１、テーブル１／Ｈ．２６１、テーブル３／Ｈ．２６１及びテーブル１０／Ｈ．２６３を取扱う。
デコーディング部分で説明したとおり、このすべてのテーブルに対して、１つのＲＯＭテーブルと１つのＦＳＭととを共有することができる。ＲＯＭテーブルから得られるＶＬＣコードは、完全なＶＬＣコードを作るために符号／偶数ビットと結合しなければならない。従って、このエンコーディングＦＳＭで処理する入力値は、そのＬＳＢがフラクションビット(fraction bit)であるモーションコードに対する絶対値と、１ビットの右側にシフトされたマクロブロックアドレス増加分である。
【０２１６】
エンコーディングは‘ＳＴＵＦＦＩＮＧ’及び‘ＥＳＣＡＰＥ’を処理する。
１．３．５マクロブロックパターン
この節はテーブル９／ＭＰＥＧ−２、テーブル２−Ｂ．３／ＭＰＥＧ−１を取扱う。
アドレスはオフセットとパターン値を加算した値である。
１．３．６マクロブロックタイプ
この節は、テーブル２、３、４／ＭＰＥＧ−２、テーブル２−Ｂ．２／ＭＰＥＧ−１を取扱う。
１．３．７テーブル３、４／Ｈ．２６３（ＭＣＢＰＣ）
ピクチャータイプ、マクロブロックタイプ及びスタッフィング(stuffing)フラグに対する情報が、ＲＯＭテーブルアドレスオフセットを発生させるために提供される。アドレスはオフセットアドレスとＣＢＰＣの和である。
１．３．８テーブル２／Ｈ．２６１（ＭＴＹＰＥ）
アドレス発生器が非常に複雑で、具現に対して考慮する価値がない。
１．３．９ＣＢＰＹ
デコーディング部分に述べたように、イントラーピクチャーデータのみをエンコーディングする。ピクチャータイプがインターピクチャーである場合、データはまず反転されなけらばならない。
アドレスはオフセットとＣＢＰＹ値を加算した値である。
１．３．１０ＤＣＴＤＣサイズ
この節は、テーブル１２、１３／ＭＰＥＧ−２、テーブル２−Ｂ．５／ＭＰＥＧ−１を取扱う。
輝度及びクロマに対する数個のＶＬＣコードが同一であるので、これに対して数個のＲＯＭテーブルを共有する。オフセットアドレスを発生させるために、クロマフラグ及び数個のビット値が使用される。オフセットと実際値を加算することによってＲＯＭアドレスを得ることができる。
【０２１７】
１．３．１１デュアルプライム (dual prime) 及びモード
この節は、テーブル１１／ＭＰＥＧ−２及びテーブル７／Ｈ．２６３を取扱う。
この２個のテーブルは非常に簡単で小さくて、直接エンコーディングされ得る。
２．０ハードウェア説明
ＶＬＣエンコーディング／デコーディングに対するハードウェアは、‘ＶＬＣ’ブロックに含まれる。このブロックは３個のサブブロックを含む。このブロックはＲＯＭテーブルアドレスまたはデコーディング／エンコーディングデータを発生させるために適用される。‘ＶＬＣ−ＤＥＣ’はＶＬＣをデコーディングし、ＲＯＭテーブルアドレスを発生させるために使用される。‘ＶＬＣ＿ＥＮＣ’はＶＬＣをエンコーディングするためのブロックであり、ＲＯＭテーブルアドレスまたはＨ．２６３のＴＣＯＥＦテーブルのための特殊エンコーディングを発生させる。‘ＬＯＯＫＵＰ’はＲＯＭテーブル値または特殊エンコーディング値に基づいてＶＬＣデータを出力する。
２．１ＶＬＣデコーディングアドレス発生器
ＶＬＣ＿ＤＥＣの核心は、デコーディングＦＳＭである。このＦＳＭは入力情報をデコーディングしアドレス発生を制御する。ＦＳＭの入力及び定義は下記のとおりである。
＊ＺＥＲＯ＿ＯＮＥＣｏｕｎｔ（１５ビット）：０／１カウント値を提供する。
＊ＺＥＲＯ＿ＯＮＥＣｏｕｎｔ（４ビット）：０／１カウント値を提供する。２個の相互に異なるビットカウント信号を使用する目的は、入力データ共有により、ゲート注文者(gate customer)を減少させるためである。大部の場合、１５ビットが使用される。
＊ＯＮＥＣｏｕｎｔｅｎａｂｌｅ（１ビット）：‘１’カウント指示者
＊テーブルタイプ（６ビット）：テーブルタイプ
【０２１８】
【表７０】
ＶＬＣ＿ＤＥＣ＿ＦＳＭテーブルタイプフォーマット
＊モード（９ビット）：動作モード

【０２１９】
VLC_DEC FSMモードフォーマット
ビット8 ビット7 ビット6 ビット5 ビット4 ビット3 ビット2 ビット1 ビット0
H.263 仕様ピクチャータイプクロマ第１ DC テーブル15 MB INC
仕様及びピクチャータイプに対する定義は、ピン定義で説明する。
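To simplify the hardware and guarantee the ROM access time, a special algorithm is used to generate the decode ROM table address. The procedure is as follows.
【０２２０】
Step 1: Generate the offset address (OFFSET).
Step 2: Generate a 4-bit shift amount (MASK_SHFT) and shift the 16-bit FIFO_DATA right by that amount. Then extract the four least significant bits (FOL_DATA).
Step 3: Reverse the order of the four bits obtained in step 2.
Step 4: Generate a 4-bit mask signal (MASK) and use it to mask the data obtained in step 3.
Step 5: OR the result of step 4 with the offset address. The result is the ROM table address.
Combining these steps gives:
Address = OFFSET | (BITREVERSE(Bits(3-0) of (FIFO_DATA >> MASK_SHFT)) & MASK)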
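Steps 2 to 5 can be modelled in C as follows. This is a software sketch of the formula above, not the hardware implementation; OFFSET, MASK_SHFT, and MASK are produced by the FSM and are simply passed in here, and the function names are hypothetical.

```c
#include <stdint.h>

/* Reverse the order of the four low bits of x. */
unsigned bitreverse4(unsigned x)
{
    return ((x & 1u) << 3) | ((x & 2u) << 1) | ((x & 4u) >> 1) | ((x & 8u) >> 3);
}

/* Address = OFFSET | (BITREVERSE(bits 3-0 of (FIFO_DATA >> MASK_SHFT)) & MASK) */
uint16_t vlc_dec_rom_address(uint16_t offset, uint16_t fifo_data,
                             unsigned mask_shft, unsigned mask)
{
    unsigned fol_data = (fifo_data >> mask_shft) & 0xFu;      /* step 2      */
    unsigned masked   = bitreverse4(fol_data) & (mask & 0xFu); /* steps 3, 4 */
    return (uint16_t)(offset | masked);                        /* step 5     */
}
```

For example, with OFFSET = 0x100, FIFO_DATA = 0x00D0, MASK_SHFT = 4, and MASK = 0x7, the four extracted bits are 1101, their reversal is 1011, the masked value is 011, and the resulting address is 0x103.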
ＦＳＭの出力は下記のとおりである。
＊ＭＡＳＫ（４ビット）：マスタデータ
＊ＯＦＦＳＥＴ（９ビット）：ＲＯＭテーブルオフセットアドレス
＊ＭＡＳＫ＿ＳＨＦＴ（４ビット）：シフト量
＊ＳＩＺＥ（５ビット）：ＶＬＣサイズ
＊ＳＰＥＣＩＡＬ＿ＦＬＡＧ（３ビット）：デコーディングのための余分の情報
【０２２１】
【表７１】
ＶＬＣ＿ＤＥＣの特殊フラグの定義

【０２２２】
＊ＶＡＬＩＤ＿ＶＬＣ（１ビット）：有効ＶＬＣコードフラグ
＊ＨＩＧＨ＿ＤＡＴＡ＿ＩＮＤＩＣＡＴＯＲ（１ビット）：ＲＯＭデータの中の上位６ビットを抽出する。
入力ピン：
＊ＦＯＬ＿ＤＡＴＡ（４ビット）：シフトされたＦＩＦＯ＿ＤＡＴＡ（上述した段階２参照）
＊ＣＮＴ（４ビット）：０／１カウント
＊ＯＮＥ＿ＣＮＴ＿ＥＮ（１ビット）：‘１’カウント指示者
＊ＭＯＤＥ（１４ビット）：テーブルタイプ及び他の情報
定義は下記のとおりである。
【０２２３】
【表７２】
ＶＬＣ＿ＤＥＣでＭＯＤＥフォーマット

【０２２４】
仕様：００＝ＭＰＥＧ−１；０１＝ＭＰＥＧ−２；１０＝Ｈ．２６１；１１＝Ｈ．２６３；
ピクチャータイプ：００＝予約；０１＝イントラー；１０＝予測；１１＝両方向；
＊ＦＩＦＯ＿ＤＡＴＡ（１６ビット）：データはＶＬＣを含む。
出力ピン：
＊ＲＯＭ＿ＡＤＲ（１０ビット）：ＲＯＭテーブルアドレス
＊ＭＡＳＫ＿ＳＨＦＴ（４ビット）：ＦＩＦＯ＿ＤＡＴＡに対するシフト量（上述した段階２参照）
＊ＳＩＺＥ（５ビット）：ＶＬＣサイズ
＊ＳＰＥＣＩＡＬ−０（３ビット）：特殊フラグ（ＦＳＭ出力参照）
＊ＶＡＬＩＤ＿ＶＬＣ（１ビット）：有効ＶＬＣフラグ
＊ＨＩＧＨ＿ＤＡＴＡ（１ビット）：偶数フラグの符号として、ＶＬＣのＬ
ＳＢの抽出指示者
＊ＦＵＬＬ＿ＤＡＴＡ（１ビット）：ＤＣＴ係数デコーディング時に、ハイの完全な１２ビットデータ構造
＊ＴＡＢＬＥ（６ビット）：ＦＳＭ入力に定義される。
＊Ｔ＿ＭＯＤＥ（９ビット）：ＦＳＭ入力にＭＯＤＥに定義される。
２．２ＶＬＣ＿ＥＮＣ
ＶＬＣエンコーディングコア部分でのように、ＶＬＣ＿ＥＮＣは可変長さコードをエンコーディングする。この部分の出力は、ＲＯＭテーブルアドレスまたはＶＬＣの特殊エンコーディングである。１．０節の説明のとおり、エンコーディングデータ構造は、Ｈ．２６３でＴＣＯＥＦのいくつかの特殊な場合を除いては、１２ビットデータフォーマットに従う。たとえ、１０ビット加算器がＲＯＭテーブルアドレスを発生させるために使用されるが、ハードウェア観点から見ると、ＶＬＣ＿ＤＥＣ部分より更に簡単である。
ＶＬＣ＿ＤＥＣと同様に、この部分の核心はＶＬＣ＿ＥＮＣというＦＳＭである。他のＦＳＭ、ＥＮＣ＿ＳＰは、特殊エンコーディングのために使用される。
ＦＳＭＶＬＣ＿ＥＮＣの入力信号は、この部分の入力ピンと同一である。
＊ＬＡＳＴ（１ビット）：Ｈ．２６３のＴＣＯＥＦに対するＬＡＳＴ値
＊ＲＵＮ／ＶＡＬＵＥ（６ビット）：ＤＣＴ係数テーブルがエンコーディングの途中であれば、この入力はＲＵＮを意味し、そうでない場合一般的な値、すなわちパターンを意味する。
＊ＬＥＶＥＬ（６ビット）：ＤＣＴ係数レベル
＊ＳＰＥＣＩＡＬ＿ＦＬＡＧ（２ビット）：ＶＬＣ＿ＤＥＣ部分で定義された特殊フラグ
＊ＴＡＢＬＥ（６ビット）：ＶＬＣ＿ＤＥＣと同一
＊ＭＯＤＥ（９ビット）：ＶＬＣ＿ＤＥＣと同一
ＲＯＭアドレス発生は非常に簡単である。ＦＳＭはアドレスを発生させるために値（ラン）またはレベルまたは０に加えられるオフセットアドレスを提供する。このＶＬＣが同一サイズと‘０’カウントを有するので、特殊エンコーディングのためには、出力はコードで復元される２個の最下位ビットである。
【０２２５】
出力ピンは下記のとおりである。
＊ＯＮＥ＿ＣＮＴ＿ＦＬＧ（１ビット）：ＶＬＣ構造が‘１’カウントを使用することを知らせる。
＊ＳＩＧＮ＿ＥＮ＿ＢＩＴ：ＶＬＣ構造が符号／偶数ビットをＶＬＣＬＳＢに置くことを知らせる。
＊ＳＰＥＣＩＡＬ＿ＥＮＣＯＤＥ（１ビット）：特殊エンコーディングフラグ
＊ＶＬＣ（２ビット）特殊エンコーディングされたＶＬＣコードＬＳＢ
＊ＡＤＲ＿Ａ（１６ビット）：オフセットアドレス。上位６ビットは０である。
＊ＡＤＲ＿Ｂ（１６ビット）：アドレスのまた他の部分。上位１０ビットは常に０である。
２．３ルックアップ
この節は、ＶＬＣデータのエンコーディング／デコーディングを提供する。このブロックは下記のような状況を処理する。
＊規則的な１２ビットエンコーディング／デコーディングＲＯＭテーブル値出力
＊ビット上位／下位デコーディングデータ出力
＊特殊エンコーディングデータの復元
要求した通り、出力データ０で満たされる。
入力ピン：
＊Ｄ＿ＡＤＲ（１０ビット）：ＲＯＭアドレスをデコーディングする。
＊Ｅ＿ＡＤＲ（１０ビット）：ＲＯＭアドレスをエンコーディングする。
＊ＥＮＣＯＤＥ（１ビット）：１：エンコーディング；０：デコーディング
＊ＨＩＧＨ（１ビット）：上位６ビットフラグを抽出する。
＊ＥＮＡＢＬＥ（１ビット）：完全な１２ビットデータフラグ
＊ＶＬＣ（２ビット）：特殊エンコーディングコード
＊ＳＰＥＣＩＡＬ＿ＥＮＣＯＤＥ（１ビット）：特殊エンコーディングコード
出力ピン：
ＬＯＯＫＵＰ（１６ビット）：ＶＬＣコード
【０２２６】
【発明の効果】
以上述べたように、本発明はビットストリーム処理器では、多様なビットストリームが実時間的に同時にエンコーディングまたはデコーディングされるように文脈を貯蔵することができるので、多重データストリームを同時に処理することができる。また、ビットストリーム処理器が単一算術命令またはブール命令を遂行するためにプログラムされないようにすることにより、ビットストリーム処理器が高速で動作できる。
【図面の簡単な説明】
【図１】本発明によるメディアカードのブロック図である。
【図２】本発明によるマルチメディア処理器のブロック図である。
【図３】図２に示された処理器の一部のビットストリーム処理器のブロック図である。
【図４】本発明によるコンピュータシステムのブロック図である。
【図５】本発明によるコンピュータシステムのブロック図である。
【図６】本発明によるコンピュータシステムのブロック図である。
【図７】図２に示された処理器のファームウェア構造を示す図である。
【図８】図１のシステムのためのアドレスマップを示す図である。
【図９】図１のシステムのためのアドレスマップを示す図である。
【図１０】図２に示された処理器のＤＳＰコアを示すブロック図である。
【図１１】図２に示された処理器の一部のベクトル処理器に適用されたパイプラインを示す図である。
【図１２】図１１のベクトル処理器の機能的なブロック図である。
【図１３】図１１のベクトル処理器において実行データ経路を示す図である。
【図１４】図１１のベクトル処理器においてロード及び貯蔵データ経路を示す図である。
【図１５】図２の処理器のキャッシュシステムのブロック図である。
【図１６】図１５のキャッシュシステムにおいて命令データキャッシュを示す図である。
【図１７】図２の処理器においてキャッシュ制御ユニットのデータ経路パイプラインを示す図である。
【図１８】図２に図示したシステムにおいてキャッシュ制御ユニットのアドレス処理パイプラインのためのデータ経路を示す図である。
【図１９】図２の処理器においてステートマシンを示す図である。
【図２０】図２の処理器においてステートマシンを示す図である。
【図２１】図２の処理器においてステートマシンを示す図である。
【図２２】図２の処理器においてステートマシンを示す図である。
【図２３】図１５のキャッシュシステムで使用されたアドレスフォーマットを示す図である。
【図２４】図２の処理器においてバスを示す図である。
【図２５】図２の処理器において仲裁制御ユニットを示す図である。
【図２６】図２の処理器に対するタイミング図である。
【図２７】図２の処理器に対するタイミング図である。
【図２８】図２の処理器に対するタイミング図である。
【図２９】図２の処理器に対するタイミング図である。
【図３０】図２の処理器においてメモリリクエスト信号を示す図である。
【図３１】図２の処理器においてメモリリクエスト信号を示す図である。
【図３２】図２の処理器においてメモリリクエスト信号を示す図である。
【図３３】図２の処理器においてバス仲裁制御ユニットを示す図である。
【図３４】図２の処理器に対するタイミング図である。
【図３５】図２の処理器に対するタイミング図である。
【図３６】図２の処理器に対するタイミング図である。
【図３７】図２の処理器においてバスインターフェース回路を示す図である。
【図３８】図２の処理器においてバスインターフェース回路を示す図である。
【図３９】図１のシステムに対する仮想フレームバッファ（ＶＦＢ）を示す図である。
【図４０】図１のシステムに対する仮想フレームバッファ（ＶＦＢ）を示す図である。
【図４１】図１のシステムに対するバスインターフェースを示す図である。
【図４２】図２のシステムに対するメモリコントローラーを示す図である。
【図４３】図２のシステムに対するメモリコントローラーを示す図である。
【図４４】図２のシステムに対するアドレスコントローラーを示す図である。
【図４５】図１のシステムで使用するフォーマットを示す図である
【図４６】図１のシステムで使用するフォーマットを示す図である
【図４７】図１のシステムにおいてステートマシンを示す図である
【図４８】図１のシステムに対するデータコントローラーのブロック図である。
【図４９】図１のシステムに対するタイミング図である。
【図５０】図１のシステムに対するタイミング図である。
【図５１】図１のシステムに対するタイミング図である。
【図５２】図２の処理器において装置インターフェース回路を示す図である。
【図５３】図２の処理器において装置インターフェース回路を示す図である。
【図５４】図１のシステムの各部に対するブロック図である。
【図５５】図１のシステムの各部に対するブロック図である。
【図５６】図１のシステムの各部に対するブロック図である。
【図５７】図１のシステムにおいてレジスタを示す図である。
【図５８】図１のシステムにおいてレジスタを示す図である。
【図５９】図１のシステムにおいてレジスタを示す図である。
【図６０】図１のシステムにおいてフレームバッファ及びビデオウインドを示す図である。
【図６１】図１のシステムに対するタイミング図である。
【図６２】図１のシステムにおいてレジスタを示す図である。
【図６３】図１のシステムに対するタイミング図である。
【図６４】図１のシステムで使用するバッファを示す図である。
【図６５】図１のシステムで使用するバッファを示す図である。
【図６６】図１のシステムで使用するバッファを示す図である。
【符号の説明】
１００：メディアカード
１０５、１２２：バス
１１０：マルチメディア処理器
１１２：Ｄ／Ａ変換器
１１４：ＣＯＤＥＣ
１２０：メモリバス
２１０：スカラー処理器
２２０：ベクトル処理器
２３０：キャッシュサブシステム
２４０：ＩＯＢＵＳ
２４２：タイマー
２４３：ＵＡＲＴユニット
２４５：ビットストリーム処理器
２５０：ＦＢＵＳ
２５２、２５５：インターフェース回路
２５８：コントローラー
２９０：データ移動器
３１０：インターフェースユニット
３２０：ＳＲＡＭ
３３０：ＶＬＣＦＩＦＯユニット
３４０：ＶＬＣＬＵＴＲＯＭ
３５０：制御ステートマシン
３６０：ＢＰコアユニット
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to data processing by a computer, and more particularly to video data processing by a computer.
[0002]
[Prior art]
Computers have commonly been used to compress and decompress system data. The system data includes video data, comprising still and/or moving images. The system data may also include audio data, for example the soundtrack of a moving picture. It is desirable to provide methods and circuits capable of high-speed processing of video data.
[0003]
[Problems to be solved by the invention]
Accordingly, an object of the present invention is to provide a method and a circuit capable of high-speed processing of video data.
[0004]
[Means for Solving the Problems]
In some embodiments, a computer system according to the present invention includes three processors that can operate concurrently: a scalar processor, a vector processor, and a bitstream processor. In encoding or decoding video data, the vector processor performs operations that are carried out efficiently by a single instruction multiple data (SIMD) processor. Such operations include 1) linear data transforms such as the discrete cosine transform (DCT), and 2) motion compensation. The bitstream processor performs operations that act on specific bits rather than on words or half-words. Such operations include, for example, the Huffman and RLC encoding and decoding used in MPEG-1, MPEG-2, H.261, and H.263. The scalar processor performs high-level video processing (e.g., picture-level processing), synchronizes the operation of the vector and bitstream processors, and controls the interfaces to external devices.
In some embodiments, the computer system can process multiple data streams simultaneously. As a result, a user of the computer system can take part in two or more meetings or video conferences at once. The bitstream processor can switch contexts so that several bitstreams are encoded or decoded concurrently in real time, which makes simultaneous processing of multiple data streams possible.
[0005]
In some embodiments, the scalar and vector processors are programmable in the sense that each can be programmed to perform single arithmetic or Boolean instructions. The bitstream processor is not programmable in that sense: it cannot be programmed to perform single arithmetic or Boolean instructions. Rather, it can be programmed to perform an entire video data processing operation on a set of video data. Because the bitstream processor is not programmed at the level of single arithmetic or Boolean instructions, it can operate at high speed. Because the scalar and vector processors are programmable, the system is easy to adapt to changes in video data encoding and decoding standards.
[0006]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 shows a media card 100 that includes a multimedia processor 110. In this embodiment, the multimedia processor 110 is a processor of type MSP-1EX (trademark), whose specification was produced by Samsung Semiconductor Co., Ltd. of San Jose, California. The MSP-1EX processor is described in Appendix A below.
The processor 110 communicates with a host computer system (not shown) through the local bus 105. In some embodiments, bus 105 is a 32-bit, 33 MHz PCI bus. Digital video data output from the processor 110 is coupled to a D / A (digital / analog) converter 112. In addition to the video portion, the digital video data can include an audio portion, such as a movie soundtrack. The output of the converter 112 may be coupled to a TV set (not shown) or other system that processes analog data. In some embodiments, the processor 110 includes an input port for receiving digital video data output from an A / D (analog / digital) converter (see FIGS. 4-6).
[0007]
The processor 110 is connected to a codec (CODEC) 114. The CODEC 114 receives analog audio data from a tape recorder (not shown) or other device, and analog telephone data from a telephone line (not shown). It digitizes the analog data and transmits it to the processor 110. The CODEC 114 also receives digital data from the processor 110, converts it to analog form, and transmits the analog data as required.
The processor 110 is connected to the memory 120 by a bus 122. In FIG. 1, a memory 120 is an SDRAM (synchronous DRAM), and a bus 122 is a 64-bit, 80 MHz bus. In other embodiments, other memories, bus widths, and bus speeds are used. Asynchronous memory and buses are used in some embodiments.
Some embodiments of the card 100 are described in the U.S. patent application entitled "Multiprocessor Operation in a Multimedia Signal Processor" (attorney docket no. M-4364 US), filed by Le Nguyen on the same date as this application, the entire contents of which are incorporated herein by reference.
[0008]
FIG. 2 is a block diagram of one embodiment of the processor 110. The processor 110 includes a scalar processor 210, a vector processor (VP) 220, and a bitstream processor (BP) 245. In some embodiments, the processor 210 is a 32-bit RISC processor that operates at 40 MHz and supports the well-known standard ARM7 instruction set. The vector processor 220 operates at 80 MHz and is a single instruction multiple data (SIMD) processor with a 288-bit vector register. One embodiment of the VP 220 is described in the U.S. patent application entitled "Efficient Context Saving and Restoring in a Multitasking Computing System Environment" (attorney docket no. M-4365 US), filed by Song et al. on the same date as this application, the entire contents of which are incorporated herein by reference.
Processors 210 and 220 can be programmed to perform single arithmetic or Boolean instructions, or sequences of such instructions.
[0009]
In some embodiments, the bitstream processor 245 is designed so that it cannot be programmed to perform single arithmetic or Boolean instructions, which allows it to process video data at high speed. In particular, the BP 245 cannot be programmed to perform a single instruction such as ADD, OR, or "ADD AND ACCUMULATE". Instead, the BP 245 is programmed to perform the video data processing operations described in Chapter 10 of Appendix A. The scalar processor 210 and the vector processor 220, by contrast, can be programmed to perform single arithmetic or Boolean instructions. The processor 110 can therefore be adapted to changes in video standards.
[0010]
As shown in FIG. 2, the scalar processor 210 and the vector processor 220 are connected to the cache subsystem 230. The cache subsystem 230 is connected to a bus (IOBUS; 240) and a bus (FBUS; 250). In some embodiments, IOBUS 240 is a 32-bit, 40 MHz bus and FBUS 250 is 64-bit, 80 MHz.
The IOBUS 240 is connected to the bitstream processor 245, an interrupt controller 248, a full-duplex UART unit 243, and four timers 242. The FBUS 250 is coupled to a memory controller 258, which is coupled to the memory bus 122 (see FIG. 1). The FBUS 250 is also connected to a PCI bus interface circuit 255, which is connected to the PCI bus 105, and to a device interface circuit 252 (also called the "Customer ASIC"). The device interface circuit 252 includes circuits for interfacing with the video D/A converter 112 (see FIG. 1), the CODEC 114, and a video A/D converter such as those shown in FIGS. 4 to 6. The processor 110 also includes a memory data mover 290.
[0011]
The processor 110 can process multiple data streams simultaneously. For example, when a user of the processor 110 takes part in a video conference with two or more parties, the processor 110 performs the video and audio processing that lets the user see all of them. To process multiple video data streams, the processor 110 supports context switching, which means that the BP 245 switches between the data streams. In a video conference, each data stream may come from a separate remote party. Alternatively, an additional data stream may come from a movie channel, so that the user can attend a video conference and watch a movie at the same time. Context switching is described in Section 10.12 of Example 1. When the context is switched, the scalar processor 210 saves the current context and initializes the BP 245 to process another context.
[0012]
The BP 245 can process the following video data formats:
1. MPEG-1, as described in ISO/IEC standard 11172 (1992);
2. MPEG-2, as described in document ISO/IEC JTC 1/SC 29 N 0981 Rev (March 31, 1995);
3. H.261, as described in "ITU-T Recommendation H.261" (March 1993); and
4. H.263, as described in "Draft ITU-T Recommendation H.263" (May 2, 1996).
[0013]
The video data is processed by being divided into a scalar processor 210, a vector processor 220, and a bit stream processor 245, thereby realizing high-speed processing. More specifically, the vector processor 220 performs linear transformation (DCT or inverse DCT) and motion compensation. Such an operation is suitable for a vector processor. This is because these operations sometimes require the same instruction to be performed on different parts of the data. The bitstream processor 245 performs Huffman decoding / encoding and zigzag bitstream processing. The scalar processor 210 performs video and audio demultiplexing and synchronization and I / O interfacing operations.
Examples of encoding and decoding operations appear in Section 10.6.1 and 10.6.2 of Example 1. In the encoding operation, uncompressed digital data arrives from the frame memory 120 or host system (not shown) through the bus 105. In some embodiments, device interface circuit 252 includes a video A / D converter, and uncompressed data arrives from the converter. The vector processor 220 performs quantization, DCT, and motion compensation. The bit stream processor 245 receives the output of the VP 220 and generates a GOB (Group of Blocks) and a slice. In particular, the BP 245 performs Huffman and RLC encoding and zigzag bitstream processing. The scalar processor 210 receives the output of the BP 245 and performs picture layer coding, GOP (group of pictures) coding, and sequence layer coding. The scalar processor 210 then multiplexes the audio and video data and transmits the encoded data to the storage device or network through the bus (105 or 122). Transmission to the network includes transmission to a device interface circuit 252 coupled to the network of some embodiments.
[0014]
In decoding, the process is performed in reverse. The scalar processor 210 demultiplexes system data into video and audio components, and performs sequence hierarchy, GOP, and picture layer decoding of the video data. As a result, the generated GOB or slice is supplied to the bit stream processor 245. The processor 245 performs zigzag processing, Huffman and RLC decoding. The VP 220 receives the output of the BP 245 and performs inverse quantization, IDCT, and motion compensation. The VP 220 performs any pre-processing if necessary (eg, when flattening the edges of a picture image) and provides the reconstructed digital picture to the device interface circuit 252 or storage device. The scalar processor 210, the vector processor 220, and the bitstream processor 245 can operate on many blocks of data in parallel.
The scalar processor 210 processes the picture layer and the upper layer, thereby reducing communication inside the processor. This is because the picture layer and higher layers contain information that is used by the scalar processor 210 for control and I / O functions but not by the vector processor 220 and the bitstream processor 245. . An example of such information is the frame rate used by the scalar processor 210 to transmit the frame to the device interface circuit 252.
[0015]
FIG. 3 is a block diagram of one embodiment of the bitstream processor 245. The signals shown in FIG. 3 are described in Section 10.5 of Example 1. These signals provide the interface between the bitstream processor 245 and the IOBUS 240 (see FIG. 2). In the BP 245, the signals are handled by the IOBUS interface unit 310, which includes the SRAM 320. The BP 245 also includes a VLC FIFO unit 330, a VLC LUT ROM 340, a control state machine 350, and a BP core unit 360 that includes a register file and SRAM. The blocks in FIG. 3 are described in Section 10.4 of Example 1. The ROM 340 contains the look-up tables used in Huffman encoding and decoding for four standards: MPEG-1, MPEG-2, H.261, and H.263. Despite the large amount of information stored in the tables, the ROM 340 is small: 768 × 12 bits. The small size is achieved by sharing tables and by the other techniques described in Section 4 of Example 1.
[0016]
While the invention has been illustrated and described in connection with certain preferred embodiments, it is not limited to them. Those of ordinary skill in the art will readily understand that various modifications and changes may be made without departing from the spirit and scope defined by the claims. In particular, the invention is not limited by any circuit, clock rate, or timing of these embodiments.
[0017]
[Example 1]
MSP-1EX system specifications
Chapter 1 Technical Overview
This chapter provides a technical overview of the multimedia signal processor ("MSP-x") for hardware and software designers.
1.1 Function
The multimedia signal processors (MSP-x) are a family of single-chip VLSI devices that provide a wide range of integrated functions for personal computer and consumer product applications.
The MSPs are based on a powerful vector processor structure that applies a single instruction multiple data (SIMD) model for computation for optimal cost / performance. Its characteristics are as follows.
* Complete programmability
* Based on ARM instruction set structure.
* Integrated 40MHz ARM7 RISC CPU core
* 80MHz vector processor for high-performance digital signal processing
* 2.56 Gops for 9-bit integer ALU operations
* 2.56 Gops for 16-bit integer multiply-accumulate operations
* 640 Mflops for 32-bit IEEE floating-point addition
* 1280 Mflops for 32-bit IEEE floating-point multiply-and-add
* 10 Kgates left unused for optional custom logic or graphics capabilities
* Based on 0.65 μm 3.3v / 5v CMOS technology.
* 128-pin-128-pin package
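The peak-throughput figures in the list above are consistent with the 80 MHz vector clock multiplied by a number of parallel lanes. The sketch below checks that arithmetic. The lane counts are assumptions: 32 lanes follows from packing 9-bit elements into the 288-bit vector register, while 16 lanes for 16-bit data and 8 lanes for 32-bit floating point are inferred from the quoted figures and are not stated in the text.

```c
/* Peak rate in millions of operations per second.
 * ops_per_lane is 2 for a fused multiply-and-add (or multiply-accumulate),
 * which counts as two operations, and 1 otherwise. */
unsigned peak_mops(unsigned clock_mhz, unsigned lanes, unsigned ops_per_lane)
{
    return clock_mhz * lanes * ops_per_lane;
}
```

Under these assumptions, `peak_mops(80, 288 / 9, 1)` is 2560 (2.56 Gops), `peak_mops(80, 8, 1)` is 640, and `peak_mops(80, 8, 2)` is 1280.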
The MSP initially supports four main functions.
* Video
* Audio / Sound
* Telecommunication
* 2D / 3D graphics (optional)
1.1.1 Video
* All functions can be programmed with firmware.
* Real-time MPEG-1 decoding and encoding
* Real-time MPEG-2 decoding
* Nearly real-time MPEG-2 encoding
* Real-time H.324 decoding and encoding
* Image scaling for any screen size or resolution
* Color space conversion between RGB and YUV
* Image filtering for picture contour enhancement and noise reduction
* 4/3 full down conversion
1.1.2 Audio / Sound
* All functions can be programmed with firmware.
* Real-time MPEG-1 audio decoding and encoding
* Real-time MPEG-2 audio decoding and encoding
* Real-time H.320/H.324 audio decoding and encoding
* Real-time G.728 and G.723 voice coding
* Real-time sound blaster emulation
* Wave table synthesis
* FM synthesis
1.1.3 Telecommunication
1.1.3.1 Modem
* Standard asynchronous COM port interface (compatible with the NS 16550A UART)
* V.34 from 28.8 Kbps to 2.4 Kbps
* CCITT V.32bis with data rates of 4800 bps, 9600 bps uncoded, and 9600 bps trellis coded
* Compatibility of Hayes AT command set
* Call progress monitor
* V.25bis auto dial
* DTMF and pulse dialing
* Asynchronous error recovery protocol
* V.42 error correction
1.1.3.2 Facsimile
* V.29 at 9600 bps or 7200 bps
* V.27 at 4800 bps or 2400 bps
* Call progress monitor
* DTMF and pulse dialing
* G3 transfer and others
* T.4/T.30 operation
1.1.3.3 Telephone Answering
* Recording greetings through a telephone set or microphone
* Automatically answer incoming calls and respond to pre-recorded messages
* Record messages from the other party
* Play messages left by the other party
1.1.4 2D / 3D Graphics (Optional)
* BITBLT
* 2D line & polygon drawing and shading
* Geometries and lighting calculations for 3D points, lines and triangles
* 3D color calculation with texture mapping
* Blending
[0018]
1.2Hardware structure
1.2.1Overview
The MSP-1 multimedia coprocessors are designed to meet a variety of requirements, including integration levels, cost and performance. A block diagram including the MSP-1 processor is as shown in FIG.
The MSP-1 family offers the following pin-out options.
* MSP-1 is designed as an entry-level device that does not use external SDRAM.
* MSP-1EX includes a 32-bit memory bus for interfacing with external SDRAM.
* MSP-1F includes a 64-bit memory bus for interfacing with external SDRAM.
* MSP-1G includes an integrated SVGA controller and a RAMDAC with 3D graphics acceleration.
FIG. 5 is a block diagram of a system including an MSP-1E processor.
1.2.2External codec
FIG. 6 is a block diagram of a system that includes an MSP-1 processor with an external codec.
[0019]
1.2.2.1Bill of materials for MSP-1EX
The following bill of materials is proposed for MSP-1EX.
* MSP-1EX
* 512K x 32-bit synchronous DRAM
* NTSC / PAL encoder (KS0119 from Samsung)
* Audio & telecommunications codec (AD1843 from Analog Devices)
* Others (capacitors, resistors, amplifiers, connectors, etc.)
* Printed circuit board
[0020]
1.3Microarchitecture
1.3.1Overview
Basically, the MSP microarchitecture consists of a very powerful DSP core and a customer-defined memory & I / O subsystem (see FIG. 2). The DSP core includes the following.
* 32-bit ARM7 RISC CPU that operates at 40 MHz and is used for general processing
* Vector processor that operates at 80MHz and is used for signal processing
* A shared cache subsystem that operates at 80MHz and has a 2KB instruction cache, a 5KB data cache, and a 16KB ROM cache. The data cache can be controlled by hardware or software.
* High-speed 64-bit bus (FBUS) that operates at 80MHz and interfaces with many internal FBUS peripherals
* Low speed 32-bit bus (IOBUS) that operates at 40MHz and interfaces with many internal IOBUS peripherals
Internal FBUS peripherals include:
* 32-bit 33 MHz PCI bus interface
* 64-bit SDRAM memory controller
* 8-channel DMA controller
* Customer ASIC logic block. The customer ASIC logic block provides a total of 10 Kgates, including interfaces to various analog codecs and customer-defined I / O devices. The interface logic supports the Samsung KS0119 NTSC encoder and the Analog Devices AD1843 codec.
[0021]
* Memory data mover used to DMA data from host memory to MSP local SDRAM memory
Internal IOBUS peripherals include:
* Bitstream processor for processing video bitstreams
* 16450 UART serial line
* 8254-compatible timer
* 8259-compatible interrupt controller
MSP also includes special registers (MSP control registers) used for software controlled initialization and interrupts.
[0022]
1.4MSP-1EX pin description
1.4.1Total: 256 pins
1.4.2PCI bus interface (53 pins)
CLK Clock input pin
RSTL Reset input pin, active low
AD[31:0] Address and data bus pins
C_BE0L Control & byte 0 enable pin, active low
C_BE1L Control & byte 1 enable pin, active low
C_BE2L Control & byte 2 enable pin, active low
C_BE3L Control & byte 3 enable pin, active low
PAR Parity pin
FRAMEL Cycle frame pin, active low
IRDYL Initiator ready pin, active low
TRDYL Target ready pin, active low
STOPL Stop transaction pin, active low
LOCKL Lock transaction pin, active low
IDSEL Initialization device select input pin
DEVSEL Device select pin, active low
REQL Bus request pin, active low
GNTL Bus grant pin, active low
PERRL Parity error pin, active low
SERRL System error pin, active low
INTAL Interrupt pin, active low
1.4.3Other (6 pins)
TCK JTAG test clock input pin
TDI JTAG test data input pin
TDO JTAG test data output pin
TMS JTAG test mode selection input pin
TRSTL JTAG test reset input pin
CLK Clock input. This is a 40 MHz clock input pin.
1.4.4KS0119 NTSC / PAL encoder interface (24 pins)
SFRS Frame synchronization output to KS0119 for the 3-wire host interface
SCLK Serial clock output to KS0119
SDAT Serial data I / O
BGHS Horizontal sync signal input to MSP
BGVS Vertical sync signal input to MSP
MSSEL Master selection
PD[15:0] Pixel data output to KS0119
BGCLK Pixel clock output to KS0119
PROMCSL BIOS PROM chip selection
[0023]
1.4.5AD1843 audio & telecommunications codec interface (6 pins)
A43SCLK Serial clock input / output. SCLK is a bidirectional signal that provides the clock as an output to the serial bus when the bus master (BM) pin is driven HI and accepts the clock as an input when the BM pin is driven LO.
A43SDFS Serial data frame synchronization input / output. SDFS is a bidirectional signal that provides the frame synchronization signal as an output to the serial bus when the bus master (BM) pin is driven HI and accepts the frame synchronization signal as an input when the BM pin is driven LO.
A43SDI Serial data output from the MSP and input to the AD1843. All control and playback transfers are 16 bits long, MSB first.
A43SDO Serial data output from the AD1843 and input to the MSP. All status & control register read and playback transfers are 16 bits long, MSB first.
[0024]
1.4.6Memory bus interface (87 pins)
RAS1L Output pin (active low). This is the row address strobe, which latches the row address from MA[11:0] into the internal row address buffer of the selected SDRAM bank.
CAS1L Output pin (active low). This is the column address strobe, which latches the column address from MA[11:0] into the internal column address buffer of the selected SDRAM bank.
MWEL Output pin (active low). This is the write enable for the SDRAM.
MA[11:0] Output pins. Row and column address signals multiplexed to the SDRAM.
MD[63:0] I / O SDRAM data pins
MA23 Output pin. Memory address bit <23>
MA24 Output pin. Memory address bit <24>
DQM Output pin. After the clock, the SDRAM data is set to high impedance and the output is masked. (This pin is used only for the synchronous DRAM interface.)
MCKE Output pin. Masks the SDRAM system clock to suspend operation from the next clock cycle.
MCS0L Output pin (active low). SDRAM chip select for the lower 32 bits
MCS1L Output pin (active low). SDRAM chip select for the upper 32 bits
MRDYH Output pin. SDRAM ready signal
MEMCLK Output pin. This is the clock output pin for the SDRAM.
1.4.7Power supply
VDD 3.3 volt power supply pin
VCC 5 volt power pin
VSS Ground pin
MSP-1EX pin designation
[0025]
[Table 1]

[Table 2]

[Table 3]

[Table 4]

[Table 5]

[Table 6]

[Table 7]

[Table 8]

[0026]
1.5Firmware structure
1.5.1Overview
MSP provides a powerful, open application environment through a highly optimized combination of vectorized DSP firmware libraries (executed by the vector processor) and system management functions (executed by ARM7).
MSP separates signal processing development from host application development, which provides scalable performance, cost-effective multimedia & communication, and ease of use. It also reduces application development and maintenance costs.
1.5.2Firmware structure
The MSP firmware system structure is as shown in FIG. The shaded areas represent MSP system elements, and the unshaded areas represent the underlying PC applications and operating system.
1.5.2.1MOSA (Multimedia operating system architecture)
MSP's real-time operating system kernel is called "MOSA", which is a subset of Microsoft's real-time kernel MMOSA.
MOSA is a real-time, robust, multitasking, preemptive operating system for multimedia applications implemented on MSP. It performs the following main functions:
* Interfacing with the host Windows 95 and Windows NT
* Downloading application firmware selected from the host
* Scheduling MSP tasks for execution on ARM7 and vector processors
* Management of all MSP system resources including memory & I / O devices
* Communication synchronization between MSP tasks
* Reporting MSP related interrupts, exceptions and status conditions
MOSA operates exclusively on ARM7.
For more details, refer to the MMOSA real-time kernel specification.
1.5.2.2Multimedia library module
The multimedia library provides a broad range of modules that perform functions such as data compression, MPEG video & audio, audio coding and synthesis, Sound Blaster compatible audio, and the like. Each module is optimized for the MSP environment and designed to run in a multitasking environment.
[0027]
1.5.3Telecom library
1.5.3.1Overview
With appropriate DSP firmware, MSP can support voice applications such as answering incoming phone calls and storing messages on the hard disk. The system speaker and a microphone can also be used to provide a half-duplex speakerphone. Incoming and outgoing phone calls are detected and made available to the system. Call progress tones can also be heard through the selected telephone handset, system speaker, stereo headphones or audio output channel under program control.
[0028]
1.6Programming model
1.6.1Overview
From a hardware perspective, MSP is a single-chip solution that includes two CPUs and a number of integrated peripherals. From a software perspective, MSP is a high performance digital signal processing (DSP) device that resides on the PCI bus.
The host CPU can control the MSP in any of the following ways.
* Reading / writing the MSP control & status registers through the PCI bus
* Shared data structures in host system memory
* Shared data structures in MSP local memory
Execution of an MSP program always starts on the ARM7 CPU, which can in turn start an independent execution stream on the vector processor. Control synchronization between the ARM7 CPU and the vector processor is accomplished by ARM7 coprocessor instructions (STARTVP, INTVP, TESTVP) and by special instructions in the vector processor (VJOIN, VINT). Data transfer between the ARM7 CPU and the vector processor may be performed by data move instructions executed on ARM7.
The ARM7 CPU is generally responsible for the host interface, resource management, and I / O device handling, as well as most interrupt and exception processing. The vector processor is responsible for all digital signal processing and for certain special interrupts, such as coprocessor interrupts (raised by the vector processor to ARM7) and hardware stack overflow (in the vector processor).
The MSP also includes many integrated peripheral devices for interfacing with various I / O devices. All peripheral addresses are memory mapped and can therefore be accessed with standard memory load and store instructions (by either the ARM7 CPU or the vector processor).
[0029]
1.6.2Power-up, reset & initialization
After power is applied, the MSP automatically enters a self-test sequence to verify that it functions correctly. The self-test sequence includes:
* Initialization of all internal MSP registers
* Execution of on-chip self-test diagnostics to verify all elements of the MSP
The self-test sequence is expected to last approximately <tds> seconds. At the end of the self-test sequence, the MSP prepares to run the MSP firmware, which includes:
* Loading and execution of the MSP initialization software
* Loading and execution of the MSP real-time operating system kernel MMOSA
MSP supports the following three types of reset.
* Hardware-controlled system reset via the PCI bus
* Software-controlled system reset via the PCI system reset bit in the MSP control register
* Software-controlled restart via the ARM7 & vector processor restart bit in the MSP control register
[0030]
1.6.3PCI configuration registers
As an I / O device on the PCI bus, MSP includes the set of configuration registers defined in PCI Rev2.1, as shown in Table 9.
[0031]
PCI configuration registers
[Table 9]

[0032]
1.6.3.1Device & Vendor Identifier Register
See PCI bus specification Rev2.1 for more details.
1.6.3.2Status & command register
See PCI bus specification Rev2.1 for more details.
1.6.3.3Class code & revision identifier register
See PCI bus specification Rev2.1 for more details.
For MSP-1EX, the class code is defined as 03 and the subclass is 0.
1.6.3.4Other registers
See PCI bus specification Rev2.1 for more details.
1.6.3.5MSP base address register (MSP BASE)
This register stores the base address for the MSP device. This address is written by the host system software (Windows 95 / NT) and is used by the MSP hardware to decode memory addresses.
1.6.3.6VFB base address register
This register stores the base address for the VGA virtual frame buffer. This address is written by the host system software (Windows 95 / NT) and is used by the MSP hardware to emulate the VGA frame buffer.
[0033]
1.6.3.7Expansion ROM base address
See PCI bus specification Rev2.1 for more details.
1.6.3.8Interrupt line register
See PCI bus specification Rev2.1 for more details.
1.6.4ARM7 CPU
The ARM7 RISC CPU is an MSP master processor, which includes a 32-bit data path and has a standard ARM7 instruction set structure. ARM7 also includes special coprocessor instructions to interface with the vector processor.
1.6.5Vector processor
The vector processor is the MSP DSP engine; it includes a 288-bit data path and acts as a coprocessor to ARM7. Its functions are described in the vector processor architecture document.
The vector processor 220 operates at 80 MHz and has a six-stage pipeline: fetch, decode, issue, register access, execute and write. It is optimized for DSP-related processing.
[0034]
1.6.6Virtual memory management
MSP-1EX does not support virtual memory management.
1.6.7Interrupt & exception processing
In MSP, interrupt & exception processing is usually performed by ARM7.
All internal I / O device interrupts go to the internal 8259 interrupt controller, which prioritizes them and sends the highest-priority interrupt to ARM7 for further processing.
1.6.8Physical memory address map
Programs running on ARM7 and the vector processor see all MSP I / O devices memory-mapped into the physical address space, as shown in FIG.
The MSP address map shown by ARM7 (or vector processor) starts at 0 and extends to 4GB.
Addresses in the region from 2 GB to 4 GB are mapped to host (Pentium) PCI addresses from 0 to 2 GB according to the following relational expression.
Host PCI address = ARM7 address − 8000 0000 (in hex)
With this mapping, ARM7 (or the vector processor) can use addresses from 2 GB to 4 GB to access host PCI memory addresses from 0 to 2 GB. ARM7 cannot access host PCI memory addresses at or above 2 GB.
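The relational expression above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function and constant names are hypothetical and do not come from the MSP documentation.

```python
# Hypothetical names; only the offset 8000 0000 (hex) comes from the text.
HOST_WINDOW_BASE = 0x8000_0000    # 2 GB: start of the host window in the ARM7 map
ADDRESS_SPACE_END = 0x1_0000_0000  # 4 GB: end of the ARM7 address map


def arm7_to_host_pci(arm7_addr):
    """Translate an ARM7 address in the 2 GB..4 GB window to a host PCI address."""
    if not HOST_WINDOW_BASE <= arm7_addr < ADDRESS_SPACE_END:
        raise ValueError("address is outside the 2 GB..4 GB host window")
    return arm7_addr - HOST_WINDOW_BASE


def host_pci_to_arm7(host_addr):
    """Inverse mapping: only host PCI addresses below 2 GB are reachable."""
    if not 0 <= host_addr < HOST_WINDOW_BASE:
        raise ValueError("host PCI addresses at or above 2 GB are not reachable")
    return host_addr + HOST_WINDOW_BASE
```

For example, ARM7 address 8000 1000 (hex) corresponds to host PCI address 0000 1000 (hex) under this mapping.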
Likewise, programs running on the host (Pentium) see all MSP I / O devices memory-mapped into a more limited physical address range, as shown in FIG.
As seen from the host (Pentium):
* MSP_BASE is the beginning of the MSP address map.
* MSP_BASE + 7DFFFFF is the end of the MSP address map.
* The MSP address map is defined only within a 128 MB range.
[0035]
MSP I / O device address map
[Table 10]

[0036]
1.6.9MSP host control register
MSP-1EX includes special registers used for initialization and interrupts by the host (Pentium processor).
[0037]
[Table 11]
MSP control register definition

[0038]
bit <0> PCI system reset. The host (PENTIUM) uses this bit to fully reset the entire MSP system hardware, including all MSP-related internal / external I / O devices. After a PCI system reset, the MSP runs the standard reset sequence, which includes all on-chip self-test diagnostics for ARM7, the vector processor and the I / O devices. This reset has the same effect as a hardware system reset.
bit <1> Restart of ARM7 & vector processor. The host (PENTIUM) uses this bit to restart ARM7 and the vector processor. This restart differs from a complete PCI system reset in that the MSP does not run the normal reset sequence and does not perform any on-chip self-test diagnostics. When this bit is set, ARM7 starts executing from address 0 and the vector processor enters idle mode. Internal and external I / O devices are not affected.
bit <2> MSP interrupt request from PENTIUM. The host (PENTIUM) uses this bit to interrupt the MSP directly; it is tied to one of the inputs of the internal 8259 programmable interrupt controller (PIC) used to interrupt ARM7. This bit is set by the host (PENTIUM) and cleared by ARM7.
bit <3> PCI host interrupt acknowledge. The host (PENTIUM) uses this bit to acknowledge a PCI host interrupt request generated by the MSP. This bit is set by the host (PENTIUM) and cleared by ARM7.
bit <31: 4> Reserved
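As a rough illustration of the bit layout just described, the following Python sketch names the four defined bits and decodes a register value. The constant and function names are hypothetical; only the bit positions and their meanings are taken from the description above.

```python
# Bit positions follow the description above; all names are illustrative only.
PCI_SYSTEM_RESET = 1 << 0
RESTART_ARM7_AND_VP = 1 << 1
MSP_INTERRUPT_REQUEST = 1 << 2
PCI_HOST_INTERRUPT_ACK = 1 << 3
RESERVED_MASK = 0xFFFF_FFF0  # bits <31:4>

FIELD_NAMES = {
    PCI_SYSTEM_RESET: "PCI system reset",
    RESTART_ARM7_AND_VP: "restart of ARM7 & vector processor",
    MSP_INTERRUPT_REQUEST: "MSP interrupt request",
    PCI_HOST_INTERRUPT_ACK: "PCI host interrupt acknowledge",
}


def decode_host_control(value):
    """Return the names of the bits set in a 32-bit host control register value."""
    if value & RESERVED_MASK:
        raise ValueError("reserved bits <31:4> must be zero")
    return [name for mask, name in FIELD_NAMES.items() if value & mask]
```

A host driver would write such a value over the PCI bus; how the register is actually reached depends on the address map in Table 10, which is not modeled here.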
[0039]
1.6.10MSP ARM7 control register
MSP-1EX has special registers that are used to interrupt the host by the ARM7 processor.
[0040]
MSP ARM7 control register definition

[0041]
bit <0> PCI host interrupt from MSP. The MSP uses this bit to interrupt the host by asserting the PCI INTA# pin on the PCI bus. This bit is set by ARM7 and cleared by the host (PENTIUM) through the PCI bus.
bit <31: 1> Reserved
1.6.11MSP internal μROM
The internal ROM consists of a total of 16 Kbytes and includes the following.
* μROM initialization software
* Self-test diagnostic software
* Various system management software
* Various library subroutines
* Cache for instruction and data constants
The address map is as shown in the following table.
[0042]
Internal μROM address map
[Table 12]

[0043]
[Table 13]

[0044]
1.6.12MSP internal SRAM
The internal SRAM performs a cache or local memory function depending on selections determined by the MSP Vector & Control & Status Register (VCSR).
In the local memory mode, the address space starts at location <MCP_BASE>: 040 0000 and is mapped to the internal SRAM part.
1.6.13Peripheral device inside MSP
The MSP also has a number of peripheral devices that reside on two internal buses: an Fbus operating at 64 bits and 80 MHz and an IObus operating at 32 bits and 40 MHz.
The device on Fbus includes:
* Memory controller for external synchronous DRAM
* Virtual frame buffer interface
* PCI bus controller for external PCI bus
* Customer ASIC interface
* 8-channel DMA controller
* Memory data mover (for data transmission between host memory and SDRAM)
* KS0122 CODEC serial line
* KS0119 CODEC serial line
* AD1843 CODEC serial line
On the other hand, the device on IObus includes the following.
* 8254-Compatible programmable interval timer
* 8259-Compatible programmable interrupt controller (8 levels)
* 16450-compatible UART serial line
* Bitstream processor for MPEG bitstream decoding & encoding
The register address map of such peripheral devices is as shown in the table.
[0045]
[Table 14]

[Table 15]

[Table 16]

[Table 17]

[0046]
Internal peripheral device register address map
1.6.14IOBUS peripheral device
1.6.14.1 8254-compatible programmable interval timer
The MSP includes a standard 8254-compatible programmable interval timer for use by software. It has the following features.
* Has 3 independent 16-bit counters.
* Supports 6 programmable counter modes.
All counters are programmed with an entry in the control word register and an initial count.
* Control word register
This register has various control information for the timer. The bit definition of this register is as shown in the table.
[0047]
[Table 18]
Control word register

[0048]
* Status register
This register has status information for the timer.
* Counters 0, 1, 2
These three registers are the main counting elements of the timer. Each counter is 16 bits wide, can be preset, and counts down in either binary or BCD mode. The input, gate and output of each counter are configured by the mode selection stored in the control word register. The three counters are completely independent.
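Because Table 18 is not reproduced here, the following Python sketch assumes the control word layout of the standard Intel 8254 (counter select in bits <7:6>, access mode in bits <5:4>, counter mode in bits <3:1>, BCD flag in bit <0>). Whether the MSP timer matches this layout bit for bit is an assumption, and the function names are hypothetical.

```python
def control_word(counter, mode, bcd=False, access=0b11):
    """Build a control word using the standard Intel 8254 bit layout (assumed)."""
    if counter not in (0, 1, 2):
        raise ValueError("the timer has counters 0, 1 and 2")
    if not 0 <= mode <= 5:
        raise ValueError("the timer supports six modes, 0 to 5")
    if access not in (0b01, 0b10, 0b11):
        raise ValueError("access must be LSB only, MSB only, or LSB then MSB")
    return (counter << 6) | (access << 4) | (mode << 1) | int(bcd)


def encode_count(value, bcd=False):
    """Encode an initial count as 16-bit binary or four-digit packed BCD."""
    if bcd:
        if not 0 <= value <= 9999:
            raise ValueError("a BCD count holds four decimal digits")
        return int(str(value), 16)  # decimal digits become hex nibbles
    if not 0 <= value <= 0xFFFF:
        raise ValueError("a binary count must fit in 16 bits")
    return value
```

Programming a counter then amounts to writing the control word followed by the initial count, as the text states.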
[0049]
1.6.14.2 8259-compatible programmable interrupt controller (PIC)
The MSP programmable interrupt controller is a standard 8259, as commonly found in x86-based personal computers. Its features include:
* Support 8 priority levels.
* Programmable interrupt mode
* Individual request mask ability
In MSP-1EX, eight levels of interrupt inputs are assigned to various I / O devices as follows.
* Level 0 (highest) is assigned to the 8254 timer.
* Level 1 is assigned to the virtual frame buffer (VFB).
* Level 2 is assigned to the customer ASIC logic block containing the DMA controller.
* Level 3 is assigned to the bitstream processor.
* Level 4 is assigned to the PCI bus interface.
* Level 5 is assigned to <tbd>.
* Level 6 is assigned to <tbd>.
* Level 7 is assigned to 16550 UART.
The output of the interrupt controller is coupled to the interrupt request line (nFIQ) of the ARM7 RISC CPU.
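The level assignment above can be sketched as a small priority resolver. This Python model assumes fixed priority with level 0 highest, as stated in the list, and uses hypothetical names; it does not model the 8259 command words or its other priority modes.

```python
# Level assignment copied from the list above; "<tbd>" entries are left as-is.
LEVELS = {
    0: "8254 timer",
    1: "virtual frame buffer (VFB)",
    2: "customer ASIC logic block (including the DMA controller)",
    3: "bitstream processor",
    4: "PCI bus interface",
    5: "<tbd>",
    6: "<tbd>",
    7: "UART",
}


def highest_priority_pending(requests, mask=0):
    """Return the lowest-numbered (highest-priority) unmasked request level.

    `requests` and `mask` are 8-bit values with one bit per level.
    Returns None when no unmasked request is pending.
    """
    pending = requests & ~mask & 0xFF
    for level in range(8):
        if pending & (1 << level):
            return level
    return None
```

With requests pending on levels 3 and 7, the sketch selects level 3 (the bitstream processor); masking level 3 lets level 7 through.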
* Register description
The following 8-bit registers are used to initialize the operation of the PIC.
* Initialization command word 1 (ICW1)
* Initialization command word 2 (ICW2): Not used for MSP-1EX.
* Initialization command word 3 (ICW3): Not used for MSP-1EX.
* Initialization command word 4 (ICW4)
There are also three 8-bit registers used to control the PIC operation as follows.
* Operation control word 1 (OCW1)
* Operation control word 2 (OCW2)
* Operation control word 3 (OCW3)
All these registers are specially encoded in both the address part (bit <0>) and the data part. For more details, refer to the standard 8259 specification.
[0050]
8259 Register Description
[Table 19]

[0051]
1.6.14.3 16450-compatible UART serial line
The MSP includes a 16450-compatible UART serial line that is used as an interface to an external serial I / O device. For more details, refer to the standard 16450 specification.
1.6.14.4Bitstream processor
A bitstream processor is a specialized logic block that processes video bitstream data, and this functionality includes:
* Variable length Huffman decoding and encoding
* Unpacking and packing of video data in zigzag storage format
* Various bit-level processing
The bitstream processor operates concurrently as an independent processing unit and is controlled by software running on the vector processor or ARM7. For more details, refer to the bitstream processor section.
1.6.15FBUS peripheral device
The FBUS peripheral devices are as follows.
* Customer ASIC logic interface
* 8-channel DMA controller
* Video encoder serial line interface to Samsung's KS0119
* Audio & Telecom Serial Line Interface to Analog Devices AD1843
1.6.15.1Customer ASIC logic interface
This section contains the interface logic for all external CODECs and customer-defined ASIC logic blocks. All of these blocks are implemented in hardware and do not have a program-visible register. Refer to the ASIC interface part for more details.
1.6.15.2DMA controller
MSP-1EX includes a DMA controller on a chip having the following functions.
* 8 independent DMA channels
* Enable / disable control for individual DMA channels
* Transfers from I / O device to memory and from memory to I / O device
* Address increase and decrease
For more details, refer to the ASIC interface part.
[0052]
1.6.15.3Memory data mover
MSP-1EX also includes a special memory data mover. This memory data mover is used to move data between host (PENTIUM) memory and MSP local SDRAM memory. The memory data mover is basically a special DMA controller including the following registers.
* MSP current address register: This 32-bit register defines the SDRAM memory address at the start of the memory data transfer. This register can be written or read by ARM7, and the initial value must be loaded by ARM7. The address is incremented by the data transfer size.
* Host current address register: This 32-bit register defines the host memory address at the start of the memory data transfer. This register can be written or read by ARM7, and the initial value must be loaded by ARM7. The address is incremented by the data transfer size.
* MSP stop address register: This 32-bit register defines the SDRAM memory address at the end of the memory data transfer. This register can be written or read by ARM7 and is compared with the MSP current address register. If they match, the memory data mover generates an MSP End-Of-Process signal.
* Host stop address register: This 32-bit register defines the host memory address at the end of the memory data transfer. This register can be written or read by ARM7 and is compared with the host current address register. If they match, the memory data mover generates a host End-Of-Process signal.
* Status register: This register contains status information associated with the memory data mover. The bit encoding is as follows.
<0>: MSP EOP. This bit indicates whether the memory data mover has reached the MSP stop address. If ARM7 initializes the source current address register, ARM7 is reset to 0080 0000 (hex). This bit is read-only for ARM7 and must not be written.
<1>: HOST EOP. This bit indicates whether the memory data mover has reached the stop address of the host. If ARM7 initializes the host current address register, ARM7 is reset to 8000 000 (hex). This bit is read-only for ARM7 and must not be written.
* Control register: This register contains control information associated with the memory data mover. The bit encoding is as follows.
<0>: Direction. This bit determines the direction of data transfer. When this bit is "0" (default), data is transferred from the host (PENTIUM) memory to the MSP SDRAM memory; when this bit is "1", data is transferred from the SDRAM to the host memory. This bit must be written by ARM7.
<1>: Interrupt enable. This bit determines whether the memory data mover interrupts ARM7 at the end of the data transfer. This bit must be written by ARM7.
<2>: DMA enable. This bit enables the memory data mover to operate. This bit must be written by ARM7.
<3>: Data transfer size. When this bit is "0" (default), the size of each memory data transfer is 32 bytes; when it is "1", it is 64 bytes. This bit must be written by ARM7.
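The register behavior described above can be sketched as a small software model. This is a hedged illustration, not the hardware design: the class and constant names are hypothetical, no data is actually moved, and the model assumes that the current addresses are incremented first and then compared with the stop addresses.

```python
from dataclasses import dataclass

# Status register bits (positions from the description above)
MSP_EOP = 1 << 0
HOST_EOP = 1 << 1
# Control register bits
DIRECTION_TO_HOST = 1 << 0
INTERRUPT_ENABLE = 1 << 1
DMA_ENABLE = 1 << 2
TRANSFER_64_BYTES = 1 << 3


@dataclass
class MemoryDataMover:
    msp_current: int
    host_current: int
    msp_stop: int
    host_stop: int
    control: int = 0
    status: int = 0

    def transfer_size(self):
        """32 bytes by default, 64 bytes when control bit <3> is set."""
        return 64 if self.control & TRANSFER_64_BYTES else 32

    def step(self):
        """Model one transfer; returns False when the mover is disabled."""
        if not self.control & DMA_ENABLE:
            return False
        size = self.transfer_size()
        self.msp_current += size
        self.host_current += size
        # Assumed ordering: compare after incrementing.
        if self.msp_current == self.msp_stop:
            self.status |= MSP_EOP
        if self.host_current == self.host_stop:
            self.status |= HOST_EOP
        return True

    def interrupt_pending(self):
        """True when a transfer has ended and interrupts are enabled."""
        done = self.status & (MSP_EOP | HOST_EOP)
        return bool(done and self.control & INTERRUPT_ENABLE)
```

Moving 128 bytes with the default 32-byte transfer size takes four steps in this model, after which both End-Of-Process bits are set.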
[0053]
1.6.15.4KS0119 video encoder serial line interface
The KS0119 video encoder serial line interface includes:
* Double-buffered receive data buffer register containing data read from the codec
* Double-buffered transmit data buffer register containing data to be written to the codec
* Control & status register including various control & status information for serial line
[0054]
[Table 20]
KS0119 Video Encoder Serial Line Interface Register

[0055]
The bit encoding of the control & status register is as follows.
bit <0>: Receive data buffer full. This bit is set when the serial line receives 8-bit data from the KS0119 codec. If interrupt enable (bit <7>) is set, an interrupt request to ARM7 is also generated.
bit <1>: Transmit data buffer empty. This bit is set when the serial line is ready to send data to the KS0119. If interrupt enable (bit <7>) is set, an interrupt request to ARM7 is also generated.
bit <7>: Interrupt enable. This bit is used to enable interrupt requests to ARM7.
1.6.15.5AD1843 audio & telecom serial line interface
The AD1843 serial line interface includes:
* A set of double-buffered registers containing data read from the codec
* A set of double-buffered registers containing the data to be entered into the codec
* Control & status register including various control & status information for serial line
For more details, refer to the AD1843 codec interface portion.
1.6.16Instruction performance
Table 21 shows instruction performance in vector processor cycles, where each cycle is 12.5 ns. The external memory bus is assumed to be 64 bits wide with a 40 MHz page-mode clock. All instruction performance figures are given for 32-byte vector mode. The notation is as follows:
* Ras: The number of cycles required for the external memory to make the first access. Generally 75 ns, or 6 cycles, is required.
* Latency: the number of cycles needed to execute the first instruction.
* Rate: the number of cycles that exist between similar consecutive instruction executions.
If the latency is the same as the rate, only one number is used.
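The notation above lends itself to a short worked calculation. In the Python sketch below, the 12.5 ns cycle time comes from the text, while the function names and the example latency and rate values are hypothetical, since Tables 21 to 23 are not reproduced here.

```python
import math

CYCLE_NS = 12.5  # one vector processor cycle at 80 MHz


def ns_to_cycles(nanoseconds):
    """Round a duration up to whole vector processor cycles."""
    return math.ceil(nanoseconds / CYCLE_NS)


def sequence_cycles(count, latency, rate=None):
    """Cycles for `count` similar consecutive instructions.

    The first instruction takes `latency` cycles and each following one adds
    `rate` cycles. When only one number is given, rate equals latency.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if rate is None:
        rate = latency
    return latency + (count - 1) * rate


def cycles_to_ns(cycles):
    """Convert a cycle count back to nanoseconds."""
    return cycles * CYCLE_NS
```

The 75 ns Ras figure converts to 6 cycles, matching the text. With an assumed latency of 6 and rate of 2, four consecutive instructions would take 12 cycles, or 150 ns.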
[0056]
Instruction execution performance
[Table 21]

[Table 22]

[Table 23]

[0057]
Chapter 2 DSP Core
This chapter describes the specifications of the DSP core as seen by hardware and software designers.
2.1Overview
The DSP core is the fundamental element of MSP and is responsible for all processing. The DSP core is configured as follows.
* A 32-bit ARM7 RISC CPU that operates at 40 MHz and is used for general-purpose data processing, such as real-time OS, interrupt and exception handling, and input / output device management.
* Vector processor that operates at 80 MHz and is used for digital signal processing such as discrete cosine transform, FIR filtering, renormalization, video motion estimation, etc. This vector processor is initialized by ARM7, can operate simultaneously with ARM7, and is synchronized with ARM7 by a special control command.
* A cache subsystem that operates at 80 MHz and consists of a 1 KB instruction cache and a 1 KB data cache for ARM7, a 1 KB instruction cache and a 4 KB data cache for the vector processor, and a 16 KB unified instruction & data cache ROM shared by ARM7 and the vector processor. The data cache for the vector processor can be controlled by hardware or software. The cache subsystem interfaces with ARM7 through a 32-bit data bus and with the vector processor through a 128-bit data bus.
* A 32-bit, 40 MHz input and output bus (IOBUS) that interfaces with various internal peripherals such as bitstream processors, interrupt controllers, timers and UARTs.
* 64-bit, 80 MHz high-speed input / output bus (FBUS) that interfaces with PCI bus controller, memory controller, DMA controller and customer ASIC logic block.
A block diagram of the DSP core is as shown in FIG.
[0058]
2.2ARM7 RISC CPU
2.2.1Overview
The ARM7 RISC CPU is a general purpose 32-bit RISC processor core. It interfaces with the vector processor through a standard coprocessor interface and is used for most functions that are not compute-intensive, such as the real-time OS, I / O device interrupt processing and communication with the host CPU.
The ARM7 CPU has the following characteristics.
* Fully static operation, ideal for power-sensitive applications.
* Low power consumption: 0.6 mA / MHz @ 3 V.
* High performance: 25 MIPS @ 40 MHz (40 MIPS peak) @ 3 V.
* Big-endian and little-endian operating modes
* High-speed interrupt response for real-time applications (22 clock cycles at 40 MHz)
* Simple and powerful instruction set.
* Very compact layout: approximately 6 mm².
2.2.2Registers
ARM7 has 31 general purpose registers and 6 status registers, for a total of 37 registers. At any time, 16 general purpose registers and one or two status registers are visible to the programmer. In all processor modes (user, supervisor, IRQ, FIQ, Abort and Undefined), R0 through R15 are directly accessible.
All registers except R15 are general purpose and can hold data or address values. R15 holds the program counter (PC). The status register CPSR (current program status register) holds the ALU flags and the current mode bits.
R14 is used as the subroutine link register and receives a copy of R15 when a branch-and-link instruction is executed. Otherwise, R14 can be used as a general purpose register.
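The register totals quoted above can be checked with a short calculation. The banking scheme in this Python sketch follows the standard ARM programmer's model for the six modes named in the text (FIQ banks R8 to R14; supervisor, IRQ, Abort and Undefined each bank R13 and R14). It is stated here as background knowledge, since Tables 24 and 25 are not reproduced, and the names are illustrative.

```python
# Registers that each mode replaces with its own private (banked) copies.
BANKED_REGISTERS = {
    "user": [],
    "fiq": list(range(8, 15)),  # R8..R14
    "irq": [13, 14],
    "supervisor": [13, 14],
    "abort": [13, 14],
    "undefined": [13, 14],
}

SHARED_GENERAL = 16  # R0..R15 as seen in user mode

# General purpose: the 16 user-mode registers plus every banked copy.
general_purpose = SHARED_GENERAL + sum(len(r) for r in BANKED_REGISTERS.values())

# Status: one CPSR plus one SPSR for each mode other than user.
status = 1 + sum(1 for mode in BANKED_REGISTERS if mode != "user")


def visible_status_registers(mode):
    """User mode sees only the CPSR; the other modes also see their SPSR."""
    return 1 if mode == "user" else 2
```

Under these assumptions the sketch gives 31 general purpose registers and 6 status registers, 37 in total, consistent with the figures in the text.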
[0059]
General-purpose registers and program counters
[Table 24]

[0060]
[Table 25]
Program status register

[0061]
2.2.3Exceptions
An exception is an abnormal condition that arises during instruction processing and causes a change in control flow. The seven types of ARM7 exception are listed below from highest to lowest priority.
* Reset (highest priority)
* Abort (data)
* FIQ
* IRQ
* Abort (prefetch)
* Undefined instruction trap, software interrupt (lowest priority)
[0062]
[Table 26]
Exception vector table

[0063]
2.2.4Instruction set
All ARM7 instructions are conditionally executed, which means that an ARM7 instruction is either executed or skipped depending on the values of the N, Z, C and V flags in the CPSR register.
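Conditional execution can be illustrated with a small flag evaluator. The condition mnemonics and their flag tests in this Python sketch follow the standard ARM instruction set; they are not listed in this document, so treat the table as background, and the function name as hypothetical.

```python
# Each condition maps to a test on the N, Z, C, V flags (standard ARM definitions).
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,
    "NE": lambda n, z, c, v: not z,
    "CS": lambda n, z, c, v: c,
    "CC": lambda n, z, c, v: not c,
    "MI": lambda n, z, c, v: n,
    "PL": lambda n, z, c, v: not n,
    "VS": lambda n, z, c, v: v,
    "VC": lambda n, z, c, v: not v,
    "HI": lambda n, z, c, v: c and not z,
    "LS": lambda n, z, c, v: (not c) or z,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: (not z) and n == v,
    "LE": lambda n, z, c, v: z or n != v,
    "AL": lambda n, z, c, v: True,
}


def condition_passes(cond, n, z, c, v):
    """Return True if an instruction with condition `cond` would execute."""
    return bool(CONDITIONS[cond](n, z, c, v))
```

For instance, an instruction with the EQ condition executes only when the Z flag is set, while one with AL always executes.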
ARM7 instructions are divided into various categories as follows.
* Branches and linked branches (B, BL)
* Data processing (AND, EOR, SUB, RSB, ADD, ADC, SBC, RSC, TST, TEQ, CMP, CMN, ORR, MOV, BIC, MVN)
* PSR transfer (MRS, MSR)
* Multiplication and multiplication-accumulation (MUL, MLA)
* Single data transfer (LDR, STR)
* Block data transfer (LDM, STM)
* Single data swap (SWP)
* Software interrupt (SWI)
* Coprocessor data operation (CDP) (this is a group of instructions)
* Coprocessor data transfer (LDC, STC)
* Coprocessor register transfer (MRC, MCR)
2.3 Vector processor
2.3.1 Overview
The vector processor is a powerful digital signal processor that uses a single instruction multiple data (SIMD) architecture for maximum performance. It consists of a pipelined RISC engine that operates on multiple data elements in parallel. The data elements are packed into a 576-bit vector, which can be processed at the following rates:
* 32 8-bit or 9-bit fixed-point arithmetic operations per 12.5 ns cycle, or
* 16 16-bit fixed-point arithmetic operations per 12.5 ns cycle, or
* 8 32-bit fixed-point or floating-point arithmetic operations per 12.5 ns cycle
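The per-cycle figures above can be converted into peak operations per second. The short calculation below is only arithmetic on the numbers given in the text (a 12.5 ns cycle and 32, 16, or 8 operations per cycle); the function name is hypothetical, and sustained rates will be lower than these peaks.

```python
# Peak-rate arithmetic for the vector processor, using integer picoseconds
# to avoid floating-point rounding. Names are illustrative.

CYCLE_PS = 12_500  # 12.5 ns cycle time, as stated in the text

def peak_ops_per_second(ops_per_cycle: int) -> int:
    """Peak operations per second for a given number of operations per cycle."""
    return ops_per_cycle * 10**12 // CYCLE_PS

for width, ops in (("8/9-bit", 32), ("16-bit", 16), ("32-bit", 8)):
    print(f"{width}: {peak_ops_per_second(ops) / 1e9:.2f} G operations/s")
```

A 12.5 ns cycle corresponds to an 80 MHz clock, so the 8/9-bit case peaks at 32 x 80 million operations per second.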
2.3.2 Execution pipeline
The vector processor uses a six-stage pipeline, as shown in FIG. 11, to execute instructions. Most 32-bit scalar operations are pipelined at a rate of one instruction per cycle, while most 576-bit vector operations are pipelined at a rate of one instruction every two cycles. All loads and stores overlap with arithmetic operations and are performed independently by separate load and store hardware.
To balance design complexity against performance, the vector processor uses hardware interlocks to check resource and data dependencies and can issue and execute instructions out of order. This feature particularly improves performance while loads and stores miss in the data cache.
[0064]
2.3.3 Hardware microarchitecture
The vector processor is composed of four main functional blocks, as shown in FIG.
* Instruction word fetch unit (IFU)
* Instruction word decoder and issuer
* Instruction word execution data path
* Load and store unit (LSU)
The instruction word fetch unit is responsible for fetching instruction words and for handling control-flow instructions such as branches and jumps to subroutines. The IFU has a 16-entry queue of instruction words prefetched for the current execution stream and an 8-entry queue of instruction words prefetched for the branch target stream. The IFU can receive 8 instruction words from the instruction word cache every cycle.
The instruction word decoder and issuer is responsible for decoding and scheduling all instruction words. The decoder processes one instruction word per cycle, always in the order in which instruction words arrive from the IFU, whereas the issuer can schedule most instruction words out of order, depending on the availability of execution resources and operand data.
The vector processor achieves most of its performance through multiple 288-bit data paths (see FIG. 13) operating at 12.5 ns per cycle, which include:
* A four-port register file that supports 2 reads and 2 writes per cycle
* Eight parallel 32 x 32-bit multipliers that produce, every 12.5 ns, one of the following: 8 32-bit multiplications (integer or floating-point format), 16 16-bit multiplications, or 32 8-bit multiplications
* Eight 36-bit ALUs that produce, every 12.5 ns, one of the following: 8 36-bit ALU operations (integer or floating-point format), 16 16-bit ALU operations, or 32 8-bit ALU operations
The load unit and the store unit each interface with the data cache through a separate 288-bit-wide data bus (one for reads and one for writes), as shown in FIG.
[0065]
2.3.4 Interrupts and exceptions
The vector processor recognizes only the following two special conditions.
* A CPINT (coprocessor interrupt) instruction word executed by the ARM7 program
* A hardware stack overflow resulting from nested jumps & multiplications executed by the vector processor program
For details of how the vector processor handles these two special conditions, refer to the vector processor architecture document.
Other interrupts and exception conditions generated from the MCP are handled only by ARM7.
[0066]
2.4 Cache subsystem
2.4.1 Overview
The cache control unit (CCU) interfaces with the ARM7 core, the vector execution units (LSU, IFU), memory (MCU, PCI, DMA, CODEC), and IO devices (BP, UART, timer, interrupt controller). The CCU interfaces with a high-speed (80 MHz) FBUS and a low-speed (20 MHz) IOBUS. The CCU is effectively the central data transfer unit between all internal CPU core units and the peripheral IO devices. For a detailed description of the CCU within the MSP chip, refer to the block diagram (pp. 1-10) of the MSP-1E system spec.
To support a very high performance cache subsystem, the CCU uses a transaction-based protocol for all read and write operations. Any unit that needs to access memory sends a request to the CCU control unit. An arbiter in the control unit grants the request based on a fixed priority and returns a 'transaction_id' to the requester. The requester stores this 'transaction_id' so that it can recognize the returned data when the data actually arrives. While the CCU control unit is processing a request from one unit (which may take many cycles if a cache miss occurs), a new request from another unit, with a different 'transaction_id', may be granted in the next cycle. Because pending requests are tracked in this way, one request does not block subsequent requests from other units, which enables high performance. Currently, the CCU can accept and grant one read request and one write request simultaneously in one cycle.
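The transaction-based protocol described above can be sketched as a fixed-priority arbiter that hands out transaction IDs. This is a minimal illustrative model, not the CCU logic itself: the class and method names are hypothetical, IDs are assumed to be issued sequentially, and the 6-bit ID width is taken from the ccu_grant_id[5:0] field described later in this chapter.

```python
from itertools import count

class FixedPriorityArbiter:
    """Grants one pending request per cycle and tags it with a transaction id."""

    def __init__(self, priority):
        self.priority = list(priority)   # highest priority first (assumed order)
        self._next_id = count()          # assumption: ids are issued in sequence
        self.outstanding = {}            # transaction_id -> requesting unit

    def grant(self, requesting_units):
        """Return (unit, transaction_id) for the winner, or None if no requests."""
        for unit in self.priority:
            if unit in requesting_units:
                tid = next(self._next_id) % 64   # 6-bit transaction id
                self.outstanding[tid] = unit
                return unit, tid
        return None

    def data_returned(self, tid):
        """Data tagged with `tid` has arrived; report which unit it belongs to."""
        return self.outstanding.pop(tid)
```

Because each requester keeps its transaction ID, data may come back in a different order from the order in which the requests were granted, and a long cache miss for one unit does not stall grants to the others.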
[0067]
The memory interface unit (FBUS) consists of a four-entry address queue and a one-entry write-back latch. At most, the FBUS interface can support one pending refill (read) request from the ARM instruction word cache, one pending refill (read) request from the VEC instruction word cache, one write request from the VEC data cache, and one write-back request from the VEC data cache caused by a dirty cache line.
In addition, the cache memory itself is optimized for high performance. The MSP cache system has an on-chip cache SRAM and a cache ROM. The cache SRAM is divided into four separate banks to prevent thrashing between the ARM CPU and the vector core, and between instruction words and data. The cache ROM provides a high-speed, high-density storage area for ARM7 and the vector core. The tag of the cache ROM is fixed, but its valid bit can be disabled, in which case data is returned from external memory. In short, the on-chip cache memory includes the following blocks.
* 1 KB direct-mapped instruction word cache and 1 KB direct-mapped write-back data cache with a 32-bit data bus interface to ARM7
* 1 KB direct-mapped instruction word cache with a 256-bit bus interface to the vector instruction word fetch unit
* 4 KB direct-mapped write-back data cache with a 256-bit bus interface to the vector execution units. The data cache is dual-ported and, in every 80 MHz cycle, can supply 256 bits of read data and accept 256 bits of write data.
* The 4 KB VEC data cache can be configured to operate as a scratch pad under software control.
* Instruction word and data ROM cache, shared by ARM7 and the vector processor. The interface to ARM7 is through the same 32-bit bus as its instruction word cache, and the interface to the vector processor is through the same 256-bit bus as its instruction word cache.
* 5 ports:
-Read / fill port for ARM7
-Read port for instruction fetch unit of vector processor
-Read / fill port for vector processor load / store unit
-Read / write port for IOBUS of vector processor
-Read / fill port for FBUS
* 32 x 256-bit SRAM (~1 KB) for the ARM7 CPU instruction cache
* 32 x 256-bit SRAM (~1 KB) for the ARM7 CPU data cache
* 128 x 256-bit SRAM (~4 KB) for the vector processor data cache
* 32 x 256-bit SRAM (~1 KB) for the vector processor instruction word cache
* 512 x 256-bit SRAM (~16 KB) for data & instruction word cache
The vector data cache is managed under either hardware control or software control.
[0068]
2.4.2 Cache subsystem structure
FIG. 15 is a block diagram of the MSP cache system, which includes the following blocks: IDC (instruction and data cache), cache ROM, CCU_DATA_DP, CCU_ADR_DP, CCU_CTL, and CCU_SM. Details of each sub-block are described below.
2.4.2.2 IDC
The instruction word and data cache (IDC; see FIG. 16) is an on-chip SRAM that provides instruction word and data cache access. It is organized as a single array of four banks: ARM_IC (1 KB), ARM_DC (1 KB), VEC_IC (1 KB), and VEC_DC (4 KB). In any cycle, the cache accepts one read request and one write request. The tag RAM has two read ports. The read port address and the write port address are each compared with the stored cache tag to determine a hit or a miss. The data RAM has only one read port, which is accessed with the read port address. The tag RAM and the data RAM are written using different sets of write addresses. Therefore, four sets of cache bank selection signals and three sets of line indexes are required to access the cache array.
The IDC has the following characteristics.
* Direct mapped, with a write-back policy.
* The cache line size is 64B, but the data width is 32B, which corresponds to the vector data width of the MSP chip.
* Each line has two valid bits, one for the high vector and one for the low vector. Each data cache line also has two dirty bits, one for each vector.
* The tag size for ARM_IC, ARM_DC, and VEC_IC is 22 bits (address bits 10 to 31), and the tag size for VEC_DC is 20 bits (address bits 12 to 31).
* The line index for ARM_IC, ARM_DC, and VEC_IC is 5 bits (address bits 5 to 9), and the line index for VEC_DC is 7 bits (address bits 5 to 11).
* VEC_DC (4 KB) can be reconfigured as a scratch pad under software control.
* The V_CLEAR signal globally resets all cache line valid bits at once. Later, V_CLEAR may reset individual banks selectively.
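The tag and line index bit ranges listed above can be illustrated with a small address-splitting helper. This is a sketch under stated assumptions: the function name is hypothetical, and treating address bits 0 to 4 as the byte offset within a 32B vector is an inference from the 32B data width rather than something the text states.

```python
def split_address(addr: int, bank: str):
    """Split a 32-bit address into (tag, line_index, byte_offset) for an IDC bank.

    bank is one of "ARM_IC", "ARM_DC", "VEC_IC" (5 index bits) or "VEC_DC" (7).
    """
    index_bits = 7 if bank == "VEC_DC" else 5        # bits 5..11 or bits 5..9
    byte_offset = addr & 0x1F                        # bits 0..4 (assumed offset)
    line_index = (addr >> 5) & ((1 << index_bits) - 1)
    tag_bits = 27 - index_bits                       # 22 or 20 tag bits
    tag = (addr >> (5 + index_bits)) & ((1 << tag_bits) - 1)
    return tag, line_index, byte_offset
```

For instance, address 0x12345678 in VEC_DC splits into tag 0x12345, line index 51, and byte offset 24.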
[0069]
2.4.2.3 Data path pipeline
See FIG.
2.4.2.4 Address path pipeline
The data path for the address processing pipeline is as shown in FIG.
CCU ADDRESS DP
2.4.3 Interface
2.4.3.1 Data types
The CCU processes different data types from the multiple requesting units described in Table 15.
[0070]
CCU operation when processing different data types
[Table 27]

[0071]
2.4.3.2 ARM interface
The ARM7 CPU core operates at half the MSP chip frequency (40 MHz), while the CCU operates at the MSP chip frequency (80 MHz). Synchronization between the two clocks is an important design consideration. In general, the clock generator unit switches MCLK on the rising edge of CLK1. In addition, the global reset signal connected to ARM7 is de-asserted when both CLK1 and MCLK are low. In this way, the two units are properly synchronized. ARM7 has only one input bus (ARM_DATA<31:0>) for both instruction words and data, whereas the MSP chip has a dedicated instruction word cache (ARM_IC, 1 KB) and data cache (ARM_DC, 1 KB). The CCU distinguishes between the two types of requests using ARM_NOPC.
To further improve performance, the CCU adds a micro instruction word cache (UI_CACHE, 32B) and a micro data cache (UD_CACHE, 32B) between the main cache and the ARM7 core. Each microcache holds 8 words of contiguous code or data. Each microcache has its own tag (27 bits), tag comparator, and valid bit. All valid bits are cleared during system reset.
The ARM7 microcaches act as prefetch buffers rather than as true caches. During an ARM7 read, the address (ARM_A<31:0>) is always compared with the tag. On a hit, the instruction word or data is returned on ARM_DATA<31:0>. On a miss, the microcache sends a request to the CCU along with the address, data type, and other control information. The CCU's arbiter logic arbitrates among the read requests from all units. Currently, ARM7 has the highest priority among all blocks when grants are issued, because ARM7 rarely makes a request unless an ARM7 microcache misses. However, the CCU can insert an internal hold cycle to handle multi-cycle requests or an address-queue-full condition. During this period, no external requests are granted.
A write from ARM7 always invalidates UD_CACHE if the address hits UD_TAG. No attempt has been made to design UD_CACHE as a write-through or write-back cache. Invalidating UD_CACHE on a write hit keeps the data in ARM_DC and UD_CACHE consistent.
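The microcache behavior described above (one 32B line, a 27-bit tag, a valid bit, and invalidation on a write hit) can be modeled in a few lines. This is only a behavioral sketch: the class and method names are hypothetical, and refilling all 8 words at once is an assumption.

```python
class MicroCache:
    """One-line, 8-word buffer with a 27-bit tag and a valid bit."""

    def __init__(self):
        self.valid = False              # cleared at system reset
        self.tag = 0
        self.words = [0] * 8

    @staticmethod
    def _tag(addr):
        return (addr >> 5) & 0x7FFFFFF  # address bits 5..31 (27 bits)

    def read(self, addr):
        """Return the addressed word on a hit, or None on a miss."""
        if self.valid and self.tag == self._tag(addr):
            return self.words[(addr >> 2) & 0x7]
        return None

    def refill(self, addr, words):
        """Load 8 contiguous words returned by the main cache (assumed)."""
        self.tag, self.words, self.valid = self._tag(addr), list(words), True

    def write(self, addr):
        """An ARM7 write that hits the tag invalidates the line."""
        if self.valid and self.tag == self._tag(addr):
            self.valid = False
```

Invalidating on a write hit means the next read of that line misses and is served from ARM_DC, which always holds the up-to-date data.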
The CCU controls arm_nwait while sending a read or write request to ARM_IC or ARM_DC. In general, the CCU does not hold arm_nwait during a write. Once the write request is granted and ccu_write_hold2 is not asserted, ARM7 simply drives the data on ARM_DATA<31:0> in the next cycle. The CCU has an internal write buffer to store the data, so ARM7 can continue executing instruction words. On a read, however, the CCU always holds arm_nwait for one cycle, even if the data is in the main cache. If the read request misses the main cache, arm_nwait is held for more cycles until the data is returned from external main memory. The ARM_CCU interface state machine illustrated in FIG. 19 describes the conditions under which the CCU controls arm_nwait.
[0072]
In FIG. 19:
START: The initial state of the state machine, entered when there is no request, when read data has been returned, or when a write request is issued without a hold.
HOLD: The CCU has granted an ARM7 read or write request but withdraws the grant with a hold signal.
TAG: The CCU checks the tag against the read address.
MISS: The read address misses, and the CCU sends a refill request to external DRAM.
DATA: Read data is returned, and the CCU sends the returned data to the micro data cache.
2.4.3.3 FBUS interface
The CCU_FBUS interface state machine (F_SM) is as shown in FIG. 20. In FIG. 20:
IDLE: Idle state
REQ: Sends a read or write request to the FBUS arbiter.
GRT1: The granted size is larger than 8B.
GRT2: The granted size is larger than 16B.
GRT3: The granted size is larger than 24B.
GRT4: Drives data for the last cycle.
The data reception state machine (D_SM) is as shown in FIG. In that figure:
IDLE: Idle state
ONE: The first 8B of data is received from Fdata<63:0>.
TWO: The second 8B of data is received from Fdata<63:0>.
THREE: The third 8B of data is received from Fdata<63:0>.
FOUR: The fourth 8B of data is received from Fdata<63:0>.
REFILL: Refills the IDC before returning data to the requester.
RDY: Ready to return the data to the requester.
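The D_SM states suggest that a 32B vector is assembled from four 8B beats on Fdata<63:0>. The sketch below illustrates that idea only; the strictly linear state order and the placement of the first beat in the least significant bits are assumptions, since the state diagram itself is not reproduced here, and all names are hypothetical.

```python
STATES = ["IDLE", "ONE", "TWO", "THREE", "FOUR", "REFILL", "RDY"]

def receive_vector(beats):
    """Assemble four 64-bit beats into one 256-bit value.

    Returns (value, visited_states). Beat order and the linear state
    sequence are assumptions made for illustration.
    """
    if len(beats) != 4:
        raise ValueError("expected exactly four 8-byte beats")
    visited = ["IDLE"]
    value = 0
    for i, beat in enumerate(beats):
        value |= (beat & 0xFFFF_FFFF_FFFF_FFFF) << (64 * i)
        visited.append(STATES[i + 1])   # ONE, TWO, THREE, FOUR
    visited += ["REFILL", "RDY"]        # refill the IDC, then return the data
    return value, visited
```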
[0073]
2.4.4 Read and write operations
The read and write state machine is as shown in FIG.
2.4.4.1 Read operation
In the MSP, the IDC (instruction and data cache) operates as a three-stage pipeline: request cycle, tag cycle, and data cycle. On cache hits, the IDC can return an instruction word or data every cycle.
The cache control unit (CCU) arbitrates among ARM7, the vector processor unit, FBUS, and IOBUS for access to the cache SRAM. The CCU monitors bus requests from these four masters and grants the bus to the winner with a specific ID number. The CCU also generates the cache address bus and the read / write control signals used to access the cache and compare tags.
On a cache hit, the bus master that won the arbitration can access the cache for a read or write operation. On a cache miss, the CCU issues a request to main memory and, instead of waiting for the missing data to be returned, goes on to serve other bus masters. The bus master that missed must therefore keep its ID number. Later, when the requested data is in the cache, the CCU sends a GRANT signal with the same ID number to the bus master that missed. That bus master then accepts or ignores the data.
When a cache miss occurs, a line fetch is performed to bring the data in from main memory. The line size is defined as 64 bytes, so the CCU performs 8 consecutive memory accesses (64 bits each) to supply the data from main memory to the cache.
* Request cycle:
The CCU accepts read requests from multiple units (ARM, IFU, LSU, IO) at CLK1. The requester asserts a request signal (lsu_req) and a read / write signal (lsu_rw) at the beginning of CLK1. At the end of CLK1, the CCU grants one of these read requests by driving ccu_grant_id[9:0]. If ccu_grant_id[9:6] matches the requester's unit_id, the request has been granted. The requester must latch ccu_grant_id[5:0], because it is the transaction_id associated with the request.
If the request is granted, the requester sends the address (lsu_adr[31:0]) and other control information, such as the cc_off signal (lsu_ccu_off) and the data type (lsu_vec_type[1:0], lsu_data_type[2:0]), to the CCU at CLK2.
If ccu_rd_hold_2 is not asserted at the end of CLK2, the request has been fully delivered to the CCU, and the requested data is returned some time later. If ccu_rd_hold_2 is asserted, however, the request granted at CLK1 has not been accepted, and the requester must keep driving the address and control information. Because all of the earlier grant_id information remains valid, the requester does not need to issue the same read request again in the next cycle. ccu_rd_hold_2 stays asserted through CLK1 until the CCU releases it at CLK2.
ccu_rd_hold_2 is a timing-critical signal that the CCU uses to inform the requester that the CCU is busy with other work in the current cycle and that the granted request has not yet been processed.
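The grant check in the request cycle amounts to splitting the 10-bit grant ID into a 4-bit unit ID and a 6-bit transaction ID. A minimal sketch follows; the function name is hypothetical, and the LSU unit ID value 0011 used in the usage note is taken from the requester ID table in the FBUS chapter.

```python
def decode_grant(ccu_grant_id: int, my_unit_id: int):
    """Return the 6-bit transaction_id if the grant is for this unit, else None."""
    unit_id = (ccu_grant_id >> 6) & 0xF      # ccu_grant_id[9:6]
    transaction_id = ccu_grant_id & 0x3F     # ccu_grant_id[5:0]
    return transaction_id if unit_id == my_unit_id else None
```

A unit with ID 0011 that sees a grant ID of 0011 followed by 101010 would latch transaction ID 42 and later match it against the ID that accompanies the returned data.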
* Tag cycle
If the request is granted and not subsequently held in the request cycle, it enters the tag comparison stage of the cache access. The CCU uses the address (lsu_adr[11:5]) and a bank selection signal (according to the requester) to select a line for the tag read. The tag hit signal (ccu_lsu_hit_2) is known at the end of CLK2. On a hit, the data is returned in the next cycle. The read port tag is output and latched by CLK.
The address queue status is also evaluated in this cycle. A tag miss combined with the 'almost_full_address_queue' condition asserts the 'ccu_rd_hold_2' signal. The CCU state machine then accepts no new read requests and instead retries the aborted tag comparison.
Since each cache line (64B) contains two vectors, the valid bit of the accessed vector must be set to obtain a tag hit. For a double-vector (64B) read, both valid bits must be set to obtain a tag hit. A cc_off access always triggers a tag miss, and the request is posted in the address queue.
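The hit conditions in the tag cycle can be summarized as a small predicate. This is an illustrative model only: the CacheLine structure, the function name, and the `want_high` parameter (which selects the high or low 32B vector) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CacheLine:
    tag: int
    valid_low: bool    # valid bit of the low 32B vector
    valid_high: bool   # valid bit of the high 32B vector

def is_hit(line, tag, want_high, double_vector=False, cc_off=False):
    """Return True if a read of `line` with address tag `tag` is a tag hit."""
    if cc_off:                       # cc_off accesses always miss
        return False
    if line.tag != tag:
        return False
    if double_vector:                # a 64B read needs both vectors valid
        return line.valid_low and line.valid_high
    return line.valid_high if want_high else line.valid_low
```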
* Data cycle
In this cycle the CCU returns data to the requester. Data is placed on ccu_dout[127:0], with the lower 16B driven at CLK1 and the upper 16B driven at CLK2. For a 64B data request, one additional cycle is used to complete the transfer.
The CCU always drives ccu_data_id[9:0] during the initial half cycle (CLK2) to inform the requester that data will be returned at the next CLK1. The requester always compares ccu_data_id[9:0] to identify the data being returned to it. A tag hit is used as the indicator of return data.
If there is a tag miss in the tag cycle and the address queue is not full, the CCU starts a cache line fetch by posting the miss address, the ID information, and other control information into the four-entry address queue at CLK1. Currently, each address queue entry contains approximately 69 bits of information. The memory address latch is loaded at CLK2, and an FBUS request is issued at the next CLK1.
[0074]
2.4.4.2 Write operation
The IDC write operation proceeds as a three-stage pipeline: request cycle, tag cycle, and data write cycle. On write address hits, the IDC can write data into the cache data array every cycle.
* Request cycle:
The CCU accepts write requests from multiple units (ARM, LSU, IO) at CLK1. The requester asserts a request signal (lsu_req), a read / write signal (lsu_rw), and a vector type (lsu_vec_type[1:0]) at the beginning of CLK1. At the end of CLK1, the CCU grants one of these write requests. A write grant is signaled by asserting a grant signal (ccu_lsu_wr_grant) directly to the requesting unit. Since no data is returned, the requesting unit does not need to receive a transaction_id from the CCU. At CLK2, the requester must supply the address (lsu_adr[31:0]), the cc_off signal (lsu_ccu_off), and the data type (lsu_data_type[2:0]). As in the read case, the CCU asserts ccu_wr_hold_2 near the end of CLK2 to inform the requester that the request has been granted but not processed in the current cycle. The requester continues to drive the address, cc_off signal, and data type information until ccu_wr_hold_2 is released. Then, in the next cycle, the requester supplies the write data to ccu_dout[127:0].
* Tag cycle
If the request is granted and not subsequently held in the request cycle, it enters the tag comparison stage of the cache access. In this cycle the tag of the write port address is compared. The CCU uses the address (lsu_adr[11:5]) and a bank selection signal (according to the requester) to select a cache line. The tag hit signal (ccu_lsu_hit_2) is known at the end of CLK2. A cc_off write always causes a tag miss, and the write data is placed on the FBUS to be written externally.
The requester starts driving data onto ccu_din[143:0], with the lower 16B at CLK1 and the upper 16B at CLK2. For a 64B data transfer, the requester takes one additional cycle to drive the data. The CCU has an internal write data latch to hold this data. Whether the write hits the cache (in which case one or two cycles are used to actually write the data into the cache) or misses the cache (in which case a minimum number of cycles is used to write the data), the requester considers the write complete.
* Data write cycle
In this cycle the CCU writes the actual data into the cache on a cache hit. If there is a tag miss in the tag cycle, the CCU handles it differently depending on the data type.
If the data type is 32B and the line is clean (that is, both vectors are clean), the CCU simply overwrites the current line with the new tag and the new data. The accessed vector is marked valid and dirty, and the other vector in the same line is marked invalid.
If the data type is smaller than 32B, the write is a partial data write. The partial data is stored in a temporary register. The CCU fetches the missing half line (32B) from memory and loads it into the cache. The partial data is then written into the cache line with the appropriate byte enable signals.
For every write miss on a dirty cache line, the CCU first writes back the dirty line. Because the dirty data is not yet available, the CCU asserts a hold in the grant logic to prevent new read or write requests from being granted. It then starts an internal read of the dirty line to fetch the dirty cache line data. Finally, the write-back address and data are supplied to memory.
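The three write (entry) miss cases above can be summarized as a decision sketch that lists the steps in order. The function name and step strings are hypothetical, and performing the dirty-line write-back before the rest of the miss handling is an assumption based on the word "first" in the text.

```python
def write_miss_steps(size_bytes: int, line_dirty: bool):
    """Return the ordered steps the CCU is described as taking on a write miss."""
    steps = []
    if line_dirty:
        # Dirty victim line: write it back before the line is reused (assumed order).
        steps += [
            "hold new grants",
            "read dirty line internally",
            "write back dirty line to memory",
        ]
    if size_bytes >= 32:
        # Full-vector write: no fetch from memory is needed.
        steps += [
            "overwrite tag and data",
            "mark accessed vector valid and dirty",
            "mark other vector invalid",
        ]
    else:
        # Partial write: the rest of the 32B half line must be fetched first.
        steps += [
            "store partial data in temporary register",
            "fetch missing 32B half line from memory",
            "merge partial data using byte enables",
        ]
    return steps
```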
[0075]
2.4.5 Programming model
The entire cache subsystem is controlled by hardware in response to load and store instructions, so no software-visible registers are required.
2.4.6 IDC and ROM address format
The IDC and ROM address format is as shown in FIG.
[0076]
Chapter 3 IOBUS Description
This chapter describes the IOBUS specification from the hardware designer's point of view.
3.1 Overview
IOBUS is designed for the low-speed standard peripherals used in the system. This bus serves as the main interface between the MSP cache control unit (CCU), the bitstream processor (BSP), the timer / interrupt controller, and all other IO peripherals such as the UART. The bus format is very similar to the Intel IO bus. The bus arbiter control logic constantly monitors the bus for requests and uses a round-robin scheme to generate the appropriate grant. A prospective bus master always asserts its bus request and waits for the bus grant to be asserted before taking the bus. The bus master always drives the address and control lines for the duration of the transfer.
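A round-robin arbiter of the kind mentioned above can be sketched as follows. The requester names are the three-character codes given in the pin description of this chapter; the class itself, its method names, and the exact rotation rule (search starts just after the previous winner) are assumptions for illustration.

```python
class RoundRobinArbiter:
    """Grants the bus to requesters in rotating order."""

    def __init__(self, names):
        self.names = list(names)
        self.last = len(self.names) - 1   # so the first search starts at index 0

    def grant(self, requests):
        """Grant the first requester found after the previous winner, or None."""
        n = len(self.names)
        for step in range(1, n + 1):
            i = (self.last + step) % n
            if self.names[i] in requests:
                self.last = i
                return self.names[i]
        return None
```

With this rule, a requester that keeps its request asserted cannot be starved: after it loses once, every other requester can win at most one grant before its turn comes around.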
[0077]
IOBUS is a synchronous bus that operates at 40 MHz overall. All grants on the MSP IOBUS occur in the first cycle after the request is sampled active. The bus can transfer up to 16 bytes in 4 cycles (a burst of 4). This is accomplished with two size bits that inform the bus arbiter of the transfer size requested by the bus master.
IOBUS has a multiplexed 32-bit address and data bus. The address always appears before the data. The IOB_ALE (address latch enable) signal is used by the receiving device to latch the address. Even if 8-bit devices are connected to the bus, every bus access is treated as a 32-bit transfer. By convention, 8-bit devices use the lower 8 bits [7:0] of the bus, and 16-bit devices use the lower 16 bits [15:0]. If a 16-bit device wishes to communicate with an 8-bit device, the correct data must be placed in the lower 8 bits of the bus so that the 8-bit device can find and latch it. If there are multiple requests in the same period, a requester that has not been granted must hold its request until the IOBUS arbiter grants it. The largest number of bus access cycles allowed for one request in this system is 4 x 32-bit transfers (up to 16 bytes). A block transfer is always divided into multiple 32-bit transfers.
All bus grants are generated by the IOBUS arbiter. In parallel, decoding logic constantly monitors the address (when valid) and generates the appropriate chip select (for the next clock cycle) at the destination. For all read and write requests, chip select is valid for only one cycle, the cycle after the address is asserted. Each IOBUS node has a dedicated chip select input. See the pin description and timing diagrams.
The 2-bit size information is generated by the master after it has been granted the bus by the arbiter and is valid for the subsequent two bus cycles. When CS is asserted, the selected slave must capture the size information to determine the length of the bus transfer. Also, for both reads and writes, the IOBUS arbiter keeps track of the transfer size to determine when the bus cycle has finished before it starts looking for new requests. There is no difference between the data during burst-bus transmission (reading or writing).
In a data read transfer, the READY signal informs the requester when data is valid and triggers latching of the data. This READY signal is generated by the bus master and the slave.
To satisfy this protocol, every IOBUS node must implement an IOBUS interface that handles a request before the node processes it. This interface must satisfy the following specifications:
[0078]

[0079]
3.2 Pin description
The following defines the address, data, and control signals of the system IOBUS as seen from the bus master side. See FIG. 24, which shows the IOBUS structure. As mentioned above, IOBUS is a multiplexed address / data bus.
“Xxx” is a three-character code indicating the requester name (ccu, bsp, urt, tmr, int).
* System IOBUS signal definition
[0080]
[Table 28]

[0081]
3.3 Logic definition
The IOBUS arbitration control unit is as shown in FIG.
3.4 IOBUS timing
The IOBUS read timing (transfer size = 1 word (4 bytes)) is as shown in FIG. 26, and the IOBUS write timing (transfer size = 1 word (4 bytes)) is as shown in FIG. The IOBUS read timing (transfer size = 4 words (16 bytes)) is as shown in FIG. 28, and the IOBUS write timing (transfer size = 4 words (16 bytes)) is as shown in FIG.
[0082]
Chapter 4 FBUS Description
This chapter describes the FBUS specification from the hardware designer's point of view.
4.1 Overview
The memory controller, PCI, ASIC (custom logic), and cache subsystem interface with the system bus “FBUS” through non-multiplexed address and data bus lines. One central FBUS arbitration control logic monitors the requests and generates grants using a priority scheme. A bus master (address and data source) always asserts a bus request and waits for a grant. Normally, the grant is issued in the same cycle as the request, provided that the bus is not in use by, or pending for, any other master or slave (all grants are generated combinationally). Once the master receives a bus grant, it drives the address, data, and control lines in the next cycle. A “Data Ready” signal always precedes the actual data to tell the receiver to begin latching in the next cycle.
To maximize bus bandwidth, up to four consecutive requests can be received or transmitted back to back in a pipelined fashion, which requires a “request FIFO” to hold the four requests. The memory controller has a four-deep request FIFO and a two-deep data FIFO. Because of this protocol, “AF_FULL” and “DF_FULL” signals are required; they indicate that the address FIFO and the data FIFO, respectively, are full. FBUS uses a grant count and a request size bus to support 8-, 16-, and 32-byte data transfers.
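The role of the two full flags can be illustrated with a bounded FIFO model. This is a sketch only: the class name and the mapping of AF_FULL and DF_FULL to a `full` property are hypothetical; the depths of four and two are taken from the text.

```python
from collections import deque

class BoundedFifo:
    """Fixed-depth FIFO whose `full` property models a FIFO-full flag."""

    def __init__(self, depth):
        self.depth = depth
        self.items = deque()

    @property
    def full(self):
        return len(self.items) >= self.depth

    def push(self, item):
        if self.full:
            raise OverflowError("FIFO full: sender should have honored the full flag")
        self.items.append(item)

    def pop(self):
        return self.items.popleft()

request_fifo = BoundedFifo(4)   # AF_FULL corresponds to request_fifo.full
data_fifo = BoundedFifo(2)      # DF_FULL corresponds to data_fifo.full
```

A bus master would check the relevant full flag before issuing another back-to-back request, so that at most four requests are ever queued at the memory controller.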
[0083]
Each FBUS unit has control logic for requesting the bus. This logic varies from unit to unit depending on the application (memory, PCI, cache, etc.). However, the actual bus arbitration unit is identical for every unit and is replicated in all submodules. It acts as an intermediary between the external bus masters / slaves and the internal unit logic. For example, in the memory controller, once CAS is activated, the memory controller raises an internal request to its FBUS arbitration logic through an internal signal indicating that it needs to use FBUS. In response, the FBUS controller asserts the request to the external system on behalf of the memory controller and waits for the grant. Once the grant is received, the address / data / control information is driven from the first entries of the memory controller's response and data FIFOs.
[0084]
The system request size for the memory controller can range from 1 byte to a maximum of 32 bytes. For request sizes greater than 32 bytes, the source / requester uses the FBUS size bits to issue multiple requests. This is due to a limitation of the SDRAM memory bus: the one or two SDRAMs (Samsung SDRAM 1M * 16) use a wrap length of eight to deliver the full 32 bytes required by the rest of the system. For a request of 32 bytes or less, all 32 bytes are fetched from the SDRAM, but only the requested number of bytes is transferred to the destination.
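Splitting a large transfer into requests of at most 32 bytes can be sketched as follows. The function name is hypothetical, and the sketch ignores address alignment and the encoding of the FBUS size bits, neither of which is specified in this passage.

```python
MAX_REQUEST_BYTES = 32   # maximum single request size stated in the text

def split_request(address: int, length: int):
    """Return a list of (address, size) pieces of at most 32 bytes each."""
    pieces = []
    while length > 0:
        size = min(length, MAX_REQUEST_BYTES)
        pieces.append((address, size))
        address += size
        length -= size
    return pieces
```

An 80-byte transfer starting at 0x1000 would therefore be issued as two 32-byte requests followed by one 16-byte request.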
The 10 bit requester ID buses are validated by a “chip select” signal (the same cycle as the address / data).
All FBUS nodes generate a 3-bit “Destination ID” for the FBUS arbiter. These 3 bits are valid together with the request and identify the destination of the request. The destination ID bits [1:0] are decoded from the requester ID input as follows.
[0085]
Requester ID [9:6]   Source     Destination ID [1:0]
0000                 Reserved   N/A
0001                 ARM7       N/A
0010                 FU         N/A
0011                 LSU        N/A
0100                 CCU        00
0101                 ASIC       11
0110                 MEM        01
0111                 PCI        10
1xxx                 Reserved
The destination ID bit [2] is used to indicate a read / fill request status. This helps the FBUS distinguish between an address request (read) and an address / data request (fill).
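The decoding in the table above can be written as a lookup. In this sketch the function names are hypothetical, and because the text does not state which value of bit [2] means read and which means write, that bit is simply passed through as a parameter.

```python
# Destination ID[1:0] keyed by requester ID[9:6], taken from the table above.
DEST_ID = {
    0b0100: 0b00,   # CCU
    0b0101: 0b11,   # ASIC
    0b0110: 0b01,   # MEM
    0b0111: 0b10,   # PCI
}

def destination_bits(requester_id: int):
    """Return destination ID[1:0] for a 10-bit requester ID, or None if N/A."""
    return DEST_ID.get((requester_id >> 6) & 0xF)

def destination_id(requester_id: int, rw_bit: int):
    """Combine bit [2] (read / write indicator, polarity unspecified) with [1:0]."""
    low = destination_bits(requester_id)
    return None if low is None else ((rw_bit & 1) << 2) | low
```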
In the normal case, the grant count bits “grCNT[1:0]” indicate the number of FBUS cycles for which the requester needs the bus. For a loopback request, the request informs the bus master of the request length. The FBUS master controller indicates the grant through the two grant count bits.
FBUS is a split-transaction bus that supports posted reads. That is, the requester requests the bus and, once granted, drives the address onto FBUS and ends the transaction. At some later time, the slave / data source uses the destination ID to return the data, together with the same request ID, to the requester. This greatly improves bus bandwidth and allows other masters to use FBUS sooner. Refer to the timing diagrams for more details.
4.2 Pin description
The following describes the address, data, and control signals of the system FBUS. As mentioned above, FBUS is a demultiplexed address / data bus.
“Xxx” is a three-character code representing the requester name (mem, pci, asc, ccu).
[0086]
System FBUS signal definition
[Table 29]

[Table 30]

[0087]
FIG. 30 shows the memory read request FBUS flow, FIG. 31 shows the memory write request FBUS flow, FIG. 32 shows the master / slave “non-memory” request FBUS flow, and FIG. 33 shows the central FBUS arbitration control unit.
FIGS. 34 to 36 are FBUS timing diagrams. FIG. 34 shows the memory request FBUS timing (an 8-byte data transfer is shown; transfers of 16/32/64/128 bytes use multiple data cycles). FIG. 35 shows the memory read request FBUS timing (transfer size = 8 bytes), and FIG. 36 shows a memory return write request (transfer size = 32 bytes).
[0088]
Chapter 5 PCI Bus
This chapter describes the specification of the PCI glue logic that interfaces the PCI core with the internal FBUS.
5.1 Overview
The MSP_1E PCI controller is designed to satisfy PCI bus spec revision 2.1. See this standard spec for more details.
The PCI unit includes two main sections: PCI core and FBUS 'glue' logic. The PCI core interfaces with external PCI devices that operate primarily at a PCI bus speed of 33 MHz. The FBUS 'glue' logic interfaces with a Samsung FBUS operating at 80MHz. This 'glue' logic interfaces between the PCI core and the FBUS. Speed synchronization can be realized using a FIFO at the two ends of the sub-block.
The Samsung PCI core also includes virtual frame buffer logic and all the VFB registers necessary to interface with ARM7 through FBUS.
The only special feature of this PCI unit is interrupt handling from the host CPU to the MSP chip and from the MSP chip to the host CPU. This will be described in more detail.
[0089]
5.1.1 Samsung PCI core block diagram
This is as shown in FIG.
5.2 PCI FBUS interface logic (see Figure 38)
The PCI core sub-block interfaces with the MSP internal FBUS and the SAND Micro PCI core. Address and data are stored in the FIFO at two ends. This sub-block also serves to synchronize the PCI signal and the FBUS clock.
The PCI core logic can act as both an FBUS master and an FBUS slave device. Most accesses go to local SDRAM memory through the 64-bit FBUS. See the FBUS chapter for a description of the FBUS protocol.
The PCI FBUS control logic also includes a virtual frame buffer register and control. This register is programmed by ARM7 through FBUS. See block diagram.
5.3 PCI VFB logic
FIG. 39 is a VFB block diagram, and FIG. 40 is a VFB register.
5.4 PCI core logic
The MSP PCI core fully complies with the PCI 2.1 spec. The additions are a number of registers for interrupts and software MSP reset. Software running on ARM7 can interrupt the host CPU by setting the PCI host interrupt request (bit <3>) in the MSP control register. This causes the PCI core logic to interrupt the host CPU by asserting the interrupt pin on the PCI bus (INTA #). Thereafter, the host CPU acknowledges the interrupt through the PCI host interrupt acknowledge (bit <4>) of the MSP control register. This causes the interrupt line to become inactive.
The MSP PCI core can also receive interrupts from the host CPU, which are delivered as interrupts to ARM7. Since the PCI spec does not support any interrupt input pins, the MSP interrupt request from the host (bit <2>) in the MSP control register is used to provide this function. The host CPU sets this bit to signal an interrupt to ARM7. Upon recognizing the host interrupt, ARM7 clears this register. See the block diagram of FIG. 41. This requires three registers mapped into the MSP area rather than the PCI space.
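The two interrupt directions described in this section can be modelled in software as follows. This is a minimal sketch, not the hardware behaviour: the class and method names are hypothetical, only the bit positions <2>, <3> and <4> come from the text, and whether the request bit <3> is cleared on acknowledge is not stated, so the model leaves it set.

```python
# Bit positions in the MSP control register, as given in the text.
MSP_IRQ_FROM_HOST = 1 << 2   # host CPU -> ARM7 interrupt request
PCI_HOST_IRQ_REQ = 1 << 3    # ARM7 -> host CPU interrupt request
PCI_HOST_IRQ_ACK = 1 << 4    # host CPU acknowledge


class MspControlModel:
    """Hypothetical software model of the interrupt handshake."""

    def __init__(self):
        self.reg = 0
        self.inta_active = False  # models the INTA# line being asserted

    def arm7_request_host_interrupt(self):
        # ARM7 software sets bit <3>; the PCI core then asserts INTA#.
        self.reg |= PCI_HOST_IRQ_REQ
        self.inta_active = True

    def host_acknowledge(self):
        # The host sets bit <4>; the interrupt line becomes inactive.
        self.reg |= PCI_HOST_IRQ_ACK
        self.inta_active = False

    def host_request_arm7_interrupt(self):
        # The host sets bit <2> to signal an interrupt to ARM7.
        self.reg |= MSP_IRQ_FROM_HOST

    def arm7_pending(self):
        return bool(self.reg & MSP_IRQ_FROM_HOST)

    def arm7_clear(self):
        # ARM7 clears the request after recognizing the host interrupt.
        self.reg &= ~MSP_IRQ_FROM_HOST
```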
Please refer to the PCI 2.1 spec for detailed information on the actual PCI core.
[0090]
Chapter 6 Memory Controller
6.1 This chapter explains the specifications of the memory controller from the perspective of hardware and software designers.
6.2 Overview
MSP memory controllers have a number of features and programmability levels for trade-offs between cost and performance. The memory controller interfaces with the main system bus “FBUS” operating at 80 MHz and the DRAM chip. In order to achieve an 80 MHz clock frequency, a synchronous DRAM is used at an early design stage.
Eventually, the memory subsystem supports standard fast page DRAM, extended data output (EDO) DRAM and synchronous DRAM. The memory bank size is limited to two external banks that can be interleaved.
The initial synchronous DRAM memory controller has the minimum features necessary to operate the DRAM. The following shows the basic features of the first pass memory controller.
* Samsung synchronous DRAM support
* One memory bank (1M x 16) using two SDRAM chips
* Cas-Before-Ras (CBR) refresh support
* Partial write support that initiates a Read-Modify-Write operation
* Internal bank interleaving support (ping-pong through MA [11])
* 80MHz memory and processor bus (1: 1) frequency match
* Programmable refresh rate
* Address and data queuing for efficient use of the system bus
* Manual "two bank precharge" support
The MSP memory controller has two main sub-components: a data controller and an address controller. The data controller has read and write data queues that hold read data from the DRAM and write data from the processor bus. The data controller also includes RMW logic for byte writes. All control over the data controller comes from the address controller.
The address controller has a request queue, a response ID queue, memory access decoding logic, a page comparator logic, a RAS / CAS state machine, a refresh state machine, and all necessary control signals used by the data controller.
The SDRAM memory clock is the same as the system clock. The SDRAM receives each set of control signals.
[0091]
6.2.1 Memory controller block diagram
This is as shown in FIG.
6.2.2 Memory controller flow
This is as shown in FIG.
6.3 Address controller (AC)
In the memory controller, the address controller section serves not only to manage the data controller but also to generate all DRAM control. This section of the MSP memory controller is also responsible for the FBUS interface address and control path. The following block diagram shows a number of sub-sections of the address controller unit.
6.3.1 Address controller block diagram
This is as shown in FIG.
6.3.2 Memory controller request FIFO
The MSP memory controller has a four-deep request FIFO that stores the FBUS address and control information for dispatch to the actual memory controller state machine. Each entry in the request FIFO has a “valid” bit indicating that the particular entry is valid. The memory controller state machine always services the lowest entry of the FIFO, ENTRY_0. Once a request has been serviced and the column address strobe (CAS) is activated, the memory controller asserts a clear signal to clear this entry. The FIFO FULL / EMPTY status logic then barrel-shifts the remaining valid contents toward entry 0.
The MSP memory controller request FIFO format is as shown in FIG.
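The behaviour of the request FIFO (four entries, a valid bit per entry, service always from entry 0, remaining entries shifted down on clear) can be illustrated with a small software model. The class and method names are assumptions made for this sketch, and an invalid entry is represented by `None` rather than by a separate valid bit.

```python
class RequestFifo:
    """Hypothetical model of the four-deep memory controller request FIFO."""

    DEPTH = 4

    def __init__(self):
        # None stands for an entry whose valid bit is clear.
        self.entries = [None] * self.DEPTH

    def push(self, request):
        """Store a request in the first free entry; return False if full."""
        for i, entry in enumerate(self.entries):
            if entry is None:
                self.entries[i] = request
                return True
        return False

    def head(self):
        """The state machine only ever looks at entry 0."""
        return self.entries[0]

    def clear_head(self):
        """Clear entry 0 and shift the remaining valid entries toward it."""
        self.entries = self.entries[1:] + [None]

    @property
    def full(self):
        return all(entry is not None for entry in self.entries)

    @property
    def empty(self):
        return self.entries[0] is None
```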
[0092]
6.3.3 Memory controller address decode / map
The address decoding logic mainly serves to generate an 11-bit SDRAM row address MA [10:0] and an 8-bit column address MA [7:0]. These address lines are driven directly to the SDRAM address input [11:0]. Memory address bit [11] is used to toggle between the internal SDRAM banks, which improves memory bus utilization and performance.
This memory address is generated using a programmable multiplexer controlled by a register that specifies:
-Current system cache line size
-Number of internal banks
-Internal bank interleaving
The system cache line offset is 5 bits for a 32 byte cache line. FIG. 46 shows the proposed memory address format generated from the FBUS system address for 16 MB DRAM. This multiplexed memory address is valid for one cycle with RAS and CAS strobes as directed by the memory controller state machine.
The MCU can perform an 8-byte write without invoking a read-modify-write operation. Bit [2] of the FBUS address is always zero, because a write starts only at an even starting address. This bit is mapped to bit [0] of the SDRAM address, which is one of the three bits representing the starting address as described below.
Faddr [4:2] write sequence (WRAP = 8)
000 0-1-2-3-4-5-6-7
010 2-3-4-5-6-7-0-1
100 4-5-6-7-0-1-2-3
110 6-7-0-1-2-3-4-5
These are all even starting addresses and are sequences supported by the MCU.
All read operations assume 32 bytes and start address is
(000) = rna [2: 0] = Faddr [4: 2].
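The wrap sequences listed above are consistent with a simple sequential wrap: starting at Faddr [4:2] and counting modulo the wrap length. A minimal sketch (the function name is an assumption) that reproduces the four rows of the table:

```python
def wrap_sequence(start, wrap=8):
    """Return the access order for a sequential wrap burst.

    start is the value of Faddr[4:2]; wrap is the wrap length.
    """
    if not 0 <= start < wrap:
        raise ValueError("start must lie inside the wrap window")
    return [(start + i) % wrap for i in range(wrap)]


# Reproduce the table rows for the even starting addresses.
for start in (0b000, 0b010, 0b100, 0b110):
    print(format(start, "03b"), "-".join(str(n) for n in wrap_sequence(start)))
```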
[0093]
6.3.4 Memory controller state machine
The MSP memory controller has one master controller state machine. This state machine is responsible for generating all timings (RAS / CAS / WE / CS / DQM) for the SDRAM control signals. The state machine monitors the request FIFO for valid entries that are always in entry 0. Once a valid bit is detected, the state machine kicks off the SDRAM sequence start. Also, the Page_hit signal is monitored from the page comparator to determine whether RAS precharge is necessary.
RAS precharge is performed on the currently active / open bank. The manual precharge sequence consists of driving CS, RAS, WE and MA [11] to the active-zero state. The internal bank selection bit MA [11] is used to select the bank to precharge. For reads, the precharge command is issued after the data is received from the SDRAM to avoid data collisions. For writes, the precharge command is issued after the last data has been written into the memory. Once the precharge command is completed, the specific bank is in an idle state for the next memory operation. According to the SDRAM spec, the precharge command can be issued any time after tRAS (min) (here 60 ns) is satisfied. However, with the current wrap length of 4, the memory controller state machine issues the precharge command after data is read from / written to memory.
The following shows the SDRAM parameters used with the MSP memory controller.
[0094]
[Table 31]
SDRAM parameters

[0095]
* tRAS can be set to 5 cycles to achieve the 60 ns access time for the synchronous DRAM. See memory controller timing diagram.
6.3.4.1 State machine diagram
FIG. 47 shows an SDRAM memory controller RAS / CAS state machine diagram.
6.4 Memory controller refresh
The synchronous DRAM must be refreshed every 32 ms (one refresh cycle every 15.6 us) to maintain the data in each storage cell. Synchronous DRAM supports two refresh modes: auto-refresh and self-refresh.
[0096]
6.4.1 SDRAM auto refresh
Using standard auto-refresh, the two internal banks are alternately refreshed by the internal counter. Since the number of rows is 4096, auto-refresh requires 2048 auto-refresh cycles to refresh the entire DRAM.
The auto-refresh command is issued by driving CKE and WE high and CS, RAS and CAS low. This command is issued only when both banks are idle.
The time required to finish the auto refresh is
tRC (min) / cycle time = 100ns (spec) /12.5ns = 8 cycles (80MHz)
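The conversion from a timing parameter to a clock count is a ceiling division by the clock period. The sketch below uses integer picoseconds to avoid floating-point rounding; the function name is an assumption, and the period is truncated to whole picoseconds, which is exact at 80 MHz (12.5 ns).

```python
def cycles_for(time_ps, clock_hz=80_000_000):
    """Smallest number of clock cycles that covers time_ps picoseconds."""
    period_ps = 10**12 // clock_hz      # 12_500 ps at 80 MHz
    return -(-time_ps // period_ps)     # ceiling division on integers


print(cycles_for(100_000))  # tRC (min) = 100 ns
print(cycles_for(60_000))   # tRAS (min) = 60 ns
```

With these inputs the function gives 8 cycles for tRC and 5 cycles for tRAS, matching the values quoted in the text.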
6.4.2 SDRAM self-refresh
Self-refresh is another mode supported by the Samsung SDRAM. In general, this is the preferred refresh mode for data retention and low power operation. Here, the SDRAM disables all input buffers except the internal clock and CKE.
The self-refresh mode is entered when CS, RAS, CAS and CKE are low and WE is high. Because the self-refresh mode requires stopping and restarting the SDRAM clock using the CKE signal, the MSP memory controller does not use this refresh mode.
6.4.3 Manual refresh
This refresh mode requires a state machine / counter design. The counter times out every 15.6 us and asserts a refresh strobe to the memory controller logic. The memory controller then ends the current operation and immediately initiates an SDRAM refresh cycle. This cycle has no restrictions in the idle state and is quite similar to the auto-refresh cycle.
[0097]
6.5 Data controller (DC)
In the memory controller, the data controller section serves primarily as a data queue for write data from the processor and read data from the SDRAM. The controller also has merge logic for every partial write (byte write). A partial write begins by kicking off a DRAM read, then merges the data, and finally writes the complete modified word into the memory. Thus, any request following a partial write sequence takes a performance hit.
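The merge step of a partial write can be sketched as a per-byte select between the word read from DRAM and the new data. The byte-enable representation (one bit per byte, bit 0 for the least significant byte) and the function name are assumptions for illustration; the text does not specify how the merge logic is controlled.

```python
def merge_partial_write(old_word, new_word, byte_enables, width_bytes=8):
    """Merge selected bytes of new_word into old_word.

    byte_enables has one bit per byte; a set bit takes that byte from
    new_word, a clear bit keeps the byte read back from DRAM.
    """
    mask = 0
    for i in range(width_bytes):
        if byte_enables & (1 << i):
            mask |= 0xFF << (8 * i)
    return (old_word & ~mask) | (new_word & mask)


# Read-modify-write of a single byte in a 64-bit word.
old = 0x1122334455667788          # word read from DRAM
merged = merge_partial_write(old, 0xAA, 0b0000_0001)
print(hex(merged))
```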
6.5.1 Data controller block diagram
This is as shown in FIG.
[0098]
6.6 Pin description
This controller provides the following package pins:
* RAS_I: Output pin (active low). This is a row address strobe for latching the row address from MA [11:0] into the internal row address buffer of the selected DRAM bank.
* CAS_I: Output pin (active low). This is a column address strobe for latching the column address from MA [11:0] into the internal column address buffer of the selected DRAM bank.
* WE_I: Output pin (active low when writing). This is for driving the write enable input pin of the DRAM.
* MA [11: 0]: Output pin. Multiplexed row and column address signals for DRAM.
* DQM: Output pin. Masks the data output: after the clock, the SDRAM data output is set to high impedance. (This pin is only used for synchronous DRAM interface.)
* CS_I: Output pin (active low). Disables or enables the selected SDRAM for operation. (This pin is only used for synchronous DRAM interface.)
* CLK: Output pin. This is a clock output pin for the synchronous DRAM, which is used only in the SDRAM and has the same phase as the MSP system clock.
[0099]
6.7 Memory controller timing diagram
This is as shown in FIGS. 49 to 51. Items related to FIG. 49 are as follows.
-Samsung SDRAM is assumed.
-Memory and system operating at 80MHz.
-1 or 2 external SDRAM (1M x 16).
-Programmable wrap length of 4/8 to fetch a line from memory.
-tRCD = 3.
-tCAS = 3.
-Internal delay = 2 clocks.
-Memory latency = 8 cycles (8 x 12.5 = 100 ns).
-System data from memory is delayed by about 2 cycles due to arbitration (read data).
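The quoted memory latency of 8 cycles is consistent with the sum of the three preceding items (tRCD, tCAS and the internal delay). The source does not state that derivation explicitly, so the following is only a sketch under that assumption, with hypothetical names.

```python
def memory_latency(t_rcd=3, t_cas=3, internal_delay=2, period_ns=12.5):
    """Return (cycles, nanoseconds), assuming latency is the plain sum."""
    cycles = t_rcd + t_cas + internal_delay
    return cycles, cycles * period_ns


cycles, ns = memory_latency()
print(cycles, "cycles =", ns, "ns")
```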
[0100]
6.8 Programming model
From the programmer's perspective, the control registers associated with the memory controller are as follows:
6.8.1 SDRAM reset register (R / W)
This is a 1-bit register that drives the reset_sdram signal, which starts the SDRAM power-on sequence. The register is set to 1 at each system reset and must be cleared by software for the SDRAM to operate.
Bit 0 is set by a system reset and cleared to operate the SDRAM.
Programming address:
Faddr [31:20] = 12'h010
Faddr [3: 0] = 4'b1011
6.8.2 SDRAM burst type register (R / W)
This register programs the SDRAM burst type. This is a 1-bit register that is programmed to zero for sequential burst types.
Programming address:
Faddr [31:20] = 12'h010
Faddr [3: 0] = 4'b1010
Bit 0 is reset together with the system reset and cleared to operate the SDRAM.
6.8.3 SDRAM refresh register (R / W)
This register programs the SDRAM refresh value. This is a 12-bit register programmed through FBUS.
Programming address:
Faddr [31:20] = 12'h010
Faddr [3: 0] = 4'b1001
Bits 11-0 are reset with a system reset and programmed to a refresh value of 4E0 (hex).
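The refresh value of 4E0 matches the 15.6 us refresh interval expressed in 80 MHz clock cycles. The text does not say how the value was chosen, so the calculation below is one consistent reading rather than a documented rule; the function name is hypothetical.

```python
def refresh_count(interval_ns, clock_hz=80_000_000):
    """Number of whole clock cycles in interval_ns (integer arithmetic)."""
    period_ps = 10**12 // clock_hz          # 12_500 ps at 80 MHz
    return (interval_ns * 1000) // period_ps


count = refresh_count(15_600)               # 15.6 us
print(count, hex(count))
```

At 80 MHz this gives 1248 cycles, which is 4E0 in hexadecimal and fits in the 12-bit register.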
6.8.4 SDRAM RAS precharge (tRP) register (R / W)
This register programs the SDRAM RAS precharge value. This is a 3-bit register that is programmed through FBUS.
Programming address:
Faddr [31:20] = 12'h010
Faddr [3: 0] = 4'b1000
Bits 2-0 are reset upon system reset and programmed to 1, 2, or 3.
6.8.5 SDRAM CAS latency (tCAC) register (R / W)
This register programs the SDRAM CAS latency. This is a 3-bit register that is programmed through FBUS.
Programming address:
Faddr [31:20] = 12'h010
Faddr [3: 0] = 4'b0011
Bits 2-0 are reset with a system reset and programmed to 1, 2, or 3.
6.8.6 SDRAM RAS-to-CAS delay (tRCD) register (R / W)
This register programs the SDRAM RAS-to-CAS delay (tRCD). This is a 3-bit register that is programmed through FBUS.
Programming address:
Faddr [31:20] = 12'h010
Faddr [3: 0] = 4'b0010
Bits 2-0 are reset with a system reset and programmed to 1, 2, or 3.
6.8.7 SDRAM WRAP LENGTH register (R / W)
This register programs the wrap length of the SDRAM for the data. This is a 3-bit register that is programmed through FBUS.
Programming address:
Faddr [31:20] = 12'h010
Faddr [3: 0] = 4'b0001
Bits 2-0 are reset with a system reset and programmed to 1, 2, 4, or 8.
6.8.8 SDRAM NOP TIME register (R / W)
This register programs the SDRAM NOP time for the power-on sequence. This is a 16-bit register programmed through FBUS.
Programming address:
Faddr [31:20] = 12'h010
Faddr [3: 0] = 4'b0000
Bits 15-0 are reset with a system reset and are programmed to a value corresponding to 200 us at the clock frequency in use.
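The programming addresses in sections 6.8.1 to 6.8.8 share Faddr [31:20] = 12'h010 and differ only in Faddr [3:0]. The sketch below collects them into one table. The register names are hypothetical labels, and Faddr [19:4] is assumed to be zero because the text does not specify those bits.

```python
# Faddr[31:20] = 12'h010 for every memory controller register.
MCU_REG_BASE = 0x010 << 20

# Faddr[3:0] select values from sections 6.8.1 to 6.8.8.
REG_SELECT = {
    "SDRAM_RESET": 0b1011,
    "BURST_TYPE": 0b1010,
    "REFRESH": 0b1001,
    "T_RP": 0b1000,
    "T_CAC": 0b0011,
    "T_RCD": 0b0010,
    "WRAP_LENGTH": 0b0001,
    "NOP_TIME": 0b0000,
}


def register_address(name):
    """FBUS address of a register, assuming Faddr[19:4] are all zero."""
    return MCU_REG_BASE | REG_SELECT[name]


for name in REG_SELECT:
    print(f"{name:12s} 0x{register_address(name):08X}")
```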
[0101]
Chapter 7 ASIC Interface
This chapter explains the specifications of the ASIC interface unit.
7.1 Overview
The ASIC interface unit (see FIG. 52) has one programmable 32-bit DMA, multiple FIFOs and control blocks. The ASIC interface block interfaces the main system bus (FBUS), operating at 80 MHz, with the CODEC interface blocks that connect the MSP to the AD1843 (audio, telephone), the KS0122 (video capture), and the KS0119 and VGA. The current assumption is that all CODEC interfaces and DMA controllers run at full FBUS speed to avoid any synchronization problems.
The customer ASIC block has three main sections: the FBUS master / slave interface, the MSP 8-channel DMA controller and the actual CODECs. Data is transferred from the FBUS to a CODEC or from a CODEC to the FBUS. However, the address is generated only by the DMA controller. This address is then mapped onto the FBUS by the FBUS interface logic. All writes from other FBUS nodes only program the registers in the CODEC section. For all other traffic, a response with size and ID information must be read. Refer to the FBUS specification.
The following are features for the ASIC interface unit.
* Supports 32-bit basic DMA function (8 channels-1 channel for each codec).
* Two 4 deep x 64 bit data FIFOs.
* 1 deep x 52 bit request FIFO.
* One 2 deep x 52 bit response FIFO.
* Supports Master / Slave for FBUS and CODEC interface block.
* Operating frequency: Up to 80MHz.
* Supports memory-to-IO and IO-to-memory access.
* Top priority support for channel 0 used for KS0119.
* Special address bus support to realize high performance for KS0119.
This customer interface logic supports three different CODECs.
* Audio and telephone CODEC (AD1843). The CODEC has a bidirectional 64-bit data bus that communicates with the DMA controller. (Channel 4 → DAC1, Channel 5 → DAC2, Channel 6 → ADC left side, Channel 7 → ADC right side)
* Video capture CODEC (KS0122). This CODEC has a bidirectional 64-bit data bus and can initiate M → IO and IO → M requests to the DMA (channel 2).
* Video backend CODEC (KS0119). This CODEC receives data directly from the memory controller (channel 0).
[0102]
ASIC interface block
7.2 Direct memory access (DMA) controller
The DMA controller has registers that are used for address generation and interpretation. This DMA controller has 8 independent channels. Each channel has a current address register and a stop address register. The start and stop address registers are programmed in advance through the configuration block. The current address register is loaded every time a DMA request is generated from any one of the 8 CODECs. Once the FBUS grants access, the DMA address is incremented every cycle until the current address matches the stop address register. At that time, the DMA controller generates an “EOP (End Of Process)” signal, which triggers an interrupt to the processor. All eight DMA channels share a common arbitration unit that controls the multiplexer and the address compare block.
The DMA controller supports IO-to-memory, memory-to-IO, and memory-to-memory access. Each time a CODEC attempts to communicate with the DMA, the CODEC asserts a DMA_REQ signal and waits for a DMA acknowledge signal “DACK” from the DMA. Once acknowledged, the CODEC drives the M-IO signals and data. The DMA controller selects the appropriate channel according to the granted DACK. See block diagram.
[0103]
7.3 DMA register description
7.3.1 Current address register
Each channel has a 29-bit current address register (bits <31:3>), which requires all addresses to be aligned to 8 bytes. In effect, this register is a 29-bit counter. This register can be read by ARM7, and its initial value is loaded by ARM7 through FBUS. The address is incremented based on the data transfer size. The address in the current address register is sent to the address generation block, which loads the address onto the FBUS through the multiplexer. The current address register holds its address value in the idle state.
7.3.2 Stop address register
Each channel has a 29-bit stop address register (bits <31:3>), which requires all addresses to be aligned to 8 bytes. This register is written by ARM7 through FBUS. Its value is used in the comparison block, where it is compared with the current address. If the current address matches the stop address, the DMA controller generates an “EOP” signal for that channel.
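The interaction between the current address register and the stop address register can be illustrated with a small model of one channel. This is a sketch under assumptions: the class name is hypothetical, each transfer is taken to advance the address by 8 bytes (the alignment unit), and EOP is raised when the incremented address equals the stop address.

```python
class DmaChannel:
    """Hypothetical model of one DMA channel's address registers."""

    STEP = 8  # bytes per transfer; addresses are 8-byte aligned

    def __init__(self, start, stop):
        if start % self.STEP or stop % self.STEP:
            raise ValueError("addresses must be aligned to 8 bytes")
        self.current = start
        self.stop = stop
        self.eop = False

    def transfer(self):
        """Advance by one transfer; return True once EOP is signalled."""
        if not self.eop:
            self.current = (self.current + self.STEP) & 0xFFFFFFFF
            if self.current == self.stop:
                self.eop = True
        return self.eop


channel = DmaChannel(start=0x1000, stop=0x1018)
while not channel.transfer():
    pass
print(hex(channel.current))
```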
7.3.3Status register
This register stores information indicating whether each channel has reached its stop address. Bits <7:0> indicate which channels have reached the stop address and are reset when ARM7 initializes the current address register through the CCU.
This register can be read by ARM7 but cannot be written by it.
7.3.4 Control register
This register stores information on the operation of the DMA controller. Bits <7:0> define which DMA channels are enabled for operation. A channel's bit is reset every time the channel reaches the stop address, and ARM7 sets the bit again to restart operation. When a channel's enable bit is “0”, the DMA does not send DMA_ACK to the CODEC even if the CODEC sends DMA_REQ to the DMA. Bits <19:16> define whether any pair of DMA channels is linked together to operate as a double buffer. For example, if channel 0 and channel 1 are linked as a double buffer, the DMA controller automatically switches to channel 1 when the current address of channel 0 reaches the stop address, and automatically switches back to channel 0 when the current address of channel 1 reaches the stop address. Bits <28:21> store the read / write mode of each channel. If a bit in this field is set to “1” by ARM7, the corresponding channel is used for read operations; the remaining channels are used for write operations. Bit <31> defines whether the DMA sends an EOP signal to the interrupt controller. If this bit is "0", the DMA does not send an EOP even if a channel reaches the stop address.
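The control register fields can be unpacked as shown below. The field boundaries come from the text; the function name, the dictionary keys, and the assumption that bit <21> corresponds to channel 0 (and bit <28> to channel 7) are illustrative only, since the full encoding is given in a table that is not reproduced here.

```python
def decode_control(reg):
    """Split a 32-bit DMA control register value into its described fields."""
    return {
        # Bits <7:0>: one enable bit per channel.
        "enabled_channels": [ch for ch in range(8) if (reg >> ch) & 1],
        # Bits <19:16>: double-buffer link bits (raw field value).
        "double_buffer_links": (reg >> 16) & 0xF,
        # Bits <28:21>: 1 = read channel (bit 21 assumed to be channel 0).
        "read_channels": [ch for ch in range(8) if (reg >> (21 + ch)) & 1],
        # Bit <31>: EOP to interrupt controller enabled.
        "eop_enabled": bool((reg >> 31) & 1),
    }


value = (1 << 31) | (1 << 21) | (0b0001 << 16) | 0b0000_0101
print(decode_control(value))
```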
[0104]
7.3.5 Mask register
Each bit in the control register is linked to a mask bit in the mask register. If the mask bit is “0”, the corresponding bit in the control register is prevented from being updated. Initially, this register <31: 0> is set to FFFFFFFF (hex).
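The effect of the mask register on a control register write can be expressed as a bitwise select: bits whose mask bit is 1 take the new value, and bits whose mask bit is 0 keep the old value. A minimal sketch with a hypothetical function name:

```python
def masked_update(control, new_value, mask):
    """Apply a write to the control register under the mask register.

    A mask bit of 0 protects the corresponding control bit from update.
    """
    return ((control & ~mask) | (new_value & mask)) & 0xFFFFFFFF


# With the initial mask FFFFFFFF every bit of the write goes through.
print(hex(masked_update(0x000000FF, 0x12345678, 0xFFFFFFFF)))
# With a partial mask only the upper 16 bits are updated.
print(hex(masked_update(0x000000FF, 0xFFFFFF00, 0xFFFF0000)))
```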
7.3.6 Programming
Start and stop addresses are programmed by ARM7 through FBUS.
The FBUS mapping values are as follows.
CCU → 0040_0000-007F_FFFF,
MCU → 0080_0000-047F_FFFF,
PCI → 0800_0000-FFFF_FFFF.
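The FBUS mapping values can be used to find the target of an address with a simple range check. The table and function names below are assumptions; addresses outside the three listed ranges return `None` because the text does not say where they are routed.

```python
# FBUS address ranges as listed in the mapping values (inclusive bounds).
FBUS_MAP = [
    ("CCU", 0x0040_0000, 0x007F_FFFF),
    ("MCU", 0x0080_0000, 0x047F_FFFF),
    ("PCI", 0x0800_0000, 0xFFFF_FFFF),
]


def fbus_target(addr):
    """Return the name of the node whose range contains addr, else None."""
    for name, low, high in FBUS_MAP:
        if low <= addr <= high:
            return name
    return None


print(fbus_target(0x0100_0000))
```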
In address programming, Address [26:0] is set according to the following table.
[0105]
DMA register address map
[Table 32]

[0106]
[Table 33]
Status register encoding

[0107]
Control register encoding
[Table 34]

[0108]
7.4 CODEC initialization
The customer ASIC unit supports the initialization of each CODEC. In effect, ARM7 is responsible for CODEC initialization through the customer ASIC unit. The customer ASIC unit has an address decoder for generating a request signal for each CODEC. Each time the customer ASIC unit attempts to communicate with a CODEC, it sends a request signal to the CODEC and waits for an acknowledge signal from the CODEC. After receiving the acknowledge signal, the customer ASIC unit sends the data and address to the CODEC.
When ARM7 attempts to read configuration data in a CODEC through the CCU, the customer ASIC unit sends the address to the CODEC. When the customer ASIC unit receives data from the CODEC, it returns the transaction ID to the CCU. At this point, the configuration data is transmitted to ARM7 through the CCU.
[0109]
[Table 35]
CODEC configuration register FBUS address map

[0110]
FIG. 53 shows a customer ASIC network.
7.5 I / O pin definition
[0111]
I / O pin definition for customer ASIC unit
[Table 36]

[0112]
Chapter 8 AD1843 CODEC Interface
8.1
This chapter describes the AD1843 CODEC interface.
8.2 Overview
The AD1843 CODEC interface block interfaces the AD1843 serial bus with the MSP DMA module. The AD1843 transmits and receives data and control / status information through the serial port. The AD1843 has four pins responsible for the serial interface: SDI, SDO, SCLK, and SDFS. The SDI pin is for serial data input to the AD1843, and the SDO pin is for serial data output from the AD1843. The SCLK pin is for the serial interface clock.
For communication to and from the AD1843, data bits are transmitted after the rising edge of SCLK and must be sampled on the falling edge of SCLK. The SDFS pin is for serial interface frame synchronization. The AD1843 CODEC interface is based on the master mode, which means that the SCLK and SDFS signals are generated by the AD1843. The default SCLK frequency is 12.288 MHz and the frame rate is 48 kHz.
The basic structure of the CODEC interface is based on DMA. The AD1843 interface uses four different DMA channels: channel 4 for DAC1, channel 5 for DAC2, channel 6 for the ADC left side, and channel 7 for the ADC right side. The transfer size of each channel to or from the DMA is 64 bits at a time. Thus, DMA channel 4 and channel 5 each carry two different 32-bit data (16 bits for the left side and 16 bits for the right side) at a time. On the other hand, DMA channel 6 and channel 7 each send four different 16-bit data at a time from the CODEC interface to the SDRAM.
The DAC1 and DAC2 interfaces recognize that the data is valid when the flag bit of each channel is set. The DAC1 and DAC2 interfaces request DMA after checking the flag bits. When the flag bit is reset, the DAC1 and DAC2 interfaces do not generate a DMA request. The actual operation of the flag bit is controlled by the DMA clock. The DMA block does not generate a DMA acknowledge signal when the flag bit is reset. If the ADC left and right FIFOs are not full, no DMA request is generated. In that case, software must check the ADC flag register and read the remaining data through the data bus. After these data are read through the data bus, the FIFO is empty again, and a DMA request is generated once the FIFO is full.
An AD1843 control register is read or written by transmitting a read / write request together with the control register address in the control word input. When a read is requested, the contents of the addressed control register are transmitted during the next frame; when a write is requested, the data to be written must be transmitted in AD1843 slot 1. To improve MSP performance, the programmer must check the control flag register before reading or writing a CODEC control register. When the flag bit of the control flag register is set, read and write operations on the CODEC register are possible.
[0113]
8.3 DMA channel designation
DMA channel 4 DAC1 left side, right side
DMA channel 5 DAC2 left side, right side
DMA channel 6 ADC left side
DMA channel 7 ADC right side
8.4 Data format for DMA
The data size is 64 bits and is configured as follows.
[0114]

[0115]
8.5 Base address
04C0_4000 DAC1 BASE
04C0_5000 DAC2 BASE
04C0_6000 ADCL BASE (left channel)
04C0_7000 ADCR BASE (Right channel)
8.6 Register map
[0116]
[Table 37]

[0117]
8.7 Register definition
8.7.1 Control register write data input

The most significant bit (MSB) is the first data input bit transmitted.
8.7.2 Control word input

r / w Read / write request. A read from or write to the control register is generated every frame. Setting this bit to “1” indicates a read of the control register; resetting it to “0” indicates a write to the control register.
ia4: 0 Control address register for reading or writing
8.7.3 Control register data output

Control register contents previously addressed in the frame
8.7.4 ADC flag register

r4v-r1v Valid ADC right data is in the buffer. Indicates which data in the buffer is valid.
l4v-l1v Valid ADC left data is in the buffer. Indicates which data in the buffer is valid.
8.7.5 First data on the left side of the ADC

The first data on the left side of the ADC in the buffer
8.7.6 Second data on the left side of the ADC

Second data on the left side of the ADC in the buffer
8.7.7 Third data on the left side of the ADC

Third data on the left side of the ADC in the buffer
8.7.8 Fourth data on the left side of the ADC

Fourth data on the left side of the ADC in the buffer
8.7.9 Control flag register

wf1 Control register write flag. Once set, the CODEC is ready to receive control register data.
rf1 Control register read flag. Once set, the CODEC prepares to transmit control register data.
[0118]
Chapter 9 Video Codec
9.1 Overview
The video codec logic interfaces to the KS0119 and KS0122 chips on the evaluation board and to the DMA module on the MSP chip. KS0119 CODEC also provides a screen refresh operation. For this operation, a direct data path to the MCU module is implemented as shown in FIG.
9.2 Upper module definition
The upper module has three submodules as shown in FIG.
-KS0119 screen refresh module
-KS0122 video data capture module
-3-wire serial host interface module to access the KS0119 and KS0122 chip configuration registers.
9.3 DMA channel designation
DMA CH0 KS0119 CODEC
DMA CH1 Reserved
DMA CH2 KS0122 CODEC
DMA CH3 Reserved
DMA CH4 AD1843 audio CODEC
DMA CH5 AD1843 audio CODEC
DMA CH6 AD1843 audio CODEC
DMA CH7 AD1843 audio CODEC
DMA CH8 Reserved
DMA CH9 Reserved
9.4 3-wire host interface module
This module interfaces to the KS0119 and KS0122 chips where all registers inside the chip are accessed through a serial interface. The 3-wire serial interface module supports the functions of the communication protocol on these chips and includes registers for the KS0119 and KS0122 interface logic. See FIG.
9.5 EPROM interface
The KS0119 IO pin is used to load program data immediately after the system is reset and is used as an interface to an external EPROM as part of the MSP-1EX boot initialization. See pin specification for more details.
The EPROM is memory-mapped at addresses C0000H to DFFFFH.
9.6 KS0119 register description
The KS0119 base address, CODEC_REQ0, is 04B0_0000, and its address range extends to 04BF_FFFF.
9.6.1 KS0119 register address map
KS0119 register address map
[0119]
[Table 38]

[0120]
9.6.2 Frame size register
As shown in FIG. 57, this register controls the frame size transmitted to the CODEC chip, and the minimum frame length is 3 bytes.
9.6.3 Chip ID register
This register stores the CODEC chip ID value: 03H for a KS0119 write and 83H for a KS0119 read.
9.6.4 Control / data register
This register tells the CODEC chip KS0119 whether the next transmitted byte is a register index or a data byte. For KS0119, 08H indicates that the next byte is an index, and 09H indicates that the next byte is data.
9.6.5 Index / data 0 register
This register stores either the index value for the CODEC chip configuration register or the data 0 byte, depending on the value transmitted in the previous byte. Refer to the communication protocol in the programming reference section.

9.6.6 Data 1 register
This register stores data to be written to the CODEC register Index + 1.

9.6.7 Data 2 register
This register stores data to be written to the CODEC register Index + 2.

9.6.8 Data 3 register
This register stores data to be written to the CODEC register Index + 3.
9.6.9 KS0119 logic control register
Bit designation for the KS0119 control register is as shown in FIG.
9.6.10 HS and VS polarity
This register defines the polarity of the horizontal sync and vertical sync signals. A value of 0 is defined as active low, while a value of 1 is defined as active high. The bit specification is as follows.
Bit <0>: VS polarity
Bit <1>: HS polarity
9.6.11 HS offset
The active signal is generated after this offset value, and this offset value is defined as 00H.
9.6.12 VS offset
The active signal is generated after this offset value, and this offset value is defined as 00H.
9.6.13 Status register
This is as shown in FIG.
9.6.14 Read data serial interface register
This register stores valid data from the serial port after the read flag represents a transition from a busy state to a ready state.
9.6.15 Read PROM data register
This register stores valid data when the PROM flag is in the ready state.
9.6.16Programming reference
9.6.16.1Configuration and initialization
The video display hardware operates in one of two modes: VGA overlay mode and VGA emulation mode. The mode is selected by setting a bit in the logic control register.
MSSEL: 0 for VGA overlay mode,
1 for VGA emulation mode.
In the VGA overlay mode:
-A VGA card must be present in the PC system.
-The monitor cable is connected to the MSP card.
-Supported VGA resolution is up to 800x600.
-The display buffer must be the same size as the VGA setting.
[0121]
To set up, by software, a video window that fills the color key rectangle in the VGA frame buffer, the video data must be written to a rectangle in MSP SDRAM of the same size and position as the rectangle in the VGA frame buffer. See FIG.
The KS0119 chip recognizes the color key and switches from the VGA input port to the video input port. Software sets the DMA channel 0 start address to the upper left corner of the SDRAM video output buffer, and the DMA record length follows from the resolution set on the VGA card and the bits per pixel of the video data (4: 2: 2 = 16 bits per pixel).

9.6.16.2Serial protocol 3-wire interface to KS0119
When setting the configuration registers in the KS0119 chip, the protocol is as follows.
-A minimum of two frames must be transmitted to the peripheral chip.
-The first frame is for setting the index of the configuration register.
-The second frame is for reading or writing data (register contents).
The frame size register is set with an appropriate length by software, and the serial access bit is set to 1. Then, before changing the frame size register, all bytes required for the frame are loaded by software, and the CODEC interface logic waits until all bytes are loaded before frame serialization begins.
The first transmitted frame is for setting an index and has a frame size of 3. See FIG.
[0122]
The second frame is for setting a register, and the frame size is 3.
After each data byte, the chip automatically increments the index by one. Consecutive registers can therefore be set by transmitting multiple data bytes to the CODEC interface logic, which supports up to four data bytes.
When a read or write operation is performed, software checks the read and write flags in the status register: for valid data during a read operation, or for write flag = ready before transmitting the next frame.
The following example shows the steps for setting two KS0119 registers.
Since the two registers have consecutive indexes, these two bytes can be loaded into a single frame. First, the index must be set as follows:
-Load the frame size register with the value 83H (frame size = 3, serial access bit set) (Address = 04B0_0000H)
-Load the ID register with the value 03H (Address = 04B0_0001H)
-Load the control / data byte with the value 08H, which informs KS0119 that the next byte is an index (Address = 04B0_0002H)
-Load the index register with the value 6H (Address = 04B0_0003H)
The serial interface detects when the number of loaded bytes matches the contents of the frame size register and starts frame transmission, and the write flag in the status register is set to busy. Before transmitting the next frame, software checks the flag in the status register. If the flag is ready, the software can load the values for the next frame.
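The index-setting frame above can be sketched as a short list of byte writes. This is only an illustration under stated assumptions, not driver code: the function name `index_frame_writes` is hypothetical, and the rule that the frame size register holds the byte count OR-ed with 80H for the serial access bit is inferred solely from the 83H example value.

```python
# Hypothetical sketch of the index-setting frame described above.
BASE = 0x04B0_0000     # frame size register address taken from the example
SERIAL_ACCESS = 0x80   # assumption: bit 7 is the serial access bit (83H = 80H | 3)

def index_frame_writes(chip_id, control_byte, index):
    """Return (address, value) byte writes for a 3-byte index-setting frame."""
    payload = [chip_id, control_byte, index]
    writes = [(BASE, SERIAL_ACCESS | len(payload))]     # frame size register
    for offset, value in enumerate(payload, start=1):   # ID, control / data, index
        writes.append((BASE + offset, value & 0xFF))
    return writes

for address, value in index_frame_writes(0x03, 0x08, 0x06):
    print(f"{address:08X}H <- {value:02X}H")
```

Real software would still have to poll the status register flag before loading the next frame, as the text describes.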
9.7KS0122 register description
KS0122 has a base address corresponding to 04C0 2000, which is extended to 0420 2FFF.
9.7.1KS0122 register address map
[0123]
[Table 39]

[0124]
9.7.2Frame size register
As defined in FIG. 62, this register controls the frame size transmitted to the CODEC chip, and the minimum frame length is 3 bytes.
9.7.3Chip ID register
This register stores the CODEC chip ID value: 04H for a KS0122 write and 84H for a KS0122 read.
9.7.4Control / data register
This register tells the CODEC chip KS0122 whether the next transmitted byte is a register index or a data byte. For KS0122, 00H indicates that the next byte is an index, and 01H indicates that the next byte is data.
9.7.5Index / data 0 register
This register stores either the index value of a CODEC chip configuration register or data byte 0, depending on the value transmitted in the previous byte. Refer to the communication protocol in the programming reference section.

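9.7.6Data 1 register
This register stores data to be written to the CODEC register Index + 1.

9.7.7Data 2 register
This register stores data to be written to the CODEC register Index + 2.

9.7.8Data 3 register
This register stores data to be written to the CODEC register Index + 3.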
[0125]
9.7.9KS0122 logic control register
Bit designation for the KS0122 control register is as follows.
bits <1: 0>
00 4: 2: 2 format
01 4: 1: 1 format
10 CCIR656 format
9.7.10Status register
Bit <0>: Field status
0: Even field
1: Odd field
Bit <1>: VS status
0: VS from 1 to 0
1: VS from 0 to 1
9.7.11Read data serial interface register
This register stores valid data from the serial port after the read flag represents a transition from a busy state to a ready state.
9.7.12Serial protocol 3-wire interface to KS0122
When setting the configuration register in the KS0122 chip, the protocol is as follows.
-A minimum of two frames must be transmitted to the peripheral chip.
-The first frame is for setting the index of the configuration register.
-The second frame is for reading or writing data (register contents).
[0126]
The frame size register is set to an appropriate length by software, and the serial access bit is set to 1. Then, before changing the frame size register, all bytes required for the frame are loaded by software, and the CODEC interface logic waits until all bytes are loaded before frame serialization begins.
The first transmitted frame is for setting an index, and the frame size is 3. See FIG.
The second frame is for setting registers, and the frame size is 3.
After each data byte, the chip automatically increments the index by one. Consecutive registers can therefore be set by transmitting multiple data bytes to the CODEC interface logic, which supports up to four data bytes.
When a read or write operation is performed, software checks the read and write flags in the status register: for valid data during a read operation, or for write flag = ready before transmitting the next frame.
The following example shows the steps for setting the values of chroma key byte 0 and byte 1. The index of this register is 6AH for byte 0 and 6BH for byte 1; see the KS0122 data sheet.
Since the two registers have consecutive indexes, these two bytes can be loaded into a single frame. First, the index must be set as follows:
[0127]
-Load the frame size register with the value 83H (frame size = 3, serial access bit set) (Address = 04B0_0000H)
-Load the ID register with the value 03H (Address = 04B0_0001H)
-Load the control / data byte with the value 08H, which informs KS0122 that the next byte is an index (Address = 04B0_0002H)
-Load the index register with the value 6H (Address = 04B0_0003H)
The serial interface detects when the number of loaded bytes matches the contents of the frame size register and starts frame transmission, and the write flag in the status register is set to busy. Before transmitting the next frame, software checks the flag in the status register. If the flag is ready, the software can load the values for the next frame.
[0128]
Chapter 10 Bitstream Processor
10.1
This chapter describes the functional requirements for designing the bitstream processor (BP), one of the main MSP processing engines for video data compression and decompression applications.
10.2Abbreviations
A / V audio and video
BP Bitstream processor (MSP block)
CCU cache control unit (MSP block)
CIF Common intermediate format with luminance sample resolution of 352 x 288 at 29.97 Hz
DCT discrete cosine transform
DMA direct memory access
DSM digital storage media
FBUS Fast bus (MSP internal data bus)
GOB group of blocks
GSTN general switched telephone network (also known as PSTN)
HDD hard disk drive
I / F interface
IOBUS I / O bus (MSP internal peripheral bus)
ITU-T-601 Standard for digital coding of color television signals (formerly CCIR601), with sample resolutions of 720x480 at 29.97 Hz and 720x576 at 25 Hz. However, the display resolution may be 720 × 480 or 704 × 480.
LSB least significant bit
LUT lookup table
MPEG Moving Picture Experts Group
MSB most significant bit
MSP Samsung Multimedia Signal Processor
QCIF Quarter_CIF with a luminance sample resolution of 176 × 144 at 29.97 Hz
RLC RUN_length and level code
SDRAM synchronous dynamic random access memory
SIF Source input format for the MPEG-1 video standard, with luminance resolution of 352x240 at 29.97Hz for NTSC and 352x288 at 25Hz for PAL
TSD defined
VLC variable length code
VP vector processor (MSP block)
10.3Main features
* Supports MPEG-1, MPEG-2, H.261 and H.263 encoding and decoding applications, and interprets the syntax at the slice (or GOB) layer.
* Performs RLC processing in real time.
* Performs Huffman code processing in real time using all Huffman tables in the MPEG-1, MPEG-2, H.261 and H.263 video standards.
* Supports two zigzag scan conversion methods, each in the forward and reverse direction.
* Interfaces to the IOBUS at a maximum transmission rate of 731.4 Mbits / sec (32-bit @ 40 MHz).
* Maximum operating clock frequency is 40 MHz.
* Huffman codec includes a 9.2Kbit ROM for the lookup table.
* Includes 320 bytes of internal RAM.
* Supports pre-emptive and cooperative context switching modes.
* Target gate count for the control path is 6Kgates, plus RAM and ROM.
[0129]
10.4Overview
The bitstream processor (BP) is one of the four MSP internal peripherals. It is a hardware block that supports bitstream processing for video compression and decompression.
The BP was designed specifically for bit_level processing because the VP and ARM7 inside the MSP do not have an efficient architecture for such bit manipulation.
The BP transmits and receives data through a 32-bit bus called IOBUS, which has a maximum transmission rate of 731.4 Mbits / sec.
The BP operates as an independent processing device and is controlled by ARM7 or VP software.
More specifically, the BP encodes and decodes all information at the slice or GOB layer and below, and sends and receives data to / from the CCU. The BP also performs forward and reverse zigzag transformations, and encodes and decodes differential DC coefficients.
In decoding, the BP also recovers motion vectors from differential motion vectors, and it performs the opposite operation in encoding, except in two special modes: dual_prime mode in MPEG-2 encoding and the prediction mode in H.263 encoding and decoding.
In the simple mode of operation, the BP starts processing a slice or GOB and issues an interrupt after processing of that slice or GOB is completed. Full-duplex operation is obtained by interleaving the encoding and decoding of slices or GOBs.
If ARM7 needs to switch the BP to another task immediately, the BP supports a pre-emptive context switching mode that stops BP processing before the current slice or GOB is completed.
[0130]
FIG. 3 shows a block diagram of the BP.
As shown in FIG. 3, the BP includes five blocks: an IOBUS interface device, a VLC FIFO device, a VLC LUT ROM, a control state machine, and a BP core device. Input / output data is handled by the IOBUS interface device, which includes a 16 × 32 bit RAM. This device supports all data movement and interrupt requests. The VLC FIFO device prepares the next data word for a data decoding operation and performs output data packing for a data encoding operation.
The VLC look-up table ROM has a size of 768 × 12 bits and stores all the information needed to process all Huffman codes. The control state machine controls all encoding and decoding. The BP core device is a small processor that includes an adder, a comparator, a barrel shifter, a register file, and a 128 × 16 bit RAM. This core is well suited to bit manipulation.
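As a quick cross-check, the memory sizes quoted for the individual blocks add up to the 9.2Kbit ROM and 320-byte internal memory figures listed under the main features. The variable names below are illustrative only.

```python
# Sizes quoted in the block description above.
rom_bits = 768 * 12               # VLC look-up table ROM
iobus_ram_bytes = 16 * 32 // 8    # RAM in the IOBUS interface device
core_ram_bytes = 128 * 16 // 8    # RAM in the BP core device

print(rom_bits)                            # 9216 bits, about 9.2 Kbit
print(iobus_ram_bytes + core_ram_bytes)    # 64 + 256 = 320 bytes
```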
[0131]
10.5Signal definition
Table 45 shows the signals required for the BP external interface. A signal name ending with the character “1” indicates active_low.
In the “Direction” column of Table 1, “B”, “I”, and “O” denote a bidirectional signal, an input signal, and an output signal, respectively.
[0132]
BP signal definition
[Table 40]

[0133]
10.6Data flow diagram for encoding / decoding
This section describes the data flow of a typical video encoding and decoding application. The flow of audio data is not described in detail.
10.6.1For encoding
Stage E1: Raw (RAW) A / V data input
Normally, input video and audio signals are sampled and digitized by an external codec and provided to the customer ASIC. However, in a multimedia PC environment, some VGA control boards also include a frame grabber and a sound capture function. Therefore, raw A / V data arrives from either the customer ASIC or the PCI bus interface. The customer ASIC or PCI bus interface contains a small buffer of 32 bytes. The data in this buffer is transferred to the external SDRAM through the FBUS using a DMA operation. This data movement is initialized by ARM7 after a power-on reset.
Stage E2: Pre-filtering by VP
First, the VP fetches image data stored in the SDRAM into the VP data cache (generally a scratch pad area). The VP then temporally filters the pixels before spatial scaling. After pre-filtering, the video resolution is normally converted from ITU_T_601 size to CIF or QCIF size. The VP writes the prefiltered results to the external SDRAM.
[0134]
Stage E3: VP data compression
The VP again fetches the pre-filtered data from SDRAM into the VP data cache and compresses it according to the rules of the corresponding standard. The VP normally performs forward DCT, forward adaptive quantization, motion prediction, macroblock type determination, and the like.
After this processing, the VP must write the result, together with the appropriate header information, to the VP data cache. In practice, this VP data cache area is used as the BP input buffer. A flag signal is used to check the state of the buffer.
Stage E4: BP initialization by ARM7
Before the BP is actually operated, ARM7 must initialize the BP's internal registers.
This initialization must not be performed during the 128 cycles after the power-on reset signal is applied. In particular, ARM7 must initialize the I / O buffer addresses and the BP instruction register, and must specify the number of macroblocks to be encoded in the slice or GOB.
After initializing these registers, ARM7 must set the BP enable flag to start BP processing.
Stage E5: Bitstream processing by BP
If either of the two input buffers is full, the BP starts reading data through the IOBUS. That is, the BP can read data only when a buffer is full. The BP converts each 8 × 8 block of data into zigzag order. The result is then directly RLC and Huffman encoded.
The Huffman encoded results can be transferred to either the ARM7 data cache or the SDRAM. The BP writes to an output buffer only if the buffer is free, so that it does not overflow. At the end of this process, when the number of processed macroblocks equals the number of macroblocks specified by ARM7, the BP interrupts ARM7 with the byte and bit position of the last data, and processing of the current slice or GOB is terminated.
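The zigzag conversion and run-length step of stage E5 can be sketched in software as follows. This is a simplified illustration and not the BP hardware algorithm: it uses the conventional zigzag order only, treats the DC coefficient like any other coefficient, and stops before Huffman coding. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """Flat row-major indices of an n x n block in conventional zigzag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals; odd diagonals run downwards, even ones upwards.
    cells.sort(key=lambda rc: (rc[0] + rc[1],
                               rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
    return [r * n + c for r, c in cells]

def run_level_pairs(coeffs):
    """(run, level) pairs, where run counts the zeros before each nonzero level."""
    pairs, run = [], 0
    for value in coeffs:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs  # trailing zeros are dropped (implied end of block)

block = [0] * 64
block[0], block[1], block[16] = 5, 3, -1   # row-major positions (0,0), (0,1), (2,0)
scanned = [block[i] for i in zigzag_order()]
print(run_level_pairs(scanned))            # [(0, 5), (0, 3), (1, -1)]
```

Each (run, level) pair would then be mapped to a variable length code from the table selected by the video standard in use.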
[0135]
Stage E6: Bitstream formation and A / V multiplexing by ARM7
ARM7 combines the Huffman encoded data and the syntax parameters to create the final bitstream, and repeats the process.
ARM7 also handles the layers above the slice or GOB and multiplexes the audio and video bitstreams. The result is written to SDRAM by ARM7.
Stage E7: Network interface by VP (optional, for video conferencing)
For video telephony or video conferencing applications, the VP performs network interface functions such as a V.34 modem for an H.324 GSTN video phone, or an I.400 series interface for an H.320 ISDN video conferencing terminal.
Stage E8: Final bitstream output
The final bitstream stored in the SDRAM is transferred to either the customer ASIC or the PCI bus. Normally, the customer ASIC block is used for the network interface, and the PCI bus interface is used for data storage on a recording device (eg, HDD).
This data movement uses a DMA transfer initialized by ARM7.
[0136]
10.6.2For decoding
Stage D1: Bitstream fetch
In a multimedia environment, the compressed bitstream is supplied from a CD-ROM drive, an HDD, or a network interface.
Therefore, this bitstream arrives through either the customer ASIC or the PCI bus. Data stored in the 32-byte buffer of the customer ASIC or PCI bus is transferred to the SDRAM using DMA.
Stage D2: Network interface by VP (optional, for video conferencing)
In video conferencing, the V.34 or I.400 series network interface routine is first performed on the data. The VP writes the result to the SDRAM.
Stage D3: A / V demultiplexing and header analysis by ARM7
ARM7 moves the data in the SDRAM to the ARM7 data cache and performs A / V bitstream demultiplexing. For the video bitstream, ARM7 also searches for all start codes and analyzes the headers until a slice or GOB is detected. ARM7 stores the decoded bitstream syntax parameters in a special area of SDRAM. The demultiplexed audio and video bitstreams are each transferred to a rate buffer in the SDRAM. The rate buffer size may differ for each application. For example, for the video rate buffer size, MPEG-1 recommends 370 Kbits and MPEG-2 recommends 1.835 Mbits.
[0137]
Stage D4: BP initialization by ARM7
This stage is similar to stage E4 of the previous subsection, except that it does not require initialization of the register for the number of encoded macroblocks. As before, the initialization must not be performed during the 128 cycles after the power-on reset signal is applied.
Stage D5: BP bitstream process
After the BP is initialized for a specific slice or GOB, the recovered data is transferred to the two buffers.
The BP reads data through the IOBUS after checking the status of the full flag. The BP parses the syntax parameters if the input data contains a header word.
If the bits that follow are recognized as a Huffman code, Huffman decoding is performed within at most 4 cycles per Huffman code. If the decoded Huffman code is a DCT AC coefficient, the result of the Huffman decoding is RLC decoded into the 64 pixel components.
The reconstructed components are then inverse zigzag converted and finally transferred to the two output buffers so that the VP can perform inverse quantization. The BP continues this process until it detects a start code that is not a slice or GOB start code. When such a start code is detected, the BP interrupts ARM7 with the byte and bit position information of the last used data. ARM7 then searches for the next slice or GOB start code and repeats the process.
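The RLC decoding and zigzag reordering of stage D5 can be sketched as the reverse of a run-length encoder. This is a simplified software illustration with hypothetical function names: Huffman decoding is assumed to have already produced the (run, level) pairs, and only the conventional zigzag order is handled.

```python
def zigzag_order(n=8):
    """Flat row-major indices of an n x n block in conventional zigzag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    cells.sort(key=lambda rc: (rc[0] + rc[1],
                               rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
    return [r * n + c for r, c in cells]

def expand_run_levels(pairs, size=64):
    """Expand (run, level) pairs into `size` coefficients in scan order."""
    coeffs = []
    for run, level in pairs:
        coeffs.extend([0] * run)
        coeffs.append(level)
    if len(coeffs) > size:
        raise ValueError("run/level pairs overflow the block")
    return coeffs + [0] * (size - len(coeffs))   # implied end of block

def inverse_zigzag(scanned, n=8):
    """Place scan-order coefficients back at their row-major positions."""
    block = [0] * (n * n)
    for position, index in enumerate(zigzag_order(n)):
        block[index] = scanned[position]
    return block

block = inverse_zigzag(expand_run_levels([(0, 5), (0, 3), (1, -1)]))
print(block[0], block[1], block[16])   # 5 3 -1
```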
[0138]
Stage D6: VP data restoration
Using the result of stage D5, the VP reconstructs the image using inverse quantization, inverse DCT, and motion vectors. After completing the decoding process, the VP stores the result in SDRAM.
Stage D7: Post-processing by VP
Before the video and audio data is transmitted to the digital / analog converter, the VP post-processes the pixels to obtain the preferred output resolution and image.
These results are also stored in SDRAM.
Stage D8: Raw (RAW) A / V data output
Finally, the reproduced audio and video data in the SDRAM are output using DMA. This data movement is also initialized by ARM7. Current video overlay technology allows the PCI bus to transmit data to the video source, and finally the data is transferred to either the customer ASIC or the PCI bus.
[0139]
10.7Programming model
10.7.1BP base device address
The BP has the following 32-bit base device address.
<MSP_BASE> <BP_BASE> <Address_Offset>
Here, <MSP_BASE> is 5 bits defined by the MSP base PCI device address, <BP_BASE> is 7 bits equal to 7'b 1111100, and <Address_Offset> is 20 bits allocated to the BP internal registers.
Therefore, the address range assigned to the BP in the entire MSP I / O device address map is from 27'h 7C0_0000 to 27'h 7CF_FFFF.
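The address composition can be checked with a few lines of arithmetic. The helper name `bp_address` is hypothetical; the field widths and the <BP_BASE> value are taken from the text above.

```python
BP_BASE = 0b1111100    # 7-bit <BP_BASE> field (7CH)
OFFSET_BITS = 20       # width of <Address_Offset>

def bp_address(msp_base, offset):
    """Compose <MSP_BASE><BP_BASE><Address_Offset> into a 32-bit address."""
    if not 0 <= msp_base < (1 << 5):
        raise ValueError("MSP_BASE is a 5-bit field")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("Address_Offset is a 20-bit field")
    return (msp_base << 27) | (BP_BASE << OFFSET_BITS) | offset

# The lower 27 bits give the range within the MSP I/O device address map.
low = bp_address(0, 0x00000) & 0x7FF_FFFF
high = bp_address(0, 0xFFFFF) & 0x7FF_FFFF
print(f"{low:07X} {high:07X}")   # 7C00000 7CFFFFF
```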
10.7.2Internal register description
The internal register set is shown in the table, and all registers in the table can be written or read by ARM7 or VP.
[0140]
BP internal register
[Table 41]

[0141]
* BP_MODE [31: 0] (read-only, no default value)-This register defines the video standard type and various image level information. Details are given in subsection 10.8.1.
* BP_CONTROL [31: 0] (Read / Write, default value is “32'h 0000_0000”)-This register contains various control parameters for BP operation. ARM7 or VP sets each flag in this register, and some flags are set by the BP. The bit specification is shown in subsection 10.8.2.
* IBUF0_START [31: 0] (read-write, no default value)-This register defines the start address of input buffer 0 of the BP input double buffer and is initialized by ARM7. The initialization value of IBUF0_START must always be smaller than IBUF0_END, and IBUF0_START [3: 0] must be equal to 4'b0000.
* IBUF0_END [31: 0] (read-only, no default value)-This register defines the last address of input buffer 0 of the BP input double buffer. This content is described in section 10.11.
* IBUF1_START [31: 0] (read-write, no default value)-This register defines the start address of input buffer 1 of the BP input double buffer and is initialized by ARM7. The initialization value of IBUF1_START must always be smaller than IBUF1_END, and IBUF1_START [3: 0] must be equal to 4'b0000. This content is described in section 10.11.
* IBUF1_END [31: 0] (read-only, no default value)-This register defines the last address of input buffer 1 of the BP input double buffer. This content is described in section 10.11.
* OBUF0_START [31: 0] (read-write, no default value)-This register defines the start address of output buffer 0 of the BP output double buffer and is initialized by ARM7. The initialization value of OBUF0_START must always be smaller than OBUF0_END, and OBUF0_START [3: 0] must be equal to 4'b0000. This content is described in section 10.11.
* OBUF0_END [31: 0] (read-only, no default value)-This register defines the last address of output buffer 0 of the BP output double buffer. This content is described in section 10.11.
* OBUF1_START [31: 0] (read-write, no default value)-This register defines the start address of output buffer 1 of the BP output double buffer and is initialized by ARM7. The initialization value of OBUF1_START must always be smaller than OBUF1_END, and OBUF1_START [3: 0] must be equal to 4'b0000. This content is described in section 10.11.
* OBUF1_END [31: 0] (read-only, no default value)-This register defines the last address of output buffer 1 of the BP output double buffer. This content is described in section 10.11.
* SAVE_ADR [31: 0] (read-only, no default value)-This register defines the start address in SDRAM at which the BP internal context is stored when the preemptive context switching mode is requested. For related material, see subsection 10.12.1.
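The two constraints stated for each START / END register pair (START[3:0] equal to 4'b0000 and START smaller than END) can be checked before the registers are programmed. The function `check_buffer` is a hypothetical helper, not part of any MSP software interface.

```python
def check_buffer(start, end):
    """Validate one START/END register pair against the stated constraints."""
    if start & 0xF:
        raise ValueError("START[3:0] must be 4'b0000 (16-byte aligned)")
    if not start < end:
        raise ValueError("START must be smaller than END")
    return True

print(check_buffer(0x0000_1000, 0x0000_10FF))   # True
```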
[0142]
* VALID_BYTE_ADR [31: 0] (read-write, no default value)-This register represents the last valid data byte position of the input double buffer for decoding, or of the output double buffer for encoding. The purpose of this register is handshaking between ARM7 and BP. In general, additional information is required for the valid bit position within the valid data byte, which is contained in the BP_CONTROL [31: 0] register. Details are in section 10.13.
* BP_STATUS [31: 0] (read-write, default value is “32'h 0000_0000”)-This register represents the various internal states of the BP. Each bit position of the lower 2 bytes (that is, BP_STATUS [15: 0]) is an interrupt condition that can set ARM7_IRQ to “1”. This register can be accessed in two ways. The whole 32-bit register can be read or written by ARM7 or VP at address 27'h7C0_0050. However, ARM7 and VP generally prefer to write (or reset) the contents of the BP_STATUS register bit by bit. The BP supports this by assigning the addresses from 27'h7C0_0030 to 27'h7C0_004F to the individual bits of BP_STATUS. The bit contents are described in subsection 10.8.3.
* BP_INT_MASK [15: 0] (Read-only, default value is “16'hFFFF”)-Each bit of this register corresponds to an interrupt condition in BP_STATUS [15: 0] and is logically AND-ed with that condition before the conditions are OR-ed internally. If a mask bit is set to “0”, the corresponding interrupt condition is unconditionally forced to “0” (that is, disabled). Details of these interrupts are described in Section 10.9.
* V_MB_SIZE [7: 0] (read-only, no default value)-This register represents the vertical size of the image to be encoded or decoded, expressed as a number of macroblocks. For example, if the vertical size is 288 pixels, V_MB_SIZE [7: 0] = 288/16 = 18. ARM7 must always set this register before starting BP encoding and decoding operations.
[0143]
* H_MB_SIZE [7: 0] (read-only, no default value)-This register represents the horizontal size of the image to be encoded or decoded, expressed as a number of macroblocks. For example, if the horizontal size is 352 pixels, H_MB_SIZE [7: 0] = 352/16 = 22. ARM7 must always set this register before starting BP encoding and decoding operations.
* ARM7_IRQ [0] (read-only, default value is “0”) — This register is a 1-bit flag for requesting an interrupt to ARM7 and is directly connected to the ARM7_IRQ output port. If any bit of BP_STATUS [15: 0] is set to “1”, this flag is set. Then, ARM7 resets this flag.
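The macroblock size registers and the interrupt request logic described in the list above can be modeled in a few lines. This is a sketch with hypothetical names: rejecting dimensions that are not a multiple of 16 is a choice made here, not something the text specifies, and the OR of the masked conditions follows the ARM7_IRQ description.

```python
def mb_count(pixels, mb_size=16):
    """Number of 16-pixel macroblocks along one picture dimension."""
    if pixels % mb_size:
        raise ValueError("dimension is not a multiple of the macroblock size")
    return pixels // mb_size

def arm7_irq(bp_status, bp_int_mask=0xFFFF):
    """1 if any unmasked condition in BP_STATUS[15:0] is set, else 0."""
    return int((bp_status & bp_int_mask & 0xFFFF) != 0)

print(mb_count(288), mb_count(352))                             # 18 22
print(arm7_irq(0x0001), arm7_irq(0x0001, bp_int_mask=0xFFFE))   # 1 0
```

Bits in BP_STATUS [31:16] never raise the request in this model, because only the lower two bytes are interrupt conditions.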
10.8BP I / O data word format
This section describes the BP input / output command word and macroblock data word formats.
10.8.1BP_MODE register format
The 32-bit BP_MODE register at address 27'h7C0_0000 has the format given in table 25, where BP_MODE [31] = PARAM_SET2 [7] and BP_MODE [0] = SF [0].
[0144]
[Table 42]
BP_MODE register format

[0145]
* Standard_format [SF] —The video standard used is defined in Table 26. The SF must always be defined by ARM7 before BP is enabled for all video encoding and decoding applications.
[0146]
[Table 43]
SF definition

[0147]
* Picture_type (PT) —Video coding type is defined in Table 27.
PT value 00 is a special case for MPEG-1, MPEG-2 and H.263 applications. In particular, D_video is assigned as a video type for MPEG-2 even though it is not used in MPEG-2, because the MPEG-1 bitstream is a subset of the MPEG-2 bitstream.
[0148]
[Table 44]
Definition of PT

[0149]
* Picture_structure (PS) -Video structure information is defined in Table 43. The PS value 00 is invalid and causes an error.
[0150]
[Table 45]
Definition of PS

[0151]
* Parameter_set0, 1 and 2 (PARAM_SET0, PARAM_SET1, PARAM_SET2)-these 3 bytes define various parameters used in MPEG-1, MPEG-2 and H.263. The definition of each parameter set is given in the tables.
[0152]
[Table 46]
Definition of PARAM_SET0

[0153]
* Intra_dc_precise (IDP) -The 2-bit intra DC precision parameter defined in MPEG-2. It must be set to 00 for MPEG-1 applications.
* Top_field_first (TFF)-MPEG-2 flag used for motion vector encoding and decoding.
* Frame_pred_frame_dct (FPFD) -MPEG-2 flag indicating that frame DCT and frame prediction are used.
* Concealment_motion_vectors (CMV) or advanced_prediction_mode (AP) -In MPEG-2, this flag indicates that motion vectors are used in intra macroblocks. In H.263, this flag is set to 1 if the advanced prediction mode is ON, and to 0 otherwise.
This flag must be set to 0 for the other standards.
* Intra_vlc_format (IVF) -MPEG-2 flag that determines the VLC table format for intra macroblocks.
* Alternate_scan (AS) -MPEG-2 flag that determines the scan order of the coefficients to be encoded and decoded.
* Vertical_size_flag (VSF) or continuous_presence_multipoint (CPM)-In MPEG-1 and MPEG-2, this flag must be set to 1 if the vertical size of the video exceeds 2800 lines, and to 0 otherwise. In H.263, this flag is set to 1 if the continuous presence multipoint mode is in use, and to 0 otherwise.
[0154]
[Table 47]
Definition of PARAM_SET1 and PARAM_SET2

[0155]
10.8.2BP_CONTROL register format
Table 47 shows the bit specifications for the BP_CONTROL [31: 0] register (address 27'h7C0_0004).
[0156]
BP_CONTROL register format
[Table 48]

[0157]
* BP_enable (BP_EN)-If this flag is set to 1 by ARM7 or VP, the BP starts processing. Therefore, all other registers must be configured before this flag is set. When the BP finishes processing, this flag is cleared by the BP.
* Software_reset (SOFT_RESET) -When this flag is set by ARM7 or VP, the BP interrupts the current process, returns all internal registers to their default state, and enters the idle state. ARM7 can then set the BP_EN flag to start BP processing again. The BP hardware reset signal is active low.
* Pause (PAUSE) -When this flag is set to 1 by ARM7 or VP, the BP suspends the current processing operation. The user sets the BP_EN flag to resume the suspended operation.
* Detect_start_code (DETECT_START_CODE)-When this flag is set to 1 by ARM7 or VP, the BP looks for the next start code in the data in IBUF0. Therefore, the user must set the desired addresses in IBUF0_START and IBUF0_END. This command works properly only if the BP is idle. Therefore, if the BP is not idle, ARM7 should first send a software reset command to the BP before issuing this command.
* Step (STEP)-If this flag is set to 1 by ARM7 or VP, the BP performs one step of the current operation. This feature is essential for debugging. ARM7 should first send a pause command so that this step operation is enabled.
* Context_switching_mode (CTX_MODE) -When this flag is set to “1” and CTX_SWITCH is set to “1” by ARM7 or VP, the BP performs the preemptive context switching mode. If this flag is set to “0” while CTX_SWITCH is set to “1”, the BP performs the cooperative context switching mode. Setting CTX_MODE without setting CTX_SWITCH to “1” does not affect the BP process. See section 10.12 for details of context switching.
* Context_reload_request (CTX_RELOAD) -When this flag is set to “1” by ARM7 or VP, the BP reloads the context previously stored in the SDRAM. The BP reads the stored context from the address SAVE_ADR [31: 0]. See section 10.12 for details on context switching.
* Error_handle_mode (ERR_HANDLE_MODE)-This flag selects the BP error recovery process used when an error occurs in the transmitted compressed bitstream.
If the input bitstream contains invalid data, the BP interrupts ARM7 and checks the contents of this flag. When this flag is set to “1”, the BP automatically looks for the next start code. If the start code is a slice or GOB start code, the BP resumes processing. When this flag is set to “0”, the BP does not look for the next start code and enters the idle state. Handshaking between BP and ARM7 is described in Section 10.13.
* Number_of_macroblocks_to_be_encoded (NO_MBS [15: 0])-This field contains 16 bits representing the number of macroblocks to be encoded in a slice or GOB. With this bit width, up to 65535 macroblocks can be encoded in a slice or GOB. The value “0” is not allowed as the number of macroblocks.
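The interaction of CTX_SWITCH and CTX_MODE, and the allowed range of NO_MBS, can be summarized as two small checks. The bit positions of these flags come from a table that is not reproduced here, so the sketch takes the flag values as plain arguments; the function names are hypothetical.

```python
def context_switch_mode(ctx_switch, ctx_mode):
    """Mode the BP performs for a given CTX_SWITCH / CTX_MODE combination."""
    if not ctx_switch:
        return None   # CTX_MODE alone does not affect the BP process
    return "preemptive" if ctx_mode else "cooperative"

def check_no_mbs(no_mbs):
    """NO_MBS is a 16-bit count, and the value 0 is not allowed."""
    if not 1 <= no_mbs <= 0xFFFF:
        raise ValueError("NO_MBS must be in the range 1..65535")
    return no_mbs

print(context_switch_mode(1, 1))   # preemptive
print(context_switch_mode(1, 0))   # cooperative
print(context_switch_mode(0, 1))   # None
```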
10.8.3BP_STATUS register format
BP_STATUS [31: 0] (address 27'h 7C0_0050) is shown in Table 54.
[0158]
BP_STATUS register format
[Table 49]

[0159]
* Input_buffer_0_done (IBUF0_DONE) -This flag indicates that all the data in the input buffer 0 has been used by the BP. This flag is set by BP and cleared by ARM7 or VP. Such a flag represents an interrupt state.
* Input_buffer_1_done (IBUF1_DONE)-This flag indicates that all the data in the input buffer 1 has been used by the BP. This flag is set by BP and cleared by ARM7 or VP. Such a flag represents an interrupt state.
* Output_buffer_0_full (OBUF0_FULL)-This flag indicates that the output buffer 0 is filled by BP. The flag is set by BP and cleared by ARM7 or VP. Such a flag represents an interrupt state.
* Output_buffer_1_full (OBUF1_FULL) —This flag indicates that the output buffer 1 is filled by the BP. The flag is set by BP and cleared by ARM7 or VP. Such a flag represents an interrupt state.
* BP_processing_done (BP_DONE) - This flag indicates that the BP has finished encoding a slice or GOB or, when decoding, has detected a start code that is not a slice or GOB start code. This flag is set by BP and cleared by ARM7 or VP. Such a flag represents an interrupt state.
* Context_switching_done (CTX_SW_DONE) - This flag indicates that the BP has completed the requested context switch and is ready to be switched to other work. This flag is set by BP and cleared by ARM7 or VP. Such a flag represents an interrupt state.
* Context_reload_done (CTX_RELOAD_DONE) - This flag indicates that the BP has finished reloading the stored context from the address SAVE_ADR[31:0]. This flag is set by BP and cleared by ARM7 or VP. Such a flag represents an interrupt state.
[0160]
* BP_error_flg (BP_ERR) - This flag indicates that an error occurred in the BP while processing data. This flag is set when BP_ERR_CODE[7:0] (BP_STATUS[31:24]) is not zero. Details are described in subsection 10.9.2.
* Input_buffer_0_full (IBUF0_FULL) - This flag indicates that input buffer 0 has been filled with data by ARM7 or VP. This flag is set by ARM7 or VP and cleared by BP.
* Input_buffer_1_full (IBUF1_FULL) - This flag indicates that input buffer 1 has been filled with data by ARM7 or VP. This flag is set by ARM7 or VP and cleared by BP.
* Output_buffer_0_done (OBUF0_DONE) - This flag indicates that all the data in output buffer 0 has been used by ARM7 or VP. This flag is set by ARM7 or VP and cleared by BP.
* Output_buffer_1_done (OBUF1_DONE) - This flag indicates that all the data in output buffer 1 has been used by ARM7 or VP. This flag is set by ARM7 or VP and cleared by BP.
* Valid_bit_position (VALID_BIT_POS[2:0]) - This 3-bit field gives the valid bit position, for the next processing step, within the data byte addressed by VALID_BYTE_ADR[31:0]. In video encoding, the BP sets this value and ARM7 must continue processing from this bit position. In video decoding, ARM7 sets this value and the BP must start processing from this bit position.
* BP_error_code (BP_ERR_CODE[7:0]) - This 8-bit field indicates what kind of error has occurred in the BP. A zero value indicates that no error occurred. Details are described in subsection 10.9.2.
[0161]
10.8.4 Input data format for decoding and output data format for encoding
In this case, the data consists essentially of a compressed bitstream. The data must include the start codes, header parameters, and data compressed according to the corresponding standard. The bitstream is packed byte by byte, although some operations do not require byte alignment. The bitstream contains the data for multiple slices or GOBs.
10.8.5 Input data format for encoding and output data format for decoding
In this case, the data consists essentially of macroblock header information, motion data, and pixel coefficient data. The format of each type of data is defined as follows.
10.8.5.1 Macroblock header word
The macroblock header always consists of 6 bytes and has the data format given in Table 50.
[0162]
Macroblock header word format
[Table 50]

[0163]
Here, the parameters shown in the table are defined as follows.
* Vertical_macroblock_address (VMA) or group_number (GRNO) - This byte represents the vertical position of the macroblock, with a value between 1 and 255. The first vertical position is 1, not 0. As an exception, in H.261 encoding this field carries the group_number information indicating the position of the group of blocks.
* Horizontal_macroblock_address (HMA) or macroblock_position (MBPS) - This field represents the horizontal position of the macroblock, with a value between 1 and 255. The first horizontal position is 1, not 0. As an exception, in H.261 encoding this field represents one of the 33 possible positions of the macroblock within the GOB.
* Macroblock_intra (I) - Set to 1 if the current macroblock is intra-coded, otherwise set to 0.
* Macroblock_pattern (P) - Set to 1 if the current macroblock contains a coded block, otherwise set to 0.
* Macroblock_quant (Q) - Set to 1 if the current macroblock has a new quantizer scale parameter, otherwise set to 0.
* Macroblock_motion_forward (MF) - Set to 1 if the current macroblock is forward predicted, otherwise set to 0.
* Macroblock_motion_backward (MB) - Set to 1 if the current macroblock is backward predicted or, in H.263, if it includes B-blocks; otherwise set to 0.
* Dct_type (DT), loop_filter (LF) or advanced_prediction (M4) - Bit 2 of byte 2 has a different meaning for each standard. It is not used in MPEG-1. In MPEG-2 it means dct_type: the flag is set to 1 if the macroblock is field DCT coded and to 0 if it is frame DCT coded. In H.261 the flag is set to 1 if a loop filter is used in the current macroblock, otherwise to 0. In H.263 it is set to 1 if the current macroblock uses the advanced prediction mode, otherwise to 0.
* Motion_type (MT) - This 2-bit field represents the frame_motion_type or field_motion_type used in MPEG-2, as shown in Table 51 and Table 52.
[0164]
[Table 51]
Meaning of frame_motion_type

[0165]
[Table 52]
Meaning of field_motion_type

[0166]
* Quantizer_scale (Q_SCALE) - An unsigned integer in the range 1 to 31 used to scale the reconstruction level of the DCT coefficient levels. Every macroblock header must contain an appropriate value for this parameter, even if the value is identical to that of the previous macroblock (i.e., macroblock_quant is zero). In encoding, the user is responsible for writing an appropriate value in this field. In decoding, the BP must write the Huffman-decoded quantizer scale value in this field. If the current macroblock does not contain a Huffman code for this field, the BP must write the scale value of the previous macroblock.
* Coded_block_pattern_0 (CBP_0) - A 6-bit code indicating which blocks are coded in the current macroblock. Here,
CBP_0[5] ==> Luminance (Y) block 0
CBP_0[4] ==> Luminance (Y) block 1
CBP_0[3] ==> Luminance (Y) block 2
CBP_0[2] ==> Luminance (Y) block 3
CBP_0[1] ==> Chrominance blue (Cb) block
CBP_0[0] ==> Chrominance red (Cr) block
* Coded_block_pattern_1 (CBP_1) - In H.263, there is an additional coded_block_pattern for the B-blocks of a PB-frame. Here,
CBP_1[5] ==> Luminance (Y) block 0
CBP_1[4] ==> Luminance (Y) block 1
CBP_1[3] ==> Luminance (Y) block 2
CBP_1[2] ==> Luminance (Y) block 3
CBP_1[1] ==> Chrominance blue (Cb) block
CBP_1[0] ==> Chrominance red (Cr) block
* Logical_channel_indicator (LCI) - 2-bit information for the GOB logical channel. It is used only in H.263, for continuous multipoint operation.
* Frame_id (FID) - 2-bit GOB frame ID information for H.263.
* Macroblock_address_indicator (MBA_INC) - 2-byte information giving the value by which the current macroblock address is incremented. This information is always provided by the BP as additional information, and the user does not need to set it in the input format. Any value specified in the input macroblock header word is ignored by the BP.
* Previous_dc_luminance (PRE_DC_Y) - 2-byte information giving the dc value of the luminance block in the previous macroblock. If the macroblock is skipped, a reset value is transmitted. This information is always provided by the BP as additional information, and the user does not need to set it in the input format. Any value specified in the input macroblock header word is ignored by the BP.
* Previous_dc_chrominance_blue (PRE_DC_Cb) - 2-byte information giving the dc value of the blue chrominance block in the previous macroblock. If the macroblock is skipped, a reset value is transmitted. This information is always provided by the BP as additional information, and the user does not need to set it in the input format. Any value specified in the input macroblock header word is ignored by the BP.
* Previous_dc_chrominance_red (PRE_DC_Cr) - 2-byte information giving the dc value of the red chrominance block in the previous macroblock. If the macroblock is skipped, a reset value is transmitted. This information is always provided by the BP as additional information, and the user does not need to set it in the input format. Any value specified in the input macroblock header word is ignored by the BP.
[0167]
10.8.5.2 Motion data word
If a macroblock contains motion vectors, its macroblock header must be followed by an additional header word. Consider first the case of MPEG-1 and MPEG-2. For these standards, the additional header word for motion vectors has the format shown in Table 53 and is present when either of the following conditions holds.
Condition 1) MF = 1 or (I = 1 and CMV = 1)
Condition 2) MB = 1
[0168]
General motion vector data format for MPEG-1 and MPEG-2
[Table 53]

[0169]
In Table 53, all element values have half-pixel precision. FS0, FS1, FS2, and FS3 are 1-bit flags that identify the field selection for each motion vector. If no field is selected, the flag must be set to zero. Because MPEG-1 does not use field selection information, these flags are always zero for MPEG-1.
One exceptional case occurs with MPEG-2 encoding of dual prime motion vectors. In this case, the forward motion vector data consists of 16 bytes (of which only 8 bytes are actually used), and the format is as shown in Table 54. Normally, in video encoding applications, the BP converts motion vector values to differential values. However, the motion vector components in Table 54 are already differential values and are input directly to the Huffman encoder. In MPEG-2 decoding applications, dual prime motion vectors are handled by the BP.
[0170]
[Table 54]
Motion vector data format in dual prime mode for MPEG-2 encoding

[0171]
H.261 and H.263 have a slightly different motion vector data format. In most cases, a single motion vector component value can be represented in one byte. Depending on the contents of the MF and M4 flags, a motion-compensated macroblock has at least two and as many as 10 motion vector components. The format of the motion vector data is shown in Table 55.
[0172]
Motion vector data format for H.261 and H.263
[Table 55]

[0173]
10.8.5.3 Pixel coefficient data word
The four video compression standards have different maximum pixel bit lengths because of their quantization levels. The comparison is shown in the table below.
[0174]
[Table 56]
Input / output pixel bit resolution

[0175]
Therefore, the pixel data formats for the MPEG standards and for the video conferencing standards differ, as shown in Table 57.
[0176]
Pixel coefficient data format
[Table 57]

[0177]
10.9 Interrupt condition
If the BP meets one of the interrupt conditions described in this section, it asserts the ARM7_IRQ signal and interrupts ARM7. The BP has two sets of interrupt conditions: default conditions and error conditions. These conditions are stored in BP_STATUS[15:0]. If any one of these bits is set by the BP, the ARM7_IRQ signal is activated. Any of these conditions can be masked by setting the corresponding bit in the BP_INT_MASK[15:0] register.
[0178]
10.9.1 Default interrupt condition
* Default condition 0 (BP_STATUS[0]) - When processing of input buffer 0 is completed, the BP sets the IBUF0_DONE flag and asserts ARM7_IRQ.
* Default condition 1 (BP_STATUS[1]) - When processing of input buffer 1 is completed, the BP sets the IBUF1_DONE flag and asserts ARM7_IRQ.
* Default condition 2 (BP_STATUS[2]) - When output buffer 0 has been filled, the BP sets the OBUF0_FULL flag and asserts ARM7_IRQ.
* Default condition 3 (BP_STATUS[3]) - When output buffer 1 has been filled, the BP sets the OBUF1_FULL flag and asserts ARM7_IRQ.
* Default condition 4 (BP_STATUS[4]) - In video encoding, when the BP reaches the end of the slice or GOB specified by ARM7, or in video decoding, when it reaches a start code other than a slice or GOB start code, the BP sets the BP_DONE flag and asserts ARM7_IRQ.
* Default condition 5 (BP_STATUS[5]) - When the context store operation finishes in preemptive context switching mode, or when the current slice or GOB finishes in cooperative context switching mode, the BP sets the CTX_SW_DONE flag and asserts ARM7_IRQ.
* Default condition 6 (BP_STATUS[6]) - When the context reload finishes, the BP sets the CTX_RELOAD_DONE flag and asserts ARM7_IRQ.
* Default condition 7 (BP_STATUS[7]) - BP_STATUS[7] is currently reserved, so this bit should be set to “0”.
Normally, masking these default interrupt conditions with BP_INT_MASK[7:0] is not recommended. However, in certain operations the user may wish to mask Default condition 1.
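The default conditions above can be read as a simple bit test on the status and mask registers. The following Python sketch is illustrative only: the function and dictionary names are hypothetical, bits 2 and 3 are assumed to be OBUF0_FULL and OBUF1_FULL (the flags that section 10.8.3 describes as set by the BP), and a set bit in BP_INT_MASK is taken to suppress the corresponding condition.

```python
# Bit positions of the default interrupt conditions (section 10.9.1).
# Bit 7 is reserved and is deliberately not decoded.
# Assumption: bits 2 and 3 are OBUF0_FULL / OBUF1_FULL.
DEFAULT_CONDITIONS = {
    0: "IBUF0_DONE",
    1: "IBUF1_DONE",
    2: "OBUF0_FULL",
    3: "OBUF1_FULL",
    4: "BP_DONE",
    5: "CTX_SW_DONE",
    6: "CTX_RELOAD_DONE",
}


def pending_interrupts(bp_status, bp_int_mask):
    """Return the names of default conditions that are set and not masked."""
    # Assumption: a 1 in BP_INT_MASK masks (suppresses) the condition.
    active = bp_status & ~bp_int_mask & 0x7F
    return [name for bit, name in DEFAULT_CONDITIONS.items()
            if (active >> bit) & 1]


print(pending_interrupts(0b0010001, 0))
```

A handler on the ARM7 side would typically service each returned condition and then clear the corresponding flag, since the text states that these flags are cleared by ARM7 or VP.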
[0179]
10.9.2 Error interrupt condition
If an error occurs in the BP, the BP sets the BP_ERR flag to request an ARM7 interrupt. At the same time, the BP writes the appropriate non-zero value to the BP_ERR_CODE field of the BP_STATUS register. The 8-bit BP_ERR_CODE has the following meanings.
* BP_ERR_CODE = 8'b0000_0000: No error occurs
* BP_ERR_CODE = 8'b0000_0001: The BP_MODE register is improperly set
* BP_ERR_CODE = 8'b0000_0010: The horizontal macroblock position is set inappropriately
* BP_ERR_CODE = 8'b0000_0011: The vertical macroblock position is set inappropriately
* BP_ERR_CODE = 8'b0000_0100: Inappropriate VLC for increasing macroblock address
* BP_ERR_CODE = 8'b0000_0101: Inappropriate VLC for macroblock type
* BP_ERR_CODE = 8'b0000_0110: Inappropriate VLC for macroblock motion code
* BP_ERR_CODE = 8'b0000_0111: Improper concealment motion vector marker bit
* BP_ERR_CODE = 8'b0000_1000: Inappropriate VLC for the coded block pattern
* BP_ERR_CODE = 8'b0000_1001: Inappropriate VLC for block DCT dc size
* BP_ERR_CODE = 8'b0000_1010: Inappropriate DCT dc value
* BP_ERR_CODE = 8'b0000_1011: Inappropriate VLC for block DCT ac coefficient
* BP_ERR_CODE = 8'b0000_1100: The block # exceeds 64 in one macroblock
* BP_ERR_CODE = 8'b0000_1101: Inappropriate f_CODE value (for example, the value is 0)
* BP_ERR_CODE = 8'b0000_1110: Inappropriate VLC for block DCT ac coefficient
* BP_ERR_CODE = 8'b0000_1111: Inappropriate IBUF and OBUF address setting
* BP_ERR_CODE = 8'b0001_0000: The least significant 4 bits of the BP input / output buffer start address are not zero
Other BP_ERR_CODE values are reserved.
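As an illustration of how software might interpret these codes, the Python sketch below extracts BP_ERR_CODE from BP_STATUS[31:24] and maps it to a message. The table and function names are hypothetical, the messages paraphrase the list above, and any code outside the list is reported as "reserved".

```python
# Error messages keyed by BP_ERR_CODE, paraphrased from the list above.
BP_ERR_MESSAGES = {
    0x00: "no error",
    0x01: "BP_MODE register improperly set",
    0x02: "horizontal macroblock position improperly set",
    0x03: "vertical macroblock position improperly set",
    0x04: "invalid VLC for macroblock address increment",
    0x05: "invalid VLC for macroblock type",
    0x06: "invalid VLC for macroblock motion code",
    0x07: "improper motion vector marker bit",
    0x08: "invalid VLC for coded block pattern",
    0x09: "invalid VLC for block DCT dc size",
    0x0A: "invalid DCT dc value",
    0x0B: "invalid VLC for block DCT ac coefficient",
    0x0C: "block # exceeds 64 in one macroblock",
    0x0D: "invalid f_code value",
    0x0E: "invalid VLC for block DCT ac coefficient",
    0x0F: "invalid IBUF and OBUF address setting",
    0x10: "low 4 bits of I/O buffer start address not zero",
}


def bp_error(bp_status):
    """Return (code, message) for the BP_ERR_CODE field, BP_STATUS[31:24]."""
    code = (bp_status >> 24) & 0xFF
    return code, BP_ERR_MESSAGES.get(code, "reserved")


print(bp_error(0x10000000))
```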
[0180]
10.10 Detailed functionality requirements
10.10.1 IOBUS interface
All data movement between the BP and the CCU is performed through the IOBUS. The IOBUS is a 32-bit, 40 MHz synchronous bus with multiplexed address and data. Since at least 7 cycles are required to transfer 16 bytes of data over the IOBUS, its maximum transfer rate is 91.4 Mbytes/sec (731.4 Mbits/sec).
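The quoted peak rate follows directly from the clock frequency and the burst length. The short Python calculation below reproduces it; the constant and function names are chosen here for illustration only.

```python
IOBUS_CLOCK_HZ = 40_000_000  # 40 MHz bus clock, from the text
BURST_BYTES = 16             # bytes moved per transfer
BURST_CYCLES = 7             # minimum cycles per 16-byte transfer


def peak_rate_bytes_per_sec(clock_hz=IOBUS_CLOCK_HZ,
                            burst_bytes=BURST_BYTES,
                            burst_cycles=BURST_CYCLES):
    """Peak throughput when every transfer takes the minimum cycle count."""
    return clock_hz * burst_bytes / burst_cycles


rate = peak_rate_bytes_per_sec()
print(f"{rate / 1e6:.1f} Mbytes/sec, {rate * 8 / 1e6:.1f} Mbits/sec")
# prints "91.4 Mbytes/sec, 731.4 Mbits/sec"
```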
The BP can be a master or a slave for all IOBUS read and write transfers. When the BP operates as a master, it must send a request signal to the IOBUS arbiter. If the IOBUS is not in use, the arbiter grants the bus to the BP, and a device select signal is sent. Data transferred over the IOBUS falls into one of three categories: 32-bit pixel data words containing two or four pixel elements, 32-bit compressed bitstream words, and syntax/control parameters for encoding and decoding operations. For further information, such as timing diagrams for the IOBUS interface, users are encouraged to review the MSP IOBUS specification mentioned above.
10.10.2 Block layer processing
10.10.2.1 Zigzag scan rules
The BP supports the two zigzag scan conversion matrices presented in the MPEG video standard. All 8 × 8 block data transmitted between the VP and the BP contain 64 components.
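For readers unfamiliar with the scan conversion, the Python sketch below generates the conventional zigzag order for an 8 × 8 block and applies it in both directions. It covers only the conventional pattern; the second (alternate) scan pattern is defined by a table in the MPEG-2 standard and is not reproduced here. The function names are illustrative and do not describe the BP hardware.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in conventional zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals run from bottom-left to top-right, odd ones the reverse.
        if s % 2 == 0:
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def scan(block):
    """Flatten a square block (list of rows) into zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def inverse_scan(seq, n=8):
    """Rebuild an n x n block from a zigzag-ordered sequence."""
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(seq, zigzag_order(n)):
        block[r][c] = value
    return block


block = [[8 * r + c for c in range(8)] for r in range(8)]
assert inverse_scan(scan(block)) == block
```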
10.10.2.2 RLC code
For RLC decoding, the BP generates zero and level data according to the Huffman decoding result of the DCT ac coefficients. If an end_of_block signal is detected before 64 pixel data have been generated for one 8 × 8 block, the RLC decoder produces the remaining zero data. For RLC encoding, the BP generates run-length and level codes by counting consecutive zero data and combining the count with the next non-zero data. If all of the remaining data are zero, the BP generates end_of_block instead of an RLC for the remaining data. The number of operating cycles for the RLC code depends on the number of zeros generated.
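The run-length scheme described above can be sketched in a few lines of Python. This is a software illustration under simplifying assumptions, not the BP implementation: the end_of_block code is represented by a placeholder string, the (run, level) pairs are left as tuples rather than Huffman codes, and all names are hypothetical.

```python
EOB = "EOB"  # placeholder for the end_of_block code (a VLC symbol in practice)


def rlc_encode(coeffs):
    """Convert scanned coefficients into (run, level) pairs followed by EOB."""
    pairs, run = [], 0
    for level in coeffs:
        if level == 0:
            run += 1                  # count consecutive zeros
        else:
            pairs.append((run, level))
            run = 0
    pairs.append(EOB)                 # trailing zeros are implied by EOB
    return pairs


def rlc_decode(pairs, size=64):
    """Expand (run, level) pairs; zero-fill the block once EOB is seen."""
    out = []
    for item in pairs:
        if item == EOB:
            break
        run, level = item
        out.extend([0] * run)
        out.append(level)
    out.extend([0] * (size - len(out)))  # remaining zero data
    return out


coeffs = [5, 0, 0, -3, 1] + [0] * 59
print(rlc_encode(coeffs))
```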
[0181]
10.10.2.3 Huffman code
The BP Huffman coder supports all Huffman tables recommended by the MPEG-1, MPEG-2, H.261 and H.263 video standards. If all ROM words are 12 bits wide, all tables can be implemented as lookup tables. However, some Huffman tables that are either very simple or very complex can be implemented with hardwired logic instead. The decoder tables implemented using lookup table ROM are summarized in Table 58.
[0182]
[Table 58]
ROM size required for Huffman decoder lookup table

[0183]
The encoder tables, which require a larger ROM size than the decoder tables, are summarized in Table 59.
[0184]
[Table 59]
ROM size required for Huffman encoder lookup table

[0185]
From the tables, the overall ROM size required for the Huffman encoder and decoder is 768 × 12 bits. The tables do not include the stuffing code, escape_code, the sign bit of the DCT coefficients, or the end_of_block code, which are handled by the state machine. The number of operating cycles for each Huffman code is given in Table 60.
[0186]
[Table 60]
Huffman code processing cycle

[0187]
Finally, note that the JPEG decode tables cannot be implemented using the above approach. However, the dc_coeff_next_0 table can be used for JPEG encoding applications.
10.10.2.4 Differential dc value
For an intra block, the BP calculates the differential dc coefficient from the first element of the 8 × 8 block data when encoding, and reconstructs the dc value from the transmitted differential dc coefficient when decoding.
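The differential dc calculation is a simple running prediction. The Python sketch below shows both directions for a sequence of dc values belonging to one colour component. The reset value of the predictor is a parameter here because the actual reset values are defined by the individual standards and are not given in this section; the function names are hypothetical.

```python
def dc_to_differential(dc_values, reset=0):
    """Encoder side: replace each dc value by its difference from the previous one."""
    pred, diffs = reset, []
    for dc in dc_values:
        diffs.append(dc - pred)
        pred = dc                 # the current dc becomes the next predictor
    return diffs


def differential_to_dc(diffs, reset=0):
    """Decoder side: accumulate differences to recover the dc values."""
    pred, dcs = reset, []
    for d in diffs:
        pred += d
        dcs.append(pred)
    return dcs


dcs = [128, 130, 125]
diffs = dc_to_differential(dcs, reset=128)
assert differential_to_dc(diffs, reset=128) == dcs
```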
10.10.2.5 Uncoded block
The BP does not process uncoded blocks; the VP and ARM7 handle them. The BP indicates uncoded blocks through the coded_block_pattern in the macroblock header word so that the VP and ARM7 can process such blocks.
10.10.2.6 Block transmission order
Within one macroblock transmitted for encoding or decoding, the block order is as follows: luminance (Y) blocks 0, 1, 2 and 3, then the chrominance blue (Cb) block and the chrominance red (Cr) block.
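Combining this block order with the CBP_0 bit assignment from section 10.8.5.1 (bit 5 for luminance block 0 down to bit 0 for the Cr block), a consumer of the macroblock header can list which blocks are actually present. The Python sketch below is illustrative; the short block labels and function name are chosen here and do not appear in the specification.

```python
# Transmission order within a macroblock; labels are illustrative.
BLOCK_ORDER = ["Y0", "Y1", "Y2", "Y3", "Cb", "Cr"]


def coded_blocks(cbp):
    """Return the coded blocks, in transmission order, for a 6-bit CBP value."""
    # CBP[5] corresponds to Y0, CBP[4] to Y1, ..., CBP[0] to Cr.
    return [name for i, name in enumerate(BLOCK_ORDER)
            if (cbp >> (5 - i)) & 1]


print(coded_blocks(0b100001))
```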
10.10.3 Macroblock layer processing
10.10.3.1 Differential motion vector
When encoding, the BP calculates a differential motion vector from the motion estimation result; when decoding, it reconstructs the motion vector from the transmitted differential motion vector. The following cases are exceptions.
* The first case is the dual prime mode in MPEG-2 video encoding. In this case, the motion vectors transmitted to the BP are vector'[0][0][1:0] and vector'[r][0][1:0] (see section 7.6.3.6 of the MPEG-2 video standard).
* The second case is the advanced prediction mode of H.263. In this case there are four motion vectors, and their values must be transferred to and from the BP as differential values.
10.10.3.2 Skipped macroblock
The BP does not process skipped macroblocks; the VP and ARM7 process them. So that the VP and ARM7 can process skipped macroblocks, the BP writes the horizontal and vertical macroblock addresses in the macroblock header word.
10.10.3.3 Macroblock stuffing code
In MPEG-1 decoding, if a macroblock stuffing code occurs, the BP discards it in one cycle. In MPEG-1 encoding, however, the BP does not let the user include the macroblock stuffing code in the macroblock layer header. Generally, such a stuffing code is used to control the output video rate buffer. It is therefore recommended to insert zero stuffing bits between start codes instead of inserting macroblock stuffing codes.
For MPEG-1 and MPEG-2 applications, the bitstream output must be byte-aligned at the slice layer, just as it is byte-aligned at the picture layer. For H.263 applications, the output is byte-aligned at the GOB layer. However, the output of the H.261 encoder is not byte-aligned. The bitstream forming routine in ARM7 must therefore be programmed with this difference in mind. If the amount of data for the last data transmission through the IOBUS is 16 bits or less in the case of encoding, the BP automatically performs a zero-fill operation at the end of the slice.
[0188]
10.10.4.2 Extra slice information
In decoding, the BP discards any extra slice information included in the slice header of the MPEG-1 or MPEG-2 bitstream. In encoding, the BP does not insert any extra slice information requested by the user. If the user still wants to include this information in the MPEG-1 or MPEG-2 bitstream, this information may be inserted into the bitstream previously encoded by the BP.
10.10.4.3 Intra slice
In the MPEG-2 slice layer bitstream, the parameter intra_slice is used to indicate that the current slice is composed only of intra macroblocks. This information is not used in the decoding process and is intended to assist DSM applications when performing fast forward or fast reverse functions. Therefore, the BP discards this information for decoding application, and inserts 0 in intra_slice in the slice layer header for encoding application.
10.10.4.4 Slice or GOB start code
In MPEG-1, MPEG-2 and H.261, every picture has at least one slice or GOB start code. However, an H.263 picture may have no GOB start code or header information. In particular, the first GOB in any H.263 picture has no start code or header information. Therefore, if the input bitstream is an H.263 bitstream, the BP state machine must process the macroblock layer immediately. In addition, if a GOB start code is found while the bitstream is being decoded, the BP decodes the start code and continues processing without interrupting ARM7.
[0189]
10.11 Input / output double buffer interface
10.11.1 General explanation
The input and output buffers are each implemented as double buffers. Therefore, as shown in FIGS. 64 and 65, four memory buffers IBUF0, IBUF1, OBUF0, and OBUF1 are used.
As shown in FIGS. 64 and 65, each buffer has a start address, an end address, a full flag and a done flag. To determine the size of each buffer, the user must write appropriate values to its start and end addresses.
Once the processor acting as the source of a buffer finishes writing to one bank, it sets that bank's full flag and starts filling the other bank. When the processor acting as the sink finds that the bank it is to access has been filled, it reads the data. When the bank is empty, the sink sets the done flag and checks the other bank's full flag.
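The full/done handshake can be modelled in software as two banks that are filled and drained alternately. The Python sketch below is a single-threaded simulation under simplifying assumptions: each bank holds one data item, the source and sink take turns in one loop, and the class and function names are hypothetical. In the real device the source and sink are separate processors that run concurrently.

```python
class Bank:
    """One bank of a double buffer with its full and done flags."""

    def __init__(self):
        self.full = False  # set by the source, cleared by the sink
        self.done = True   # set by the sink, cleared by the source
        self.data = None


def source_fill(bank, data):
    """Source side: write data into a free bank and raise its full flag."""
    if bank.full:
        return False       # bank still holds unread data
    bank.data, bank.full, bank.done = data, True, False
    return True


def sink_drain(bank):
    """Sink side: read a full bank, then raise its done flag."""
    if not bank.full:
        return None
    data = bank.data
    bank.full, bank.done = False, True
    return data


banks = [Bank(), Bank()]
pending = ["a", "b", "c", "d"]
received = []
write_idx = read_idx = 0
while pending or any(b.full for b in banks):
    # The source fills the next bank whenever that bank is free.
    if pending and source_fill(banks[write_idx], pending[0]):
        pending.pop(0)
        write_idx ^= 1
    # The sink drains the next bank whenever that bank is full.
    data = sink_drain(banks[read_idx])
    if data is not None:
        received.append(data)
        read_idx ^= 1
print(received)
```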
The four start addresses are updated by the BP as described in section 10.7.2. Each time the BP accesses an input or output buffer, the corresponding start address register stores the last byte address accessed by the BP. Therefore, ARM7 must set the corresponding start address again whenever one of the flags IBUF0_DONE, IBUF1_DONE, OBUF0_FULL, or OBUF1_FULL is set.
The least significant 4 bits of each start address must always be set to zero by ARM7. This is required by the internal data alignment structure between the FBUS, CCU and IOBUS. In addition, each end address must be set so that the total number of bytes in the buffer is a multiple of 16. A minimum buffer size of 64 bytes is recommended for MPEG-1 and MPEG-2, and 128 bytes for H.261 and H.263. This prevents performance degradation caused by the BP interrupting ARM7 too frequently.
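These constraints are easy to check in software before the registers are written. The Python sketch below is one possible check, with stated assumptions: the end address is treated as exclusive (the text does not say whether it is inclusive), the minimum size is passed in by the caller, and the function name and message strings are hypothetical.

```python
def check_buffer(start, end, min_size=64):
    """Return a list of problems with a buffer's start and end addresses."""
    problems = []
    if start & 0xF:
        problems.append("start address not 16-byte aligned")
    size = end - start  # assumption: 'end' is one past the last byte
    if size <= 0 or size % 16:
        problems.append("size not a positive multiple of 16")
    elif size < min_size:
        problems.append("size below recommended minimum")
    return problems


print(check_buffer(0x1000, 0x1040))            # 64-byte buffer
print(check_buffer(0x1000, 0x1040, min_size=128))
```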
[0190]
10.11.2 Handling of abnormal buffer status
When both output buffers are full, the BP stops processing and enters an idle state regardless of the status of the input double buffer. When the OBUF0_DONE or OBUF1_DONE flag is set, the BP automatically leaves this idle state.
When both input buffers are empty, the BP does not need to stop immediately and continues until it has finished processing the data remaining internally. However, as soon as both input buffers are empty, the BP interrupts ARM7. If the input buffers are still empty after the remaining data has been processed, the BP enters an idle state. When the IBUF0_FULL or IBUF1_FULL flag is set, the BP automatically leaves this state.
The idle state described in this section differs from the other idle states described in this specification, because leaving those other idle states normally requires a control command from ARM7.
[0191]
10.11.3 Physical implementation of I / O buffer: example
It is normally up to the user to determine the location and size of the BP input and output buffers. The user may implement the buffers in the VP data cache, the ARM7 data cache, or the scratch pad area of the SDRAM. Although the BP input and output double buffers are subject to some restrictions, they can be used efficiently.
Here, a specific example of implementing a rate buffer in a video decoding application is given. In this case, the user wants to implement the BP input buffer as a circular buffer. It is assumed that SDRAM is used and that the complete rate buffer is divided into four blocks as shown in FIG.
Initially, the user assigns Rate_Buffer_Block_0 and Rate_Buffer_Block_1 to IBUF0 and IBUF1, respectively. This is done with the following settings.
IBUF0_START = Rate_Buffer_Address_0;
IBUF0_END = Rate_Buffer_Address_1;
IBUF1_START = Rate_Buffer_Address_2;
IBUF1_END = Rate_Buffer_Address_3.
If all of the data in IBUF0 (ie, the data in Rate_Buffer_Block_0) is used by the BP, the BP interrupts ARM7. Then, ARM7 sets Rate_Buffer_Block_2 to IBUF0 by setting as follows.
IBUF0_START = Rate_Buffer_Address_4;
IBUF0_END = Rate_Buffer_Address_5.
If all of the data in IBUF1 is used by the BP, the BP will interrupt ARM7. Then, ARM7 sets Rate_Buffer_Block_3 to IBUF1 by setting as follows.
IBUF1_START = Rate_Buffer_Address_6;
IBUF1_END = Rate_Buffer_Address_7.
When all of the data in Rate_Buffer_Block_2 has been used by the BP, ARM7 assigns Rate_Buffer_Block_0 to IBUF0 again by setting the addresses as in the first stage.
Thus, a circular buffer can be implemented simply by repeating this sequence. This example shows that the BP double buffer can be used very flexibly according to the user's intention.
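The rotation in this example follows a fixed pattern: IBUF0 alternates between blocks 0 and 2, and IBUF1 between blocks 1 and 3. The Python sketch below captures that pattern. The helper names are hypothetical, and the address symbols are simply the names used in the example above, not real addresses.

```python
NUM_BLOCKS = 4  # the rate buffer is divided into four blocks in the example


def next_block(current_block):
    """Block to assign to an input buffer once it has finished current_block."""
    return (current_block + 2) % NUM_BLOCKS


def block_addresses(block):
    """Start and end address symbols of a rate buffer block, as named in the text."""
    return (f"Rate_Buffer_Address_{2 * block}",
            f"Rate_Buffer_Address_{2 * block + 1}")


# Simulate the ARM7 response to a sequence of IBUFn_DONE interrupts.
assigned = {"IBUF0": 0, "IBUF1": 1}
for done in ["IBUF0", "IBUF1", "IBUF0"]:
    assigned[done] = next_block(assigned[done])
    start, end = block_addresses(assigned[done])
    print(f"{done}_START = {start}; {done}_END = {end}")
```

After the third interrupt IBUF0 points at Rate_Buffer_Block_0 again, which matches the last step of the example.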
[0192]
10.12 Context switching
If more than one application drives the MSP, the ARM7 operating system may command the BP to stop its current work and switch to other work. This process is usually referred to as “context switching”. The BP supports the following two context switching modes.
10.12.1 Preemptive context switching
In preemptive context switching, the BP ends normal processing as soon as it has finished the 8 × 8 pixel block it is currently processing. ARM7 requests the preemptive context switching mode by setting the CTX_SWITCH and CTX_MODE flags in BP_CONTROL[6:5] to “11”. When the current block is finished, the BP saves its internal context to the external SDRAM for later processing.
When the BP completes the context store, it interrupts ARM7 by setting the CTX_SW_DONE flag located in BP_STATUS[5]. ARM7 then saves the entire contents of the BP's I/O buffers and initializes the BP for the other work.
This mode lets the BP respond to an ARM7 context switching request as quickly as possible. In the worst case, the BP needs about 150 cycles (= 3.75 μsec) to complete the current block. In the typical case, however, it is reasonable to assume that a few dozen cycles are enough to complete the block.
[0193]
10.12.2 Cooperative context switching
With cooperative context switching, the BP can skip the context store. This is possible because all BP internal state must be initialized for every slice or GOB layer anyway. In this mode, the BP continues to process the current slice or GOB normally until that slice or GOB is finished.
ARM7 requests the cooperative context switching mode by setting the CTX_SWITCH and CTX_MODE flags in BP_CONTROL[6:5] to “10”. When processing of the current slice or GOB is completed, the BP interrupts ARM7 by setting the CTX_SW_DONE flag located in BP_STATUS[5]. ARM7 then saves the entire contents of the BP's I/O buffers and initializes the BP for the other work.
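The two request encodings ("11" for preemptive, "10" for cooperative) can be expressed as bit masks on BP_CONTROL. The Python sketch below assumes that CTX_SWITCH is bit 6 and CTX_MODE is bit 5; this ordering is inferred from the two encodings together with the CTX_MODE description and is not stated explicitly. The function name is hypothetical.

```python
# Assumed bit positions, inferred from the "11" / "10" encodings in the text.
CTX_MODE = 1 << 5     # BP_CONTROL[5]: 1 = preemptive, 0 = cooperative
CTX_SWITCH = 1 << 6   # BP_CONTROL[6]: request a context switch
CTX_SW_DONE = 1 << 5  # BP_STATUS[5]: set by the BP when the switch is complete


def request_context_switch(bp_control, preemptive):
    """Return a new BP_CONTROL value that requests a context switch."""
    value = bp_control & ~(CTX_SWITCH | CTX_MODE)  # clear both flags first
    value |= CTX_SWITCH
    if preemptive:
        value |= CTX_MODE
    return value & 0xFFFFFFFF


print(format(request_context_switch(0, preemptive=True) >> 5 & 0b11, "02b"))
print(format(request_context_switch(0, preemptive=False) >> 5 & 0b11, "02b"))
```

After writing the new value, ARM7 would wait for the interrupt and confirm that CTX_SW_DONE is set in BP_STATUS before saving the I/O buffers.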
[0194]
10.12.3 Context reload
To switch back to its previous work, the BP reloads the context stored in the SDRAM from the address SAVE_ADR[31:0]. The BP must be idle when this context reload is requested. This is the case when BP_DONE or CTX_SW_DONE is set, or when ARM7 has reset the BP by software. When ARM7 then sets the CTX_RELOAD flag in BP_CONTROL[7], the BP leaves the idle state and begins reading the stored context.
After the BP completes the context reload operation, it sets the CTX_RELOAD_DONE flag to interrupt ARM7. ARM7 then initializes the internal registers of the BP and enables the BP to resume the previous work.
10.13 Work handshaking
This section deals with the detailed process for working handshaking when the BP has finished processing. Here, “updating the pointer for the last data” means that the BP has entered appropriate values in VALID_BYTE_ADR [31: 0] and VALID_BIT_POS [2: 0], respectively.
[0195]
10.13.1 For encoding
In normal operation, input data for encoding is supplied by the VP. When one of the input double buffers has been filled by the VP, the BP begins reading data through the IOBUS. At the end of processing (i.e., when the number of processed macroblocks equals the number of macroblocks specified by ARM7), the BP sets the BP_DONE flag to interrupt ARM7 and enters an idle state.
The pointer for valid data marks the “end of compressed bitstream” for a slice or GOB. VALID_BYTE_ADR[31:0] points to a position in the output double buffer.
ARM7 combines this compressed bitstream with the upper layer headers to form the final bitstream and repeats the process. If ARM7 wants to restart the BP before it has consumed all the data in the output double buffer, it must first consume at least one of the two output buffers. Because the BP updates the pointer when it is restarted, ARM7 can leave the pointer for the last data as it is.
[0196]
10.13.2 For decoding
First, ARM7 searches for a slice or GOB start code (when present). When the start code is found, ARM7 initializes and enables the BP. After the BP performs Huffman decoding, RLC decoding and inverse zigzag scan conversion, the data is written to the output buffer for VP processing. The BP continues this processing routine until a non-slice or non-GOB start code is detected. When such a start code is detected, the BP sets the pointer for the last data used to the “end of non-slice or non-GOB start code” and interrupts ARM7. ARM7 then decodes the start code and performs header parsing until the next slice or GOB start code is found.
10.13.3 Errors found in the compressed bitstream
In video telephony applications, where actual data is transmitted over telephone lines and public switched networks, it is very likely that the incoming bitstream contains some invalid data. In this case, the BP must interrupt ARM7 and check the ERR_HANDLE_MODE flag. It is safest for the user to select the error handling mode before the BP is enabled for a particular application.
When the ERR_HANDLE_MODE flag is set to “1”, the BP automatically searches for the next start code. If the start code is for a slice or GOB, the BP continues normal processing. This mode is very efficient because the BP can find the start code more quickly than ARM7, and ARM7 can perform other processing routines while the BP looks for the next start code. However, if a start code other than a slice or GOB start code is found, the BP sets the BP_DONE flag to interrupt ARM7 again and enters the idle state. In this case, the pointer for the last data used must indicate the end of that start code.
When the ERR_HANDLE_MODE flag is set to “0”, the BP does not look for the next start code and enters an idle state. In this case, the pointer for the last data used must point to the location where the error was found. This mode is useful when a user is debugging a corrupted bitstream using ARM7 instructions.
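The resynchronization step can be illustrated for the MPEG case, where start codes are byte-aligned and consist of the prefix 00 00 01 followed by one code byte, with slice start codes using the values 0x01 through 0xAF. The Python sketch below shows a software equivalent of the search under those assumptions. It does not cover the H.261 and H.263 start codes, which are not necessarily byte-aligned, and the function names are hypothetical.

```python
def find_start_code(data, pos=0):
    """Return (index, code_byte) of the next 00 00 01 xx start code, or None."""
    i = data.find(b"\x00\x00\x01", pos)
    if i < 0 or i + 3 >= len(data):
        return None  # no prefix found, or the code byte is missing
    return i, data[i + 3]


def is_slice_start(code_byte):
    """MPEG-1 and MPEG-2 slice start codes use code bytes 0x01 through 0xAF."""
    return 0x01 <= code_byte <= 0xAF


stream = bytes([0xFF, 0x00, 0x00, 0x01, 0x05, 0x12,
                0x00, 0x00, 0x01, 0xB3])
first = find_start_code(stream)
second = find_start_code(stream, first[0] + 1)
print(first, is_slice_start(first[1]))
print(second, is_slice_start(second[1]))
```

In terms of the behaviour described above, the first result corresponds to the case where processing continues, and the second to the case where BP_DONE would be set.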
[0197]
[Example 2]
Appendix B
MPC bitstream processor
The bitstream processor (BP) is one of the important MSP processing cores for video data encoding and decoding applications. The BP handles MPEG slice layer encoding and decoding, and H.261 / H.263 group of blocks (GOB) layer encoding and decoding. In decoding applications, the BP provides all of the information contained in each macroblock to the vector processor and the ARM7 core.
The bitstream processor hardware is divided into four functional blocks.
* IOBUS port interface including I/O control and decoding unit
* BP control state machine
* Codec core including BP register multiplexer, register, arithmetic logic unit (ALU) and multiplexer, FIFO control unit
* VLC FIFO unit
* VLC codec including lookup with codec address generator
A description of the VLC LUT ROM (340, see FIG. 3) is as follows.
1.0 Methodology
The lookup unit is the heart of Huffman encoding and decoding. This unit supports the VLC tables included in the MPEG-1, MPEG-2, H.261 and H.263 specifications, which are supported by the Samsung MSP. Most of these tables are implemented in a ROM that is 12 bits wide. However, special encoding and decoding are applied where the lookup is very simple or does not fit the ROM table size. All four specifications include many variable-length codes of up to 17 bits. In addition to the encoded or decoded values, code sizes and valid-code indicators must be provided so that encoding and decoding are processed correctly. If conventional methods were used to encode or decode the VLC tables, the ROM table and address generator would be much larger.
[0198]
1.1 The implementation method is as follows:
* Share as many ROM tables as possible, provided this does not complicate the address generator design.
* Rearrange the VLC tables based on encoding or decoding.
* Decode the '0' count and '1' count of the Huffman code.
* Use 1-bit flags such as sign or even/odd to reduce the table size.
* Where possible, split one ROM location into 'upper' and 'lower' halves.
* Simplify the address generator by using the least significant bits (LSBs) of the VLC to generate the ROM table address.
This method is very efficient. The final ROM table size is 768 × 12 bits, small enough not to be a problem. The lookup is performed by a ROM table address generator followed by a ROM table lookup. The address generator decodes input signals such as the table type, mode and VLC value to generate ROM table addresses. The encoded or decoded data is then obtained from the ROM table value and other information. The decoding table has two formats: one, applied to DCT coefficients, uses one ROM location per VLC code; the other, applied to the remaining tables, divides each ROM location into an upper 6 bits and a lower 6 bits, so that each location holds two VLC codes. The encoding table also has two formats: one for H.263 TCOEF and one for the other tables. For encoding, each ROM location contains one Huffman code. The tables can be shown as follows:
[0199]
VLC decoding ROM table map
[Table 61]

[0200]
VLC encoding ROM table map
[Table 62]

[0201]
1.2 Decoding
All tables for decoding are rearranged based on '0' or '1' counts. When the MSB of the VLC code is '0', a '0' count is applied; otherwise, a '1' count is used. For example, the code '00001xxx' has a '0' count of four, and the code '1110xxx' has a '1' count of three. In the decoding process, the '0' / '1' count of the VLC code is decoded first and output to the ROM table address generator. The address generator then decodes the remaining code to generate an address. The address consists of two parts: an offset and a so-called masked address, which is obtained from the VLC table. The address is the logical OR of the two parts. The address generator also provides the following information.
* VLC code size
* Special flag: a 2-bit flag that indicates 'ESCAPE', 'END OF BLOCK', 'STUFFING' or the H.261 'START CODE' to the decoding state machine.
* High data extract enable: The valid data is the upper 6 bits.
* Sign / even enable: This flag indicates that the decoder should extract the VLC LSB as the sign bit or the even bit, depending on the table.
* Valid VLC
* Mask shift bits and mask: Both signals are applied to generate the masked address.
In the ROM table, every location stores data in the upper / lower bit format, except for the locations used by Tables 14 and 15 of MPEG-2 and Table 12 of H.263.
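The '0' / '1' count described in this section can be sketched as follows. The sketch assumes the VLC is left-aligned in a 16-bit window (MSB in bit 15); the function name leading_run and this alignment are illustrative choices, not taken from the hardware description.

```c
#include <stdint.h>

/* Hypothetical helper: counts the run of leading bits equal to the MSB of
   a left-aligned 16-bit window. *is_one_count is set to 1 when the MSB is
   '1' (a '1' count applies) and to 0 otherwise (a '0' count applies). */
unsigned leading_run(uint16_t window, int *is_one_count)
{
    unsigned msb = (window >> 15) & 1u;
    unsigned count = 0;
    while (count < 16 && ((window >> (15 - count)) & 1u) == msb)
        count++;
    *is_one_count = (int)msb;
    return count;
}
```

With the trailing 'x' bits set to zero, the code '00001xxx' becomes 0x0800 and yields a '0' count of four, and '1110xxx' becomes 0xE000 and yields a '1' count of three, matching the examples in the text.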
[0202]

1.2.1 Table 14 / MPEG-2
This table also covers Table 2-B.5c / MPEG-1 and Table 5 / H.261.
ROM table format: bits 10-6: run; bits 5-0: level

1.2.2 Table 15 / MPEG-2
Since it has the same run, level and VLC code as Table 14 / MPEG-2 for most entries, most of this table is shared with Table 14 / MPEG-2.
ROM table format: bits 10-6: run; bits 5-0: level

1.2.3 Table 12 / H.263
Compared with Tables 14 and 15 of MPEG-2, this table has one more output value, 'LAST'.
ROM table format: bit 11: LAST; bits 10-4: run; bits 3-0: level
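The two DCT coefficient ROM word layouts given above (bits 10-6 run and bits 5-0 level for Tables 14 and 15 / MPEG-2; bit 11 LAST, bits 10-4 run and bits 3-0 level for Table 12 / H.263) can be unpacked as in the following sketch. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for one decoded DCT coefficient entry. */
typedef struct { unsigned last, run, level; } dct_entry;

/* Tables 14 and 15 / MPEG-2 layout: bits 10-6 run, bits 5-0 level. */
dct_entry unpack_mpeg_entry(uint16_t rom_word)
{
    dct_entry e = { 0u, (rom_word >> 6) & 0x1Fu, rom_word & 0x3Fu };
    return e;
}

/* Table 12 / H.263 layout: bit 11 LAST, bits 10-4 run, bits 3-0 level. */
dct_entry unpack_h263_entry(uint16_t rom_word)
{
    dct_entry e = { (rom_word >> 11) & 1u,
                    (rom_word >> 4) & 0x7Fu,
                    rom_word & 0xFu };
    return e;
}
```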
1.2.4 Motion code / macroblock address increment
This section covers Table 1 / MPEG-2, Table 10 / MPEG-2, Table 2-B.1 / MPEG-1, Table 2-B.4 / MPEG-1, Table 1 / H.261, Table 3 / H.261 and Table 10 / H.263.
[0203]
For the motion code, the VLC LSB is the sign bit, except when VLC = 1. For the macroblock address increment, the VLC LSB is an even-value flag, except when VLC = 1. Therefore, only half of the table is decoded. Ignoring the sign / even bit, the two kinds of table have the same VLC values and decoded values, except for the upper part of Table 10 / H.263. The decoded value occupies at most 6 bits, so two table values can be stored in one ROM location. Although the decoded values in the lower part of Table 10 / H.263 differ from the others, their binary values are the same apart from the position of the fixed point. As a result, 16 half locations are used to handle all of these tables. One simple FSM is used to generate the ROM address. When a motion code is decoded, the ROM table provides the absolute value. When the address generator enables the sign bit, the decoder extracts the VLC LSB as the sign, where '1' means negative (-) and '0' means positive (+). This algorithm can be shown as follows.
if (sign_enable == 1)
    /* sign is the extracted VLC LSB: 1 = negative, 0 = positive */
    motion_value = sign ? -ROM_table_value : ROM_table_value;
else
    motion_value = ROM_table_value;
When the macroblock address increment table is decoded, the result is obtained from the ROM table value and the even flag. For example, suppose the ROM table provides the value '5'. If the even flag is high, the result is '10'; if the even flag is low, the result is '11'. This algorithm can be shown as follows.
if (even_enable == 1)
    increment_value = (ROM_table_value << 1) | (~even_bit & 1);
else
    increment_value = ROM_table_value;
ROM table format: bits 11-6: upper data; bits 5-0: lower data
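The two reconstruction rules above can be written as compilable C. This is a minimal sketch: the function names are hypothetical, and sign_bit and even_bit stand for the VLC LSB extracted by the decoder (0 or 1).

```c
/* Hypothetical helper: rebuilds a signed motion code from the absolute
   value stored in the ROM. sign_bit = 1 means negative. */
int decode_motion_code(int rom_value, int sign_enable, int sign_bit)
{
    if (sign_enable && sign_bit)
        return -rom_value;
    return rom_value;
}

/* Hypothetical helper: rebuilds a macroblock address increment from the
   halved value stored in the ROM and the even flag. */
int decode_mb_increment(int rom_value, int even_enable, int even_bit)
{
    if (even_enable)
        return (rom_value << 1) | (~even_bit & 1);
    return rom_value;
}
```

For a ROM value of 5, decode_mb_increment returns 10 when the even flag is high and 11 when it is low, as in the example in the text.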
1.2.5 Macroblock pattern
This section covers Table 9 / MPEG-2, Table 2-B.3 / MPEG-1 and Table 4 / H.261 (CBP).
The decoded value occupies at most 6 bits, so two values can be stored in one ROM location. Thus, 32 locations are used to handle all of these tables.
ROM table format: bits 11 to 6: upper data; bits 5 to 0: lower data
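Selecting one of the two 6-bit values packed into a 12-bit ROM word can be sketched as follows. The function name extract_half is hypothetical; its high argument plays the role of the high data extract enable signal described in Section 1.2.

```c
#include <stdint.h>

/* Hypothetical helper: returns the upper (bits 11-6) or lower (bits 5-0)
   6-bit value of a 12-bit ROM word. */
unsigned extract_half(uint16_t rom_word, int high)
{
    return high ? (rom_word >> 6) & 0x3Fu : rom_word & 0x3Fu;
}
```

Packing two values per location is what lets 64 six-bit pattern values fit in 32 ROM locations.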
[0204]
1.2.6 Macroblock type
This section covers Tables 2, 3, 4 / MPEG-2, Table 2-B.2 / MPEG-1, Table 2 / H.261 (MTYPE) and Tables 3, 4 / H.263 (MCBPC).
The decoded value occupies at most 5 bits. Again, the upper / lower data concept is used. One simple FSM is used to generate the ROM address.
ROM table format: bits 11 to 6: upper data; bits 5 to 0: lower data
Although some bits have different meanings in each specification, the macroblock type format is defined uniformly for all specifications, based on MPEG. H.263 requires two decoding steps because of the information it needs, as follows.
* Decode MCBPC to obtain a 3-bit macroblock type
* Look up the detailed macroblock type based on the macroblock type, BP flag and picture type
The format of the macroblock type in the VLC table is as follows.
[0205]
[Table 63]
MPEG macroblock type format

[0206]
[Table 64]
H.263 MCBPC format

[0207]
[Table 65]
H.261 macroblock type format

From the table, not only the 3-bit macroblock type but also the 2-bit chroma pattern is obtained. Here, the macroblock type is a 3-bit value in the range 0 to 4. As described above, the detailed macroblock type is decoded in the second step. The decoding lookup table is as follows.
[0208]
[Table 66]
H.263 macroblock type decoding lookup table

[0209]
1.2.7 DCT DC size
This section covers Tables 12, 13 / MPEG-2 and Table 2-B.5 / MPEG-1. Because of the VLC structure, the '1' count is used here instead of the '0' count.
ROM table format: bits 10 to 6: upper data (chroma); bits 4 to 0: lower data (luminance). Bits 11 and 5 are reserved.
1.2.8 CBPY
This section covers Table 9 / H.263. This table contains two sets of data, one for inter pictures and one for intra pictures. Each set is the bitwise inversion of the other, so only one set needs to be stored in the ROM. Here, the intra data is stored in the ROM. A 4-bit value is used to represent the CBPY value.
ROM table format: bits 9 to 6: upper data; bits 3 to 0: lower data. Bits 11-10 and bits 5-4 are reserved.
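Because one CBPY set is the inversion of the other, a single stored set serves both picture types. The sketch below assumes that the ROM holds the intra-picture values, as the encoding section (1.3.9) indicates, and that inter-picture values are obtained by inverting the 4 bits. The function name decode_cbpy is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: extracts a 4-bit CBPY value from a 12-bit ROM word
   (bits 9-6 upper value, bits 3-0 lower value) and inverts it for inter
   pictures. Assumes the ROM stores the intra-picture set. */
unsigned decode_cbpy(uint16_t rom_word, int high, int is_inter)
{
    unsigned v = high ? (rom_word >> 6) & 0xFu : rom_word & 0xFu;
    return is_inter ? (~v & 0xFu) : v;
}
```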
1.2.9 Dual prime and mode
This section covers Table 11 / MPEG-2 and Table 7 / H.263.
The two tables are very simple and small and can be decoded directly.
1.3 Encoding
As in the decoding section, the encoding process uses the '0' / '1' count concept. The ROM table holds the '0' / '1' count, the size of the code that follows the first '1' (for a '0' count) or the last '1' (for a '1' count), and the VLC code that follows the first / last '1'. With this format, the ROM table width can be limited to 12 bits, except for four cases in Table 12 / H.263 that are handled by special encoding. The format is as follows.
[0210]
[Table 67]
General encoding format

[0211]
[Table 68]
Table 12 / H.263 encoding format

[0212]
In the above tables, the VLC code size is the size of the VLC code following the first / last '1', and the VLC code is the code following the first / last '1'. For a '0' count, the VLC code following the first '1' is extracted; otherwise, the VLC code is extracted starting from the bit following the last '1'. The '1' count is applied differently in encoding than in decoding: it is applied only when the '1' count flag is enabled by the address generator. Therefore, if the VLC MSB is 1 but the '1' count flag is low, the '0' / '1' count field of the ROM table is 0, which means that a '0' count is applied.
The following examples cover all possible cases for encoding.
Example 1: VLC = 0000011001, one_count_enable = 0
Results for the general case: 0101 100 01001
Results for Table 12 / H.263: 101 100 001001
Example 2: VLC = 11001, one_count_enable = 0
Results for the general case: 0000 100 01001
Results for Table 12 / H.263: 000 100 001001
Example 3: VLC = 11001, one_count_enable = 1
Results for the general case: 0010 011 00001
Results for Table 12 / H.263
A general address is generated by adding an offset and an input value.
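The general 12-bit encoding word can be built from a VLC as in the following sketch. The field layout used here (bits 11-8 count, bits 7-5 code size, bits 4-0 remaining code) is an assumption inferred from the grouping of the example results, because the format table itself is not reproduced in this text. The function name pack_general_word is hypothetical, and no check is made that the fields fit their widths.

```c
#include <stdint.h>

/* Hypothetical helper. vlc holds the code right-aligned in len bits
   (len <= 16). Assumed layout: bits 11-8 count, bits 7-5 size,
   bits 4-0 remaining code. */
uint16_t pack_general_word(uint16_t vlc, unsigned len, int one_count_enable)
{
    unsigned lead = one_count_enable ? 1u : 0u;  /* bit value being counted */
    unsigned count = 0;
    while (count < len && ((vlc >> (len - 1 - count)) & 1u) == lead)
        count++;
    /* A '0' run ends with a '1' that is not stored; after a '1' run,
       everything following the last counted '1' is stored. */
    unsigned skip = one_count_enable ? count : count + 1;
    unsigned size = len > skip ? len - skip : 0;
    unsigned code = vlc & ((1u << size) - 1u);
    return (uint16_t)((count << 8) | (size << 5) | code);
}
```

Under these assumptions the sketch reproduces the general-case results of Examples 1 and 3: 0000011001 with a '0' count gives 0101 100 01001 (0x589), and 11001 with a '1' count gives 0010 011 00001 (0x261).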
[0213]
1.3.1 Table 14 / MPEG-2
This table also covers Table 2-B.5c / MPEG-1 and Table 5 / H.261. This encoding processes 'RUN', 'FIRST DC', 'ESCAPE' and 'END OF BLOCK' inputs.
Encoding result: an offset address, to which the level or run is added to generate the address

1.3.2 Table 15 / MPEG-2
Since the two tables have the same run, level and VLC code for most entries, most of this table is shared with Table 14 / MPEG-2. In some special cases, a '1' count is applied. This encoding processes 'RUN', 'LEVEL', 'FIRST DC', 'ESCAPE' and 'END OF BLOCK' inputs.
Encoding result: Offset address and ‘1’ count indicator

1.3.3 Table 12 / H.263
As mentioned above, this table is a special case, and a different format is used to handle it. Unfortunately, there are some exceptions in which 12 bits are not enough to represent the VLC code. The exceptions are listed in Table 69. They do not use the ROM table and are specially encoded.
[0214]
[Table 69]
Table 12 / H.263 encoding exceptions

[0215]
Encoding processes 'RUN' and 'ESCAPE' inputs.
Encoding result: an offset address, to which the level or run is added to generate the address
1.3.4 Motion code / macroblock address increment
This section covers Table 1 / MPEG-2, Table 10 / MPEG-2, Table 2-B.1 / MPEG-1, Table 2-B.4 / MPEG-1, Table 1 / H.261, Table 3 / H.261 and Table 10 / H.263.
As described in the decoding section, one ROM table and one FSM are shared by all of these tables. The VLC code obtained from the ROM table must be combined with the sign / even bit to form the complete VLC code. Therefore, the input value processed by the encoding FSM is the absolute value for a motion code, whose LSB is a fraction bit, or the macroblock address increment shifted right by 1 bit.
[0216]
Encoding processes 'STUFFING' and 'ESCAPE'.
1.3.5 Macroblock pattern
This section covers Table 9 / MPEG-2 and Table 2-B.3 / MPEG-1.
The address is a value obtained by adding an offset and a pattern value.
1.3.6 Macroblock type
This section covers Tables 2, 3, 4 / MPEG-2 and Table 2-B.2 / MPEG-1.

1.3.7 Tables 3, 4 / H.263 (MCBPC)
Information about the picture type, macroblock type and stuffing flag is provided to generate the ROM table address offset. The address is the sum of the offset address and CBPC.

1.3.8 Table 2 / H.261 (MTYPE)
The address generator is very complex and is not worth considering for implementation.
1.3.9 CBPY
As described in the decoding part, only the intra-picture data is encoded. If the picture type is inter-picture, the data must first be inverted.
The address is a value obtained by adding the offset and the CBPY value.
1.3.10 DCT DC size
This section covers Tables 12, 13 / MPEG-2 and Table 2-B.5 / MPEG-1.
Since several VLC codes for luminance and chroma are the same, they share several ROM tables. A chroma flag and several bit values are used to generate the offset address. The ROM address can be obtained by adding the offset and the actual value.
[0217]
1.3.11 Dual prime and mode
This section covers Table 11 / MPEG-2 and Table 7 / H.263.
The two tables are very simple and small and can be encoded directly.
2.0 Hardware description
The hardware for VLC encoding / decoding is contained in the 'VLC' block, which is used to generate ROM table addresses or decoded / encoded data. This block includes three sub-blocks. 'VLC_DEC' decodes the VLC and generates a ROM table address. 'VLC_ENC' is the block for encoding the VLC; it also generates the special encoding for the H.263 TCOEF table. 'LOOKUP' outputs VLC data based on a ROM table value or a special encoding value.
2.1 VLC decoding address generator
The core of VLC_DEC is the decoding FSM. This FSM decodes the input information and controls address generation. The FSM inputs and their definitions are as follows:
* ZERO_ONE Count (15 bits): Provides a 0/1 count value.
* ZERO_ONE Count (4 bits): Provides a 0/1 count value. The purpose of using two count signals of different widths is to reduce the gate count by sharing input data. In most cases, 15 bits are used.
* ONE Count enable (1 bit): '1' count indicator
* Table type (6 bits): Table type
[0218]
[Table 70]
VLC_DEC_FSM table type format
* Mode (9 bits): Operation mode

[0219]
VLC_DEC FSM mode format
Bit 8: H.263; bits 7-6: specification; bits 5-4: picture type; bit 3: chroma; bit 2: first DC; bit 1: Table 15; bit 0: MB INC
Definitions for specifications and picture types are described in Pin Definition.
A special algorithm is used to generate this decode ROM table address to simplify the hardware and ensure ROM access time. This process is as follows.
[0220]
Step 1: Generate an offset address (OFFSET).
Step 2: Generate a 4-bit shift amount (MASK_SHFT) and shift the 16-bit FIFO_DATA right by this amount. Then extract the four least significant bits (FOL_DATA).
Step 3: Reverse the order of the 4 bits obtained in step 2.
Step 4: Generate a 4-bit mask signal (MASK) and use it to mask the data obtained in step 3.
Step 5: OR the result of step 4 with the offset address. The result is the ROM table address.
Combining these steps gives:
Address = OFFSET | (BITREVERSE (Bit (3-0) of (FIFO_DATA >> MASK_SHFT)) & MASK)
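The address formula above can be sketched in C as follows. The function names are hypothetical, and BITREVERSE is taken to mean reversing the order of the four extracted bits, as the formula's name suggests.

```c
#include <stdint.h>

/* Reverses the order of the low 4 bits of v. */
static unsigned reverse4(unsigned v)
{
    return ((v & 1u) << 3) | ((v & 2u) << 1) | ((v & 4u) >> 1) | ((v & 8u) >> 3);
}

/* Hypothetical helper: builds the decode ROM table address from the FSM
   outputs OFFSET, MASK_SHFT and MASK and the 16-bit FIFO_DATA. */
uint16_t vlc_dec_rom_address(uint16_t offset, uint16_t fifo_data,
                             unsigned mask_shft, unsigned mask)
{
    /* Shift right and keep the four least significant bits (FOL_DATA). */
    unsigned fol_data = (fifo_data >> (mask_shft & 0xFu)) & 0xFu;
    /* Reverse the bit order, then apply the 4-bit mask. */
    unsigned masked = reverse4(fol_data) & (mask & 0xFu);
    /* OR the masked address into the offset. */
    return (uint16_t)(offset | masked);
}
```

Using an OR rather than an adder keeps the address path to a few gates, which fits the stated goal of simplifying the hardware and protecting the ROM access time. It presumes that the offsets leave the masked low bits clear.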
The output of the FSM is as follows.
* MASK (4 bits): Mask data
* OFFSET (9 bits): ROM table offset address
* MASK_SHFT (4 bits): Shift amount
* SIZE (5 bits): VLC size
* SPECIAL_FLAG (3 bits): Extra information for decoding
[0221]
[Table 71]
Definition of special flag of VLC_DEC

[0222]
* VALID_VLC (1 bit): Valid VLC code flag
* HIGH_DATA_INDICATOR (1 bit): The upper 6 bits in the ROM data are extracted.
Input pin:
* FOL_DATA (4 bits): shifted FIFO_DATA (see stage 2 above)
* CNT (4 bits): 0/1 count
* ONE_CNT_EN (1 bit): '1' count indicator
* MODE (14 bits): Table type and other information
The definition is as follows.
[0223]
[Table 72]
MODE format with VLC_DEC

[0224]
Specifications: 00 = MPEG-1; 01 = MPEG-2; 10 = H.261; 11 = H.263;
Picture type: 00 = Reserved; 01 = Intra; 10 = Prediction; 11 = Bidirectional;
* FIFO_DATA (16 bits): Data includes VLC.
Output pin:
* ROM_ADR (10 bits): ROM table address
* MASK_SHFT (4 bits): Shift amount for FIFO_DATA (see stage 2 above)
* SIZE (5 bits): VLC size
* SPECIAL-0 (3 bits): Special flag (refer to FSM output)
* VALID_VLC (1 bit): Valid VLC flag
* HIGH_DATA (1 bit): Indicator to extract the VLC LSB as the sign or even flag
* FULL_DATA (1 bit): Full 12-bit data structure; high when decoding DCT coefficients
* TABLE (6 bits): Defined on the FSM input.
* T_MODE (9 bits): Defined in MODE for FSM input.
2.2 VLC_ENC
As the core of VLC encoding, VLC_ENC encodes variable-length codes. The output of this part is a ROM table address or a special VLC encoding. As described in Section 1.0, the encoding data structure follows the 12-bit data format, except for some special cases of TCOEF in H.263. Although a 10-bit adder is used to generate the ROM table address, this part is much simpler than VLC_DEC from a hardware perspective.
As with VLC_DEC, the core of this part is an FSM, called VLC_ENC. Another FSM, ENC_SP, is used for special encoding.
The input signals of the VLC_ENC FSM are the same as the input pins of this part.
* LAST (1 bit): LAST value for H.263 TCOEF
* RUN / VALUE (6 bits): When a DCT coefficient table is being encoded, this input is RUN; otherwise it is a general value, such as a pattern.
* LEVEL (6 bits): DCT coefficient level
* SPECIAL_FLAG (2 bits): Special flag defined in the VLC_DEC part
* TABLE (6 bits): Same as VLC_DEC
* MODE (9 bits): Same as VLC_DEC
ROM address generation is very simple. The FSM provides an offset address that is added to a value (run), a level or zero to generate the address. For special encoding, because these VLCs have the same size and '0' count, the output is only the two least significant bits, from which the code is recovered.
[0225]
The output pins are as follows.
* ONE_CNT_FLG (1 bit): Signals that the VLC structure uses a '1' count.
* SIGN_EN_BIT: Signals that the VLC structure places a sign / even bit in the VLC LSB.
* SPECIAL_ENCODE (1 bit): Special encoding flag
* VLC (2 bits): LSBs of the specially encoded VLC code
* ADR_A (16 bits): Offset address. The upper 6 bits are 0.
* ADR_B (16 bits): Another part of the address. The upper 10 bits are always 0.
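The encoder address is the sum of the two outputs ADR_A and ADR_B, formed by the 10-bit adder mentioned in this section. The sketch below models that addition. The function name is hypothetical, and the range check against the 768-entry ROM is an added assumption for illustration, not something the text describes.

```c
#include <stdint.h>

#define ROM_TABLE_SIZE 768u  /* ROM table size stated in Section 1.1 */

/* Hypothetical helper: adds the 10-bit offset (ADR_A) and the 6-bit value
   (ADR_B) with a 10-bit result. Returns -1 if the address falls outside
   the ROM table (assumed check). */
int vlc_enc_rom_address(uint16_t adr_a, uint16_t adr_b)
{
    unsigned addr = ((adr_a & 0x3FFu) + (adr_b & 0x3Fu)) & 0x3FFu;
    return addr < ROM_TABLE_SIZE ? (int)addr : -1;
}
```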
2.3 Lookup
This part provides the encoded / decoded VLC data. This block handles the following situations:
* Regular 12-bit encoding / decoding ROM table value output
* 6-bit upper / lower decoding data output
* Restoring special encoding data
The output data is padded with 0s as required.
Input pin:
* D_ADR (10 bits): Decoding ROM address
* E_ADR (10 bits): Encoding ROM address
* ENCODE (1 bit): 1: Encoding; 0: Decoding
* HIGH (1 bit): Flag to extract the upper 6 bits
* ENABLE (1 bit): Full 12-bit data flag
* VLC (2 bits): Special encoding code
* SPECIAL_ENCODE (1 bit): Special encoding flag
Output pin:
* LOOKUP (16 bits): VLC code
[0226]
[Effect of the invention]
As described above, the bitstream processor can switch contexts so that various bitstreams are encoded or decoded simultaneously in real time, which makes it possible to process multiple data streams simultaneously. Also, because the bitstream processor is not programmed to perform single arithmetic instructions or Boolean instructions, it can operate at high speed.
[Brief description of the drawings]
FIG. 1 is a block diagram of a media card according to the present invention.
FIG. 2 is a block diagram of a multimedia processor according to the present invention.
FIG. 3 is a block diagram of a bitstream processor that is a part of the processor shown in FIG. 2;
FIG. 4 is a block diagram of a computer system according to the present invention.
FIG. 5 is a block diagram of a computer system according to the present invention.
FIG. 6 is a block diagram of a computer system according to the present invention.
FIG. 7 is a diagram showing a firmware structure of the processor shown in FIG. 2;
FIG. 8 shows an address map for the system of FIG.
FIG. 9 shows an address map for the system of FIG.
FIG. 10 is a block diagram showing a DSP core of the processor shown in FIG. 2.
FIG. 11 is a diagram showing a pipeline applied to a part of the vector processors of the processor shown in FIG. 2.
FIG. 12 is a functional block diagram of the vector processor of FIG.
FIG. 13 is a diagram showing an execution data path in the vector processor of FIG. 11.
FIG. 14 is a diagram showing load and storage data paths in the vector processor of FIG. 11;
FIG. 15 is a block diagram of a cache system of the processor in FIG. 2;
FIG. 16 is a diagram showing an instruction data cache in the cache system of FIG. 15;
FIG. 17 is a diagram showing a data path pipeline of the cache control unit in the processor of FIG. 2;
FIG. 18 is a diagram showing a data path for the address processing pipeline of the cache control unit in the system shown in FIG. 2.
FIG. 19 is a diagram showing a state machine in the processor of FIG. 2;
FIG. 20 is a diagram showing a state machine in the processor of FIG. 2.
FIG. 21 is a diagram showing a state machine in the processor of FIG. 2;
FIG. 22 is a diagram showing a state machine in the processor of FIG. 2.
FIG. 23 is a diagram showing an address format used in the cache system of FIG. 15;
FIG. 24 is a diagram illustrating a bus in the processor of FIG. 2;
FIG. 25 is a diagram illustrating an arbitration control unit in the processor of FIG. 2;
FIG. 26 is a timing diagram for the processor of FIG.
FIG. 27 is a timing diagram for the processor of FIG.
FIG. 28 is a timing diagram for the processor of FIG.
FIG. 29 is a timing diagram for the processor of FIG. 2;
FIG. 30 is a diagram showing a memory request signal in the processor of FIG. 2.
FIG. 31 is a diagram showing a memory request signal in the processor of FIG. 2.
FIG. 32 is a diagram showing a memory request signal in the processor of FIG. 2.
FIG. 33 is a diagram showing a bus arbitration control unit in the processor of FIG. 2;
FIG. 34 is a timing diagram for the processor of FIG.
FIG. 35 is a timing diagram for the processor of FIG. 2;
FIG. 36 is a timing diagram for the processor of FIG. 2;
FIG. 37 is a diagram showing a bus interface circuit in the processor of FIG. 2;
FIG. 38 is a diagram showing a bus interface circuit in the processor of FIG. 2.
FIG. 39 shows a virtual frame buffer (VFB) for the system of FIG.
FIG. 40 shows a virtual frame buffer (VFB) for the system of FIG.
FIG. 41 shows a bus interface for the system of FIG.
FIG. 42 illustrates a memory controller for the system of FIG.
FIG. 43 illustrates a memory controller for the system of FIG.
FIG. 44 illustrates an address controller for the system of FIG.
FIG. 45 is a diagram showing a format used in the system of FIG. 1.
FIG. 46 is a diagram showing a format used in the system of FIG. 1.
FIG. 47 is a diagram showing a state machine in the system of FIG. 1.
FIG. 48 is a block diagram of a data controller for the system of FIG.
FIG. 49 is a timing diagram for the system of FIG.
FIG. 50 is a timing diagram for the system of FIG.
FIG. 51 is a timing diagram for the system of FIG.
FIG. 52 is a diagram showing a device interface circuit in the processor of FIG. 2.
FIG. 53 is a diagram showing a device interface circuit in the processor of FIG. 2.
FIG. 54 is a block diagram for each part of the system of FIG. 1.
FIG. 55 is a block diagram for each part of the system of FIG. 1.
FIG. 56 is a block diagram for each part of the system of FIG. 1.
FIG. 57 is a diagram showing a register in the system of FIG. 1;
FIG. 58 is a diagram showing registers in the system of FIG. 1;
FIG. 59 is a diagram showing registers in the system of FIG. 1;
FIG. 60 is a diagram showing a frame buffer and a video window in the system of FIG. 1.
FIG. 61 is a timing diagram for the system of FIG.
FIG. 62 is a diagram showing registers in the system of FIG. 1;
FIG. 63 is a timing diagram for the system of FIG.
FIG. 64 is a diagram showing a buffer used in the system of FIG. 1;
FIG. 65 is a diagram showing a buffer used in the system of FIG. 1;
FIG. 66 is a diagram showing a buffer used in the system of FIG. 1.
[Explanation of symbols]
100: Media card
105, 122: Bus
110: Multimedia processor
112: D / A converter
114: CODEC
120: Memory bus
210: Scalar processor
220: Vector processor
230: Cache subsystem
240: IOBUS
242: Timer
243: UART unit
245: Bitstream processor
250: FBUS
252 and 255: Interface circuit
258: Controller
290: Data mover
310: Interface unit
320: SRAM
330: VLC FIFO unit
340: VLC LUT ROM
350: Control state machine
360: BP core unit

Claims

1. A system for encoding or decoding video data, comprising:
a vector processor for performing linear transformation on video data;
a bitstream processor that compresses the output of the vector processor or decompresses video data for input to the vector processor; and
a control circuit that synchronizes operation of the vector processor and the bitstream processor, wherein the control circuit controls the bitstream processor to process a plurality of video data streams while switching among them, so that the bitstream processor can process two video data streams almost simultaneously and the system can encode or decode two video data streams in real time.

2. The video data encoding or decoding system according to claim 1, wherein each of the video data streams represents a moving image.