JP4603753B2 - Method and circuit for securing cached information

Info

Publication number: JP4603753B2
Application number: JP2001556458A
Authority: JP
Inventors: ノース、グレゴリー、アレン; ペリー、マシュー、リチャード; カーチャー、ブライアン、クリストファー
Original assignee: シラスロジック、インコーポレイテッド
Priority date: 2000-02-01
Filing date: 2001-01-26
Publication date: 2010-12-22
Anticipated expiration: 2021-01-26
Also published as: WO2001057677A1; EP1256061A1; JP2003521781A; AU2001234611A1

Description

【０００１】
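(Field of the Invention)
The present invention relates generally to electronic appliances, and more particularly to circuits, systems, and methods for keeping information private in personal electronic appliances.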
【０００２】
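(Description of the Related Art)
Handheld personal electronic appliances are becoming widespread as new technologies make it possible to manufacture affordable devices with advanced functionality. One such device is the portable digital audio player, which downloads digital audio data, stores that data in a read/write memory, and converts it to audio at the user's request. The digital data is downloaded from a network, or retrieved from a fixed medium such as a compact disc, in one of several formats, including MPEG Layer 3, AAC, and MS Audio. An audio decoder, supported by appropriate firmware, retrieves the encoded data from memory, applies the corresponding decoding algorithm, and converts the decoded data to analog form to drive a headset or other portable speaker system.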
【０００３】
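To prevent the unauthorized downloading of copyrighted material such as music, some means of controlling the operation of the personal appliance is desirable. This can be implemented, for example, by issuing a password or a software kernel that authorizes the download of the associated information. The password or software must itself be protected against copying, distribution, and tampering by the end user. In addition, because the audio decoder may run from proprietary firmware, that firmware must also be protected against copying and tampering.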
【０００４】
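In sum, what is needed are methods, circuits, and systems for protecting information in personal digital appliances. To this end, the ability to protect such information should not depend on where in the appliance the information is stored, whether in memory internal to the main processing chip or in external memory. Further, implementing security should preferably not waste resources, such as available memory space, that could be used more directly for processing operations. The security methods and hardware should also preferably be applicable to a wide variety of system configurations.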
【０００５】
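(Summary of the Invention)
According to the principles of the present invention, a system is disclosed that includes a central processing unit operating in response to an instruction set for processing information. An interface provides access from an external device to selected circuits forming part of the system. A set of nonvolatile programmable security elements selectively enables and disables operation of the interface, thereby providing a private environment for processing information. The principles of the present invention provide, in particular, the ability to keep information private in personal digital appliances. They can be implemented in a manner that does not waste processing resources, such as available memory space, that could be used more directly for processing operations. Furthermore, they can be applied to a wide variety of system configurations and do not depend on where in the appliance the private information is stored, whether in memory internal to the main processing chip or in external memory.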
【０００６】
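For a more complete understanding of the present invention and its advantages, reference is made to the following description taken in conjunction with the accompanying drawings.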
【０００７】
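(Description of the Preferred Embodiments)
The principles of the present invention and their advantages are best understood by referring to the embodiments illustrated in Figures 1 to 17, in which like numbers designate like parts.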
【０００８】
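Figure 1A is a high-level functional block diagram of an integrated circuit 100 embodying the principles of the present invention. Integrated circuit 100 may be, for example, a Cirrus Logic EPxx integrated circuit. It can advantageously be used in a range of consumer and industrial handheld information appliances, including personal digital assistants, electronic organizers, and two-way pagers. In particular, integrated circuit 100 can be configured to perform audio processing in a battery-powered Internet audio decoder.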
【０００９】
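Two other exemplary systems to which the principles of the present invention can advantageously be applied are shown in Figures 1B and 1C and are described further below.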
【００１０】
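Figure 2 shows the integrated circuit in one system configuration and is referred to when describing the input/output signals (ports) of the various functional blocks of integrated circuit 100.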
【００１１】
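Integrated circuit 100 is built around an ARM720T processor 101, as described in the ARM720T data sheet available from ARM Ltd. of Cambridge, England. In general, processor 101 includes a central processing unit (CPU) core 102, an 8-kilobyte cache 103, a memory management unit (MMU) 104, and a write buffer 105, each of which is described in more detail below. Note that an ARM920 processor may be used in alternative embodiments.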
【００１２】
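CPU 102 is a 32-bit microprocessor based on a reduced instruction set computer (RISC) architecture. The associated 8-kilobyte cache 103 is a mixed instruction/data cache (IDC) organized as a 4-way set-associative cache of 512 lines of 16 bytes (4 words) each.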
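The cache organization described above implies a fixed number of sets and a fixed address split. The following Python sketch works through that arithmetic. The tag/index/offset layout is the usual textbook split for a set-associative cache and is an assumption here, not something stated in the text; the function names are hypothetical.

```python
CACHE_BYTES = 8 * 1024   # 8-kilobyte mixed instruction/data cache
LINE_BYTES = 16          # 4 words of 4 bytes per line
WAYS = 4                 # 4-way set associative


def cache_geometry(cache_bytes=CACHE_BYTES, line_bytes=LINE_BYTES, ways=WAYS):
    """Derive line count, set count, and address field widths."""
    lines = cache_bytes // line_bytes
    sets = lines // ways
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,
        "index_bits": sets.bit_length() - 1,
    }


def split_address(addr, geom):
    """Split an address into (tag, set index, byte offset); assumed layout."""
    offset_span = 2 ** geom["offset_bits"]
    index_span = 2 ** geom["index_bits"]
    offset = addr % offset_span
    index = (addr // offset_span) % index_span
    tag = addr // (offset_span * index_span)
    return tag, index, offset
```

With the figures given in the text, this yields 512 lines arranged as 128 sets of 4 ways.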
【００１３】
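MMU 104 includes a translation lookaside buffer (TLB), access control logic, and translation table walking logic. Its primary functions are translating virtual addresses to physical addresses and controlling memory access. MMU 104 also supports a conventional two-level page table structure. In general, the TLB caches 64 translated entries and supplies translations to the associated access control logic. When a virtual address hits a translated entry in the TLB, the access control logic determines whether the access is permitted. If it is, MMU 104 outputs the corresponding physical address from the TLB cache; if it is not, MMU 104 signals CPU 102 to abort. When the virtual address misses in the TLB cache, the translation table walking circuitry retrieves the required translation information from the translation table in physical memory. That information is written to a replacement entry in the TLB cache, after which the access control logic can determine whether to permit the access.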
【００１４】
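Write buffer 105 buffers up to 8 words of data and 4 independent addresses. When it is enabled, CPU 102 writes data or instructions into write buffer 105 using the external clock and then returns to executing instructions. In parallel, write buffer 105 can write the data to internal data bus 106 and the addresses to internal address bus 107.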
【００１５】
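An on-chip phase-locked loop (PLL) 108, driven by a 3.6864 MHz crystal 109, is used in one mode to supply the clock to processor 101. In embodiments using the ARM720T, the main (CPU) clock can be programmed to 18.432 MHz, 36.864 MHz, 49.152 MHz, or 73.728 MHz. (PLL 108 preferably runs at twice the highest possible CPU clock frequency, that is, 147.456 MHz.) When a CPU clock frequency of 36.864 MHz is selected, internal data bus 106 and internal address bus 107 are also clocked at approximately 36 MHz. For CPU clock frequencies above 36 MHz, only processor 101 runs at the higher clock rate, and internal data bus 106 and internal address bus 107 are clocked at 36 MHz. The CPU clock frequency is selected by programming a 2-bit register field in system control register SYSCON3. (A list of the internal registers of integrated circuit 100 is provided in Table 1. A complete description of these registers is found in the "Cirrus Logic EP7211 Preliminary Data Sheet", which is incorporated herein by reference.)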
【００１６】
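Note that integrated circuit 100 also includes an external clock input, which allows a 13 MHz external clock to drive substantially all of the on-chip circuitry in a second clock mode. As shown in Figures 4A and 4B, the external clock drives pin EXPCLK when the clock enable signal is asserted at pin CLKEN. Figure 4A shows integrated circuit 100 entering the standby state and Figure 4B shows it leaving the standby state. (The standby state is described further below.)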
【００１７】
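Oscillator 110 generates the 1 hertz clock that drives a 32-bit real-time clock generator (RTC) 112. RTC 112 can be written and read, and it includes a 32-bit output match register. The output match register allows an interrupt to be issued when the RTC time matches a specific preset time. RTC 112 is also used to drive a programmable LED flasher (not shown).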
【００１８】
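Integrated circuit 100 further includes a pair of on-chip timer counters 113. The timer counters are independent, and each includes a 16-bit read/write data register. A given counter is loaded with the desired value and decrements in response to a preselected clock. When the timer counter underflows (that is, reaches zero), the appropriate interrupt is generated. The timer counter registers can be read at any time. The clock frequencies for these timers are selected by writing the corresponding bits in system control register SYSCON. For example, when PLL 108 is the source of the internal clock, timer counters 113 can use rates of 512 kHz and 2 kHz. When the 13 MHz clock from an external source is used, clocks of 541 kHz and 2.115 kHz are available. In addition, a 500 kHz clock can be derived from the 13 MHz source by using a divide-by-26 circuit, which is enabled by setting a bit in system control register SYSCON2.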
【００１９】
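Each timer counter 113 can operate in free-running mode or prescale mode, selected by setting or clearing bits in system control register SYSCON1. In free-running mode, a given counter wraps to 0xFFFF when it underflows (that is, reaches zero) and continues counting down. In prescale mode, the value written to the given timer counter is automatically reloaded when the counter underflows. Prescale mode can be used to produce a programmable frequency, drive a buzzer, or generate periodic interrupts.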
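The difference between the two timer modes can be illustrated with a small behavioral model. This is only a sketch: the class name `TimerCounter` is hypothetical, and the model assumes that the underflow interrupt and the wrap or reload happen on the clock tick after the counter reads zero, a detail the text does not specify.

```python
class TimerCounter:
    """Behavioral model of one 16-bit down counter (hypothetical names)."""

    def __init__(self, load_value, prescale=False):
        self.reload = load_value & 0xFFFF  # value written to the counter
        self.value = self.reload
        self.prescale = prescale           # True: prescale mode, False: free-running
        self.interrupts = 0

    def tick(self):
        """Advance by one cycle of the selected timer clock."""
        if self.value == 0:
            # Underflow: raise the interrupt, then reload or wrap.
            self.interrupts += 1
            self.value = self.reload if self.prescale else 0xFFFF
        else:
            self.value -= 1
```

Under this model, a prescale-mode counter loaded with N interrupts once every N + 1 ticks, which is how a periodic interrupt or a programmable frequency would be obtained.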
【００２０】
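State control circuit 114 allows integrated circuit 100 to be set to an operating state, an idle state, or a standby state. A state diagram of the operation of state control circuit 114 is shown in Figure 5. The operating state is the normal program execution state, in which all clocks and peripheral logic are enabled. The idle state is similar to the operating state, except that the CPU clock is halted until an interrupt or wakeup returns the circuit to the operating state. In the standby state, PLL 108 is shut down, but crystal 111, oscillator 110, and RTC circuit 112 remain active. During standby, the external address and data buses are also driven low to prevent powered-down peripherals from drawing current. Note that integrated circuit 100 is forced into the standby state when power is first applied and during a cold reset, and that this state can be exited only by an external wakeup.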
【００２１】
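Power management is performed through power management control block 115 in addition to state control circuit 114. Table 12 shows the status of the various functional blocks of integrated circuit 100 in each state. Power management circuit 115 places integrated circuit 100 in standby mode on receiving the active-low power fail signal PWRFL from external power supply unit 201. When integrated circuit 100 is powered by external DC power supply 202, the external power sense input signal EXTPWR is driven active low. When battery 203 is used, an active-high level on the BATOK pin indicates that the main battery is good. A falling edge on this signal generates an FIQ (fast interrupt request), and a low level on this pin in the standby state inhibits system startup. The new battery sense signal BATCHG indicates that a new battery is required; this input goes active low when the battery voltage falls below the "no battery" threshold. The batteries powering integrated circuit 100 may be, for example, one or more standard AA batteries widely available to retail consumers.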
【００２２】
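An exception is typically generated when an unexpected event (such as an interrupt or memory fault) occurs during program execution. When multiple exceptions occur, interrupt controller 116, which operates on a fixed priority system, determines the order in which the exceptions are serviced.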
【００２３】
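Integrated circuit 100 supports two interrupt types: the interrupt request (IRQ) and the fast interrupt request (FIQ). FIQs have higher priority than IRQs. Tables 2A to 2C show the preferred interrupt assignments. In those tables, INTMR1 and INTSR1 are the first interrupt mask register and first interrupt status register, INTMR2 and INTSR2 are the second interrupt mask register and second interrupt status register, and INTMR3 and INTSR3 are the third interrupt mask register and third interrupt status register. Note that when two or more interrupts from the same group (IRQ or FIQ) occur at the same time, the order in which they are serviced is preferably resolved in software.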
【００２４】
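In general, interrupt controller 116 operates as follows. An external or internal interrupting device asserts the appropriate interrupt. If the appropriate bit is set in the corresponding interrupt mask register, interrupt controller 116 asserts an FIQ or IRQ. If interrupts are enabled, processor 101 jumps to the appropriate address. Interrupt dispatch software then reads the corresponding interrupt status register to establish the source of the interrupt and calls the appropriate interrupt service routine, which clears the interrupt source by some action specific to the interrupting device. The interrupt service routine can then re-enable interrupts, and any other pending interrupts are serviced in the same way. All other external interrupt sources are held active until the corresponding service routine starts executing.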
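The mask-then-dispatch sequence described above can be sketched in Python as follows. The 16-bit register width, the bit assignments, and the lowest-bit-first ordering are illustrative assumptions; the text says only that ordering within a group is resolved in software, and the actual assignments are those of Tables 2A to 2C.

```python
def pending_sources(status, mask, width=16):
    """Bit numbers that are active in the status register and enabled in the mask."""
    active = status & mask
    return [bit for bit in range(width) if (active >> bit) & 1]


def dispatch(status, mask, handlers):
    """Call the service routine of each pending source, lowest bit first.

    Lowest-bit-first is a hypothetical software priority policy.
    """
    serviced = []
    for bit in pending_sources(status, mask):
        handlers[bit]()  # the routine clears its own interrupt source
        serviced.append(bit)
    return serviced
```

A source whose mask bit is clear stays pending in the status register but never reaches a handler.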
【００２５】
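Table 3 shows the external interrupt latencies. In the operating state, processor 101 checks for a low level on its FIQ and IRQ inputs after executing each instruction. The interrupt latency is therefore directly related to the time taken to complete the current instruction after the interrupt condition is first detected. In the standby state, the latency depends on whether the system clock has been shut down and on whether the FASTWAKE control bit in the system control register is set. As noted above, PLL 108 is always shut down in the standby state. If the FASTWAKE bit is cleared, the latency is between 0.125 and 0.25 seconds; if it is set, the latency is between 250 and 500 microseconds. If an external clock is used and is disabled during standby, the latency is again between 0.125 and 0.25 seconds to allow the oscillator to stabilize. If the external clock is not disabled, the latency can be reduced to a few microseconds. An interrupt can also cause integrated circuit 100 to leave the idle state. In this case the CPU clock must be restarted, and servicing of the interrupt may additionally be delayed by instruction execution, as described above.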
【００２６】
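In the illustrated embodiment, an on-chip boot ROM 117 is provided that holds an instruction set for initializing integrated circuit 100. The on-chip boot ROM also configures UART1, described further below, to receive 2048 bytes of serial data, which are downloaded into on-chip SRAM 118. Once the data has been downloaded into SRAM 118, the processor can continue executing instructions by jumping to the start of the SRAM. Advantageously, this arrangement allows code to be downloaded to program the system flash memory during manufacture of a device that uses integrated circuit 100. Note that the user can choose between booting from on-chip ROM 117 and booting from external memory connected to port CS[0]. Specifically, when the signal at pin MEDCHG is low, the boot is performed from on-chip ROM 117; when a high signal is applied to this pin, the boot is performed from external memory. Note that booting from the on-chip boot ROM has the effect of reversing the decoding of all chip select signals internally. This function is shown in Table 5A, and the normal, non-reversed chip select decoding is shown in Table 5B. In addition, the boot can be performed from external memory with a boot device width selected according to Table 4.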
【００２７】
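The ARM720T processor has a 4-gigabyte address space. In the illustrated embodiment, integrated circuit 100 uses the lower 2 gigabytes of this address space for ROM/RAM/flash and expansion space. A further 0.5 gigabytes is used for DRAM, and the remaining 1.5 gigabytes, less 8K for the internal registers, is unused.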
【００２８】
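The memory and I/O expansion interface supports six separate linear memory or expansion segments for external expansion memory 204. Two further segments are dedicated to the on-chip SRAM and ROM. Each segment is 256 megabytes in size. Any of the six segments can be used to support a conventional SRAM interface. Each segment can be individually programmed to be 8, 16, or 32 bits wide, to support page-mode access, and to execute with 1 to 8 wait states for non-sequential accesses and 0 to 3 wait states for burst-mode accesses. The zero-wait-state sequential capability allows integrated circuit 100 to interface with burst-mode ROMs. Note that the on-chip ROM space is fully decoded, whereas the SRAM address space is fully decoded only up to the maximum size of the video frame buffer used to drive an external LCD (up to 128 kilobytes).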
【００２９】
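Two of the expansion segments can be reserved for interfacing with two PC Card cards 205, using chip select signals NCS4 and NCS5. The interface to the external PC cards is preferably made through a Cirrus Logic CL-PS6700 PC card slot driver 206. Segmenting the memory allows different types of access to be made (that is, attribute, I/O, and common memory space).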
【００３０】
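The EXPCLK port of expansion control block 119 outputs an expansion clock that is equal to the CPU clock in the 13 MHz and 18 MHz modes and that runs at 36.864 MHz when integrated circuit 100 operates in the 36 MHz, 49 MHz, or 70 MHz modes. (The EXPCLK port is used as a clock input in the 13 MHz mode described above.) The EXPRDY (Expansion Port Ready) pin is driven low by an external expansion device to extend the bus cycle and insert wait states. Chip select signals CS[0:3] are used for SRAM expansion, while chip select signals CS[4:5] can be used for either memory expansion or PC card selection. The write strobe WRITE is low during reads from an expansion device and high during writes to an expansion device. The word/half-word bits (2) indicate to the external device whether the access size during a write from integrated circuit 100 is a word, a half-word, or a byte. DRAM controller 120 provides a programmable 16-bit or 32-bit wide interface to up to two banks of DRAM 207, each bank holding up to 256 megabytes. The DRAM banks may be any of several commercially available DRAM types, including conventional DRAM, synchronous DRAM (SDRAM), extended data out DRAM (EDO DRAM), fast page mode DRAM, and double data rate DRAM (DDR DRAM). These DRAMs may also be of the self-refresh type, which are placed in a low-power state when integrated circuit 100 enters the standby state described above. To support two banks, two row address strobes RAS[0:1] can be generated along with four column address strobes CAS[0:3]. The output enable signal MOE is used to enable DRAM, ROM/SRAM/flash, or expansion outputs, and the write enable signal NWE is used for the same set of external devices. The DRAM controller also includes a programmable refresh counter, whose refresh period is controlled through the refresh period register (DRFPR).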
【００３１】
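Table 6 shows the preferred physical DRAM addressing. Tables 7 and 8 give the DRAM address mappings for 32-bit and 16-bit DRAM memory systems. The 32-bit mapping assumes that 32-bit DRAM operation is selected and that two x16 devices are connected to each RAS line. The mapping repeats every 256 megabytes in each bank. The placeholder "n" in these tables is equal to 0xC plus the bank number. The 16/32-bit DRAM selection is programmed by setting a bit in system control register SYSCON2.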
【００３２】
Flash interface 121 allows integrated circuit 100 to interface with flash memory using chip select signals CS[0:1] described above.
【００３３】
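LCD controller 122 provides all of the control signals required for integrated circuit 100 to interface directly with a single-panel multiplexed LCD module 209. The total frame buffer size can be programmed up to 128 kilobytes, using both on-chip and off-chip memory. As noted above, using on-chip SRAM 118 as the LCD video frame buffer allows a system to be built without external DRAM. The screen is preferably mapped to the video frame buffer.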
【００３４】
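An LCD direct memory access (DMA) engine 123 is provided to fetch display data for LCD controller 122 from the frame buffer memory. The pixel bit rate, and hence the LCD refresh rate, can be programmed from 18.432 MHz down to 576 kHz when operating in the 18.432 to 73.728 MHz modes, and from 13 MHz down to 203 kHz when operating from the 13 MHz clock.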
【００３５】
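Integrated circuit 100 includes a pair of universal asynchronous receiver-transmitter (UART) interfaces 124 and 125. These asynchronous ports can be used, for example, to communicate with a pair of RS-232 transceivers 210. Each UART 124/125 can support data rates of up to 115.2 kilobits per second when integrated circuit 100 runs from the clock generated by PLL 108. When integrated circuit 100 is driven by the 13 MHz external clock source, the UART bit rates that can be generated include 9.6 Kbps, 19.2 Kbps, 38 Kbps, 58 Kbps, and 115.2 Kbps.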
【００３６】
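Each UART 124/125 includes a 16-byte transmit FIFO that drives the corresponding transmit (TX) pin and a 16-byte receive FIFO that receives data from a dedicated receive (RX) pin. An RX interrupt is asserted when a given RX FIFO becomes half full, or when the FIFO is non-empty and has received no further characters for longer than three character times. A TX interrupt is asserted whenever a given TX FIFO buffer becomes half empty.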
【００３７】
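In addition to its RX and TX ports, UART 124 (UART1) can receive three modem control signals: CTS, DSR, and DCD. The remaining modem control input RI and the output modem control signals RTS and DTR can be implemented using GPIO ports 129, described further below. A modem status interrupt for UART1 is generated whenever any of these modem control bits changes.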
【００３８】
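UART operation and line speed are programmed through the UART bit rate and line control registers (UBLC1 and UBLC2). The four FIFOs can also be programmed to be one byte deep. The framing error and parity error bits detected as each byte is received can be read from an 11-bit wide register.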
【００３９】
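Integrated circuit 100 also includes a post-processing stage 126 at the output of UART 124 that implements the IrDA (Infrared Data Association) SIR protocol. Integrated circuit 100 includes pins that drive the inputs of the connection to an infrared light-emitting diode (LED) and photodiode (shown collectively as block 211 in Figure 2). SIR encoder 126 can be switched into the TX and RX ports of UART1 so that these signals drive the infrared interface directly.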
【００４０】
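Integrated circuit 100 further includes an SPI/Microwire master-mode 128 Kbps ADC interface 127 and a serial interface 128, the latter shown in more detail in Figure 6. Table 10 shows the preferred serial pin assignments for the digital audio port. SPI interface 1 (the ADC interface) can be used to communicate with an external analog-to-digital converter 212 and digitizer 213. Serial interface block 128 includes a master/slave-mode SPI/Microwire (SSI2) interface 603, a digital audio interface (DAI) 601, and a codec interface 604, all multiplexed onto a single set of external interface pins by multiplexer 602. The selected interface drives the corresponding circuit in block 214 of Figure 2. The multiplexing is controlled by programming the corresponding field in a system control register. Table 11 lists the available serial interface options.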
【００４１】
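In its default mode, ADC interface 127 is compatible with SPI and Microwire compatible devices such as the MAXIM MAX148/9 peripherals. ADC interface 127 can also interface with devices such as the Analog Devices AD7811/12 chips, using NADCCS as a common RFS/TFS line. Exemplary timing diagrams for integrated circuit 100 driving the MAX148/9 and the AD7811/2 are provided as Figures 7A and 7B, respectively. An exemplary I²S interface is shown in Figure 8.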
【００４２】
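The clock output frequency of ADC interface 127 can also be set through a system control register. In the 18.432 to 73.728 MHz operating modes, the ADC clock (ADCCLK) can be set to 4 kHz, 16 kHz, 64 kHz, or 128 kHz. When integrated circuit 100 operates from the externally generated 13.0 MHz clock, the ADC clock can be set to 4.2 kHz, 16.9 kHz, 67.7 kHz, or 135.4 kHz. The sample clock SMPCLK always runs at twice the shift clock (ADCCLK) frequency. Table 12 lists the available ADC frequency options.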
【００４３】
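The ADC serial output ADCOUT is fed from an 8-bit or 16-bit shift register, depending on a bit set in the SYNCIO register. The ADC serial input channel ADCIN is captured in a 16-bit shift register. The ADC clock and synchronization pulses are activated by a write to the output shift register. During a transfer, the SSIBUSY (synchronous serial interface busy) bit in the system status flags register is set. When the transfer completes and valid data is in the 16-bit read shift register, the SSEOTI interrupt is asserted and the SSIBUSY bit is cleared. The sample clock SMPCLK is enabled independently.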
【００４４】
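Digital audio interface 601 provides an interface to CD-quality A/D and D/A converters, as shown in Figure 9. (The DAI is a subset of I²S.) It carries 16-bit stereo digital audio in 128-bit frames at the audio sampling frequency over separate transmit and receive lines. Note that each frame contains only 16 bits of right-channel and 16 bits of left-channel audio data; the remaining bits are set to zero.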
【００４５】
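Figure 10 is an exemplary timing diagram of the operation of DAI 601. The left/right clock (LRCK) provides the frame synchronization signal. The serial clock (SCLK) is the bit transfer clock and preferably has a rate fixed at 128 times the audio sample frequency. SDOUT (SDATAO) and SDIN (SDATAI) are used, respectively, to send playback data to an external D/A converter and to receive recorded data from an external A/D converter. Timing among integrated circuit 100, the external D/A converter, and the external A/D converter is based on an oversampled clock MCLK, which preferably has a rate fixed at 256 times the sampling frequency.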
【００４６】
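Synchronous serial interface 2 (SSI2) 603 is an SPI/Microwire interface capable of operating in full master/slave mode. Figure 10 shows a pair of integrated circuit 100 devices configured to operate in master/slave fashion. The preferred sustained data rate is 85.3 Kbps, which ensures that the interval between interrupts is sufficiently long. Interrupts are generated when the receive FIFO is half full and when the transmit FIFO is half empty. In slave mode, the serial clock (SSICLK), the serial receive port (SSIRXDA), the receive synchronization control pin (SSIRXFR), and the transmit synchronization pin (SSITXFR) are inputs, and the transmit pin SSITXDA is an output. In master mode, pins SSICLK, SSITXDA, SSITXFR, and SSIRXFR are outputs and pin SSIRXDA is an input. The mode is selected by programming a bit in a system control register.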
【００４７】
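Both asymmetric (unbalanced) traffic and continuous traffic are supported by means of separate transmit and receive frame synchronization control lines SSITXFR and SSIRXFR. In this arrangement, the receiving node receives one byte of data in the 8 clocks after the receive frame synchronization control signal is asserted, and the transmitting node sends one byte of data in the 8 clocks after the independent transmit frame synchronization control pulse is asserted. Exemplary timing diagrams of the operation of these two interfaces are provided for reference in Figures 7A and 7B.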
【００４８】
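Codec interface 604 supports direct connection to a telephony codec. Along with generating the clock signals, codec interface 604 performs parallel-to-serial and serial-to-parallel conversion. The interface is full duplex and uses corresponding transmit and receive FIFOs operating at 64 Kbps. When enabled, the codec interrupt CSINT is generated each time 8 bytes have been transferred (that is, when the FIFO becomes half full or half empty), in other words once every 1 millisecond with a latency of 1 millisecond. This timing is shown in Figure 8, in which CDENRX and CDENTX are, respectively, the receive and transmit enable bits in system control register SYSCON1.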
【００４９】
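DAI 601 supports an I²S interface such as interface 900 shown in Figure 9. In this case, the interface serves both an external ADC 901 and an external DAC 902. Clock source 903 provides the time base. An exemplary timing diagram is provided in Figure 10. In Figures 9 and 10, MCLK is the oversampled clock, typically fixed at 256 times the audio sampling frequency. SCLK is the bit clock, typically fixed at 128 times the audio sampling frequency. LRCK is the frame synchronization signal, typically fixed at the audio sampling frequency. SDOUT is the audio data output that sends playback digital audio to DAC 902. SDIN receives recorded data from ADC 901.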
【００５０】
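SSI2 interface 603 supports master/slave operation as shown in Figure 11. The interface provides a means of performing full-duplex serial transfers between two nodes. Data is transferred a byte at a time in response to the clock and frame synchronization signals.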
【００５１】
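Integrated circuit 100 is also provided with a set of general-purpose input/output (GPIO) ports 129. In the illustrated embodiment there are three 8-bit ports and one 3-bit port. The GPIO ports can be used for purposes such as interfacing with keyboard driver 215.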
【００５２】
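Pulse-width modulator (PWM) circuit 130 includes two outputs for driving DC/DC converters 216, which operate in conjunction with external power supply unit (PSU) subsystem 201. Typically, external input pins connected to the outputs of comparators monitoring the external DC/DC converter outputs are used to enable these clocks. When integrated circuit 100 runs from internal PLL 108, the PWM clocks have a frequency of 96 kHz. The duty ratio of these signals can be programmed from 1:16 to 15:16. The sense of the active cycle of a PWM drive signal can be set high or low by latching the state of the drive signal during power-on reset (that is, pulling the drive signal up makes the drive output active low, and vice versa). As a result, either positive or negative voltages can be generated by the external DC/DC converters. These outputs can likewise be disabled by clearing bits in a control register.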
【００５３】
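Communication among the blocks of integrated circuit 100 is established through advanced peripheral bus 132 and advanced peripheral bus bridge 131. Internal data bus 106 is 32 bits wide and can be connected to external devices through multiplexing circuit 132. Internal address bus 107 is 28 bits wide and can communicate with external devices through multiplexing circuit 133. An IEEE 1149.1 compliant ICE-JTAG circuit 134 is included for boundary scan during test and development. The embedded ICE also supports debugging of the ARM processor core.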
【００５４】
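In the preferred embodiment, the internal registers of integrated circuit 100 are little-endian. Advantageously, however, integrated circuit 100 can interface with big-endian external memory systems. Specifically, the big-end bit in the register set of CPU 101 determines whether words in external memory are stored in big-endian or little-endian format. Memory is viewed as a linear collection of bytes numbered upward from zero. Bytes 0 to 3 hold the first stored word, bytes 4 to 7 hold the second stored word, and so on. In the little-endian scheme, the lowest-numbered byte in a word is the least significant byte of that word and the highest-numbered byte is the most significant byte. Byte zero in a little-endian system is therefore connected to data lines 7 to 0. In the big-endian scheme, the most significant byte of a word is stored at the lowest-numbered byte and the least significant byte at the highest-numbered byte. Byte zero in a big-endian system is therefore connected to data lines 31 to 24. In the illustrated embodiment, only load and store instructions are affected by endianness. Tables 13 and 14 describe the operation of integrated circuit 100 for reads (Table 13) and writes (Table 14). Note that the column address strobe lines NCAS[3:0] to the DRAM banks are always connected to the same byte lanes regardless of endianness. For example, column address strobe line NCAS[0] is associated with data lines D[7:0] and NCAS[3] with data lines D[31:24]. As a result, in a little-endian system line NCAS[0] is asserted to read or write byte 0 of the DRAM, whereas in a big-endian system line NCAS[3] is asserted to access byte 0 of the DRAM.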
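The byte-lane rules above reduce to a small mapping, sketched here in Python. The function names are hypothetical; the lane and NCAS assignments follow the text (NCAS[n] serves data lines D[8n+7:8n]).

```python
def byte_lane(byte_in_word, big_endian):
    """Return (high_bit, low_bit) of the data lines carrying byte 0..3 of a word."""
    lane = 3 - byte_in_word if big_endian else byte_in_word
    return (8 * lane + 7, 8 * lane)


def ncas_for_byte(byte_in_word, big_endian):
    """Index of the column address strobe asserted to access that byte."""
    _, low_bit = byte_lane(byte_in_word, big_endian)
    return low_bit // 8


def word_from_bytes(data, big_endian):
    """Assemble a 32-bit word from four bytes stored at ascending addresses."""
    return int.from_bytes(data, "big" if big_endian else "little")
```

For example, `byte_lane(0, big_endian=True)` gives `(31, 24)` and `ncas_for_byte(0, True)` gives `3`, matching the big-endian case in the text.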
【００５５】
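Integrated circuit 100 includes a set of programmable fuses that allow one or more unique ID numbers and passwords to be assigned to each chip. The programmable fuses and their associated registers are located in a security register and hardware block 133 (Figure 1) that operates from APB 132. In the embodiment of Figure 1D specifically, the boot ROM itself resides on ARM local bus 107, and the access checking is split, with logic both on the ARM local bus and in the ARM local/global AHB wrapper.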
【００５６】
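In the preferred embodiment there are 256 programmable fuses, comprising a set of public fuses and a set of private fuses. The addresses and values of the private fuses are hidden, and access is granted only to the private firmware corresponding to those fuses. In a non-private environment, these addresses and values all read back as zero. The public fuses are listed in Table 15 and the private fuses in Table 16.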
【００５７】
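Integrated circuit 100 also includes, in block 133, fused Hamming codes and embedded hardware for checking a selected ID against its matching Hamming code. When a validation address is read, the value of the ID is checked against its Hamming value. The resulting 5-bit code provides debug information when the Hamming code does not match (for example, when all fuses are blown or no fuses are blown). Table 17 shows the decoding of the validation read bits. Advantageously, this makes it possible to detect errors that occur when the fuses are blown while keeping the fuse values and addresses inaccessible.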
【００５８】
Table 18 gives the addresses that return the validation codes for the public ID-CHK pairs.
【００５９】
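Table 19 gives the addresses that return the validation codes for the private ID-CHK pairs. These addresses are accessible only from firmware while integrated circuit 100 is operating in private mode; otherwise they read as 0.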
【００６０】
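So that the Hamming code generator can be fully tested, there are two test registers that can be selected as an ID-CHK pair and validated. The definitions and locations of these registers are given in Table 20.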
【００６１】
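Figure 1B is a high-level functional block diagram of a second system, on chip 140, suitable for practicing the principles of the present invention. This embodiment uses an ARM920T processor 141, which has both an instruction cache and a data cache in addition to MMUs. Unlike integrated circuit 100, system 141 does not include general-purpose SRAM.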
【００６２】
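Figure 13 is a more detailed functional block diagram of processor 141, specifically for an embodiment based on the ARM920T core. In this embodiment, the available cache comprises both an instruction cache 1301 and a data cache 1302. Likewise, a separate instruction MMU 1303 and data MMU 1304 are used. The instruction modified virtual address (IMVA) bus, instruction physical address (IPA) bus, and instruction data (ID) bus are each 32 bits wide. Similarly, the data modified virtual address (DMVA) bus, data physical address (DPA) bus, and data (DD) bus are 32 bits wide. Physical addresses and data are exchanged with AHB bus 142 through AMBA bus interface 1305. Write buffer 1306 allows data to be exchanged through interface 1305 in parallel with processor core operation. Data from cache 1302 can be output through write-back physical address (PTAG) RAM 1307.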
【００６３】
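Integral to the processor core is a coprocessor, which includes registers for translating the virtual addresses issued by the CPU into the modified instruction and data virtual addresses (MVA) carried on the IMVA and DMVA buses shown in Figure 1B. Specifically, for addresses in the 0 to 32 megabyte memory region, the virtual address VA is modified by a 7-bit process identifier to give MVA = VA + (ProcID × 32 megabytes), where ProcID is the identifier of the reading or writing process.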
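The address modification above is simple arithmetic, sketched below in Python. The function name is hypothetical, and leaving addresses at or above 32 megabytes unchanged is an assumption: the text defines the modification only for the 0 to 32 megabyte region.

```python
MEGABYTE = 2 ** 20
REGION = 32 * MEGABYTE  # size of the relocated region


def modified_virtual_address(va, proc_id):
    """Return MVA = VA + ProcID * 32 MB for addresses in the low 32 MB region."""
    if proc_id not in range(128):  # ProcID is a 7-bit identifier
        raise ValueError("proc_id must fit in 7 bits")
    if va in range(REGION):
        return va + proc_id * REGION
    return va  # assumed: addresses outside the region pass through unmodified
```

For example, virtual address 0x1000 issued by process 2 becomes modified virtual address 0x04001000.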
【００６４】
System 141 is based on an internal AHB (Advanced Microcontroller Bus Architecture High-speed Bus) 142 and an internal APB (Advanced Peripheral Bus) 143. AHB/APB bridge 144 interfaces AHB 142 with APB 143. A second bridge 145 interfaces processor 141 with AHB 142.
【００６５】
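Devices operating from AHB 142 include graphics engine 146 and raster engine 147. In general, the graphics engine offloads functions such as block transfers and line drawing from processor 141 to improve the graphics performance of the system. Graphics engine 146 preferably uses the standard device-independent bitmap (DIB) format in order to support Windows (registered trademark) CE. Raster engine 147 is provided to rasterize data from an external display buffer, through synchronous DRAM interface 148, to drive an external LCD, CRT, or TV display.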
【００６６】
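Other on-chip interfaces to the internal AHB include interface 149, which couples system 141 to an external system bus; a PCMCIA interface for interfacing with external PC cards; and test interface controller (TIC) interface 151 for testing on-chip circuit blocks such as the DMA controller and raster system. Memory interface 152 allows control signals and data to be exchanged with external SRAM, flash, or ROM in a manner similar to that described above. Booting of the system, described further below, is performed at least in part using boot ROM 153. In this example boot ROM 153 operates from AHB 142, but in alternative embodiments it can operate from any of several global and local buses.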
【００６７】
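System 140 includes an 8-channel DMA engine 154, which prioritizes and services requests from on-chip resources such as the UARTs for access to external memory. Joint Test Action Group (JTAG) port 155 supports debugging of the on-chip processor and associated circuitry. Universal serial bus (USB) controller 156 and Ethernet (registered trademark) port 157 also operate directly from the AHB.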
【００６８】
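A number of peripheral devices are provided on-chip and operate from APB 143. Among others, system 140 includes UARTs 158, 159, and 160. The illustrated embodiment also includes a pair of SPI interfaces 161 and 162 and an AC97 interface 163. A real-time clock (RTC) 165, a set of general-purpose timers 166, and a watchdog timer 167 are also provided in this embodiment. An additional memory interface, EEPROM interface 168, is also coupled to the APB.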
【００６９】
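Data can be entered manually through an external key matrix coupled to key matrix interface 169, or through a touch screen interfaced with touch screen ADC 171 and touch screen interface 170. LED outputs 172 are also included in the user interface of system 140.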
【００７０】
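Like integrated circuit 100, system 140 includes a set of general-purpose input/output (GPIO) ports 173, an interrupt controller 174, and an on-chip PLL 175 that drives system control circuitry 176. The control circuitry includes memory remap and system pause control circuitry 177. Flash VPP control block 178 generates the voltages required to write and erase external flash.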
【００７１】
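Figure 1C is a high-level functional block diagram of another exemplary system-on-chip 180 to which the principles of the present invention can suitably be applied. In this example, CPU core 181, for example an ARM7TDMI controller, does not use an MMU or on-chip cache. CPU 181 operates in conjunction with AHB bus 142 through a local AHB bus and local/main AHB interface 182. CPU 181 is supported by memory 183, security gate 184, and security/reset circuitry 185. Security is described in more detail below.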
【００７２】
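In this embodiment, system 180 further includes a digital signal processor (DSP) 186 supported by global memory 189, data memory 190, and program memory 191. Interprocessor communication registers 192, I²S audio input/output port 193, a PWM circuit 194 capable of driving external speakers at CD-quality levels without analog DAC support circuitry, and DSP timer/STC 195 communicate with DSP 186 over DSP peripheral bus 196. These devices also operate from the APB. The peripheral devices operating from the APB include USB slave port 197, SPI 198 for serial media input, and I²S host port 199.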
【００７３】
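The Moving Picture Experts Group (MPEG) audio compression standard defines the syntax of an encoded stream of digitized audio data, together with the process for decoding that stream. In the audio arena, three layers, Layers I to III, are defined. For purposes of this discussion, Layer III, which provides the highest-quality audio reproduction, is considered.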
【００７４】
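The encoding process begins by sampling one or more audio channels at a given sampling rate, such as 32 kHz, 44.1 kHz, or 48 kHz. The resulting digitized stream is passed through a polyphase filter bank, which divides the received time-domain stream into 32 frequency subbands. Typically, the filter bank operates on 64 input samples at a time with 50% overlap, producing 32 frequency-domain samples for every 32 input time-domain samples.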
【００７５】
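A psychoacoustic model is used to remove those portions of the audio signal that are inaudible to the human ear because of auditory masking. Auditory masking is a characteristic of the human auditory system whereby a strong audio signal renders weaker audio signals in its temporal or spectral neighborhood imperceptible. In addition, the ability of the human ear to distinguish sounds depends on frequency. Within certain critical bands, the human ear does not finely delineate between the various incoming audio components. The processing subbands, which approximate these critical auditory bands, are quantized according to the audibility of the quantization noise within each subband.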
【００７６】
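An engine based on the psychoacoustic model operates in parallel with the polyphase filter and determines the noise masking available for a given frequency component at a given loudness. Based on this information, the data stream output from the polyphase filter is quantized and encoded. In Layer III, each of the 32 subband outputs from the polyphase filter is passed through a window that parses the stream into long blocks of 18 samples or short blocks of 6 samples, with 50% overlap, so that the window lengths are 36 and 12 samples respectively. Long blocks are used to achieve better frequency resolution for relatively stationary audio signal components, while short blocks are used to improve the resolution of transients. The blocks of each subband are then processed with a modified discrete cosine transform (MDCT). Further subdividing the subband frequencies improves the spectral resolution and allows some of the aliasing introduced by the polyphase filter to be cancelled.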
【００７７】
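In MPEGx Layer III, nonuniform quantization is used to make the signal-to-noise ratio more consistent over the range of quantization values. Layer III also uses scale factor bands, which approximate the critical bandwidths and cover several MDCT coefficients. The scale factors are used during noise allocation to shape the frequency-dependent masking threshold, and essentially set the gain of each subband. In addition, Huffman coding is applied to the quantized MDCT coefficients to increase data compression. Finally, a "bit reservoir" is used: bits can be donated to it when a frame requires fewer than the average number of bits to encode, and borrowed from it when a frame requires more than the average.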
【００７８】
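A frame is formed from a header, a CRC value, side information, and main data, although the relative positions of these frame components are not always in the same order, nor are they necessarily adjacent in the stream. The header contains a set of frame synchronization bits, the MPEG version and layer identifiers, a CRC protection bit, a bit rate index indicating the bit rate at which the frame was created, and a sampling rate frequency index indicating the frequency at which the audio data was sampled, along with other information about the data being transported.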
【００７９】
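An MPEG1 Layer III bit stream can then generally be decoded as follows. Data is input to the decoder at a predetermined number of frames per second. The frame synchronization bits in the header of each frame are detected. The scale factors are then extracted and decoded. Next, the Huffman-coded main data representing the frequency energies is decoded. The scale factors are applied and the data is requantized. If stereo data is being processed, the stereo channels are recovered at this point, and alias reduction is performed. An inverse MDCT operation is performed, followed by an overlapped inverse discrete cosine transform (DCT), to return the data to the time domain. A low-pass filter is applied to recover the PCM samples, each of which is essentially a weighted average of 512 adjacent time-domain samples.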
【００８０】
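When integrated circuit 100 is configured as an MPEGx Layer III decoder, a stereo DAC, such as a Cirrus Logic CS43Lxx stereo audio DAC, is coupled to digital audio port 128 to drive a headphone set. An analog-to-digital converter, such as a Cirrus Logic CS53L32 audio A/D converter, can also be coupled to this port to input data from a microphone. The embodiment of Figure 1D includes an on-chip PWM circuit capable of driving headphones at CD-quality levels without an external stereo audio DAC.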
【００８１】
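It is often necessary to prevent the software or firmware bundled with an electronic product from being tampered with, copied, or probed with a logic analyzer. Consequently, some level of security must be provided. This is done, for example, with encrypted passwords, which allow end users authorized by the manufacturer to access assets in system memory for the purpose of downloading, debugging, and upgrading software or firmware, while denying the same level of access to unauthorized end users. In the context of digital audio players, this gives online music distributors the confidence to authorize end users who have paid the fee and received the password needed to download a song, knowing that unauthorized downloading will be deterred at least to some degree.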
【００８２】
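In general, there are several criteria that a security scheme must meet. First, the system must not allow unauthorized access as a result of a power-on reset. Second, secure information, such as encrypted passwords, security codes, and information about the locations in memory where security information resides, must not be readily accessible from outside the system. This secure information must nevertheless be verifiable during manufacturing test, to ensure that the quality of the end-user system is acceptable with respect to normal manufacturing defects. Finally, when security measures are not provided or not activated, normal operation of the system must proceed in the expected manner.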
【００８３】
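Advantageously, the principles of the present invention provide security techniques that allow integrated circuit 100 to meet each of these criteria. One of these techniques uses the ability of processor 101 to reverse the chip select decoding described above, in response either to particular default conditions or to the dynamic assertion of a particular instruction. By reversing the chip select decoding at power-on reset, security code can be executed from a memory space that is normally inaccessible. Moreover, this feature of processor 101 can be activated only at particular times when processor 101 is not executing instructions, which makes any attempt to breach security more difficult.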
【００８４】
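Figure 12 is a flow chart describing a preferred procedure 1200 for booting integrated circuit 100 according to the inventive concepts. Processor 101 is assumed to be an ARM720T or ARM920T processor, and the signal names are based on those signals and/or instructions. The procedure begins at step 1201 with a power-on reset of integrated circuit 100, performed by asserting the power-on reset (NPOR) signal. Circuitry in the system immediately disables all hardware and debug features and hides all security elements (for example firmware, registers, and passwords) from external inspection (step 1202). This step ensures that the system is secure at least until step 1203 has checked whether a security firmware routine is in place and enabled. In the preferred embodiment, this check is made by reading programmable fuse registers.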
【００８５】
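For purposes of discussion, the case in which security is not provided or is disabled is described first. At step 1204, a decision is made whether to continue booting from internal ROM or to use external memory. In ARM processor embodiments, the NMEDCHG bit is used to select between the internal and external boot memory options. If, at step 1204, the signal at pin NMEDCHG is clear (that is, in the active-low state), integrated circuit 100 boots from internal ROM. In this case, at step 1205, the address mapping is reversed by default to the internal boot ROM. With the address mapping reversed, execution proceeds from location 0 of the current boot ROM (step 1207). In the illustrated embodiment, the power-on reset signal NPOR must be asserted to return the address mapping to its normal state.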
【００８６】
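Alternatively, if the NMEDCHG bit is set (that is, in the active-high state), the boot is performed from external memory (ROM/EPROM/flash). In this case the chip select mapping is set as shown in Table 5A, and external Chip Select 0 is selected as the boot memory.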
【００８７】
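Consider next the case in which reading the programmable fuse registers indicates that a security routine is in place and enabled. The boot branches at step 1208 and proceeds to execute the security procedure.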
【００８８】
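Integrated circuit 100 can be configured to support different sets of boot and/or security code. Advantageously, this allows integrated circuit 100 to operate with boot/security firmware from multiple vendors, even though each vendor's secure information is accessible only through that vendor's own boot/security procedure. First, the boot memory is programmed with multiple sets of boot code, or options. This can be done using the internal boot ROM or one or more external memory chips (ROM/RAM/flash). The multiple boot options allow the end user to choose among the security firmware available from different vendors.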
【００８９】
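Accordingly, at step 1209 the first of the boot options in the boot memory is identified, and at step 1210 it is aliased to the reset vector, which for the first option is typically location 0x00. All of the security elements required for the given implementation (registers, firmware, I/O devices) are enabled by the current boot option, while all other security options (implementations) remain hidden (step 1211). Then, at step 1212, the selected boot code is executed by the processor in an attempt to initialize the selected security firmware/software.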
【００９０】
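If, at step 1213, the security firmware/software called by that boot code is found in memory, integrated circuit 100 completes the boot and, at step 1214, operates in the selected secure environment under the supervisory control of that security firmware/software. If, on the other hand, the required security firmware/software is not found, another boot option must be tried.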
【００９１】
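If the last security option has not been reached at step 1215, the next security option in the boot code is selected (step 1216). An instruction is issued that dynamically forces a new reset vector on the processor. In this example, the reset vector jumps to point to the second security option in the boot code. At step 1218, processing returns to step 1211 and the boot process is attempted again. Note that in the illustrated embodiment the instruction pipeline has three stages. As a result, the instruction that resets the program counter to 0 has already been loaded from the internal boot ROM before the instruction that changes the chip selects is executed. The MOV pc, #0 instruction flushes the processor pipeline, thereby allowing several cycles to elapse before the chip selects take effect. During this process, no other accesses are permitted to memory resources whose chip select signals change during execution of the remap command.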
【００９２】
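This process repeats until a security option is found that brings integrated circuit 100 into secure operation at step 1214, or until the last security option is reached at step 1215. In the illustrated embodiment, the last or default option returns the boot procedure to a normal (non-secure) boot at step 1219. All debug features are then enabled and, at step 1220, the security features are hidden. At step 1221 the default boot ROM is selected, and at step 1222 the reset vector is dynamically forced on the processor. In alternative embodiments, however, default security code can be provided so that integrated circuit 100 can still operate in a secure environment even if none of the primary options is available.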
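The control flow of boot procedure 1200 can be summarized as a loop over the available boot options. The Python sketch below models only the decision logic; the function name, the option labels, and the `firmware_present` callback are hypothetical stand-ins for the fuse read, the reset-vector aliasing, and the firmware check performed by the hardware and boot code.

```python
def select_boot(security_fused, options, firmware_present):
    """Return (chosen_option, trace); chosen_option is None for a non-secure boot."""
    trace = []
    if not security_fused:
        # Step 1203: the fuse registers report no enabled security routine.
        trace.append("non-secure boot")
        return None, trace
    for option in options:
        # Steps 1209-1211 (or 1216): alias the reset vector to this option and
        # expose only the security elements belonging to it.
        trace.append(f"alias reset vector to {option}")
        if firmware_present(option):
            # Steps 1213-1214: the firmware was found, so run under its control.
            trace.append(f"secure boot with {option}")
            return option, trace
    # Steps 1219-1222: last (default) option, fall back to a normal boot.
    trace.append("non-secure boot")
    return None, trace
```

For instance, `select_boot(True, ["vendor_a", "vendor_b"], lambda o: o == "vendor_b")` tries the first option, fails to find its firmware, and settles on `"vendor_b"`.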
【００９３】
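In embodiments of integrated circuit 100 based on the ARM920T, instructions and data can be locked into the corresponding instruction and data caches, so that they are not selected as victims for replacement by the replacement algorithm on a cache miss. Locking the data or instructions guarantees cache hits, in which the corresponding information is fetched directly from the cache, and a favorable cache access latency. Furthermore, information in the locked cache cannot be accessed from outside integrated circuit 100 except through the JTAG port or other test and debug modes that provide visibility into the cache or TLB memories. The JTAG port, which is used mainly during product development and test, can be disabled once integrated circuit 100 leaves the manufacturing site.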
【００９４】
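Before cache entries are locked, the corresponding descriptors (physical addresses and permissions) must be locked in the associated translation lookaside buffer (TLB) to obtain predictable performance. Many devices, such as the ARM920T used in this example, include both data and instruction translation lookaside buffers (TLBs) in addition to the caches. For a given instruction or data field, the CPU generates a virtual address. The modified virtual address is then presented to the corresponding TLB, and fields of the modified virtual address are compared with the compare (tag) registers in the TLB. If there is a match and the access can be permitted (as determined by the permission field in the TLB entry), the physical address bits returned from the corresponding TLB entry are used together with the index bits of the modified virtual address to generate the physical address, and the cache or external memory is accessed as required. If there is no match, the process described below is invoked to translate the virtual address to a physical address in hardware.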
【００９５】
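When cache lines are locked, the corresponding entries in the data and instruction TLBs are also locked and excluded from replacement during TLB updates. In the ARM920T processor, TLB entries are locked by writing the identifiers of the specific data and instruction TLB entries to be locked into the TLB lockdown field of system control processor register C15.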
【００９６】
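TLB lockdown procedure 1300 of Figure 13 is one technique for locking entries in the instruction TLB or data TLB. At step 1301, page tables are set up containing the physical address bits and permissions corresponding to the data or instructions to be protected. At least some of the entries in the target TLB are then flushed or cleaned to ensure that the code to be locked is not already in the TLB registers (step 1302).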
【００９７】
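In embodiments using the ARM920T processor, both the data TLB and the instruction TLB are organized as a single segment of 64 lines. A replacement (victim) counter points to the entry to be replaced. Accordingly, at step 1303 the replacement counter is updated to point to the next entry into which locked information is to be written. In the preferred embodiment, the process starts at entry 0.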
【００９８】
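For the instruction TLB, a prefetch instruction is used to generate a modified virtual address and force a TLB miss (step 1304). For the data TLB, a load instruction can be used to cause the miss. After the miss, a page table walk must be performed to generate the descriptor (for example, physical address and permissions) to be loaded into the TLB (step 1305). At step 1306, using the physical address bits of the accessed page table entry and the index bits from the modified virtual address, the descriptor generated by the page table walk is loaded into the entry pointed to by the current replacement counter contents of the given TLB.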
【００９９】
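In the ARM920T embodiment, the loaded TLB entry is locked at step 1307 by setting a bit in the lockdown register of the corresponding TLB. If the last entry has been reached at step 1308, the procedure ends; otherwise, at step 1309, the procedure loops back to step 1303, where the replacement counter is updated in preparation for loading the next entry.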
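Steps 1302 to 1309 of the TLB lockdown procedure can be modeled with a small software TLB. This is a behavioral sketch only: the `Tlb` class, the dictionary standing in for the page tables, and the final repositioning of the victim counter past the locked entries are illustrative assumptions, not the ARM920T register interface.

```python
class Tlb:
    """Software model of a single-segment, 64-entry TLB (hypothetical)."""

    def __init__(self, size=64):
        self.entries = [None] * size   # each entry: (virtual_page, descriptor)
        self.locked = [False] * size
        self.victim = 0                # replacement (victim) counter

    def flush(self):
        """Discard every unlocked entry (step 1302)."""
        for i, is_locked in enumerate(self.locked):
            if not is_locked:
                self.entries[i] = None

    def lookup(self, virtual_page):
        for entry in self.entries:
            if entry is not None and entry[0] == virtual_page:
                return entry[1]
        return None


def lock_down(tlb, page_table, virtual_pages):
    """Load and lock one TLB entry per page, starting at entry 0."""
    tlb.flush()
    for slot, page in enumerate(virtual_pages):
        tlb.victim = slot                             # step 1303
        if tlb.lookup(page) is not None:              # step 1304 expects a miss
            raise RuntimeError("entry already present; flush first")
        descriptor = page_table[page]                 # step 1305: table walk
        tlb.entries[tlb.victim] = (page, descriptor)  # step 1306
        tlb.locked[tlb.victim] = True                 # step 1307
    tlb.victim = len(virtual_pages)  # assumed: resume replacement after locked entries
```

After `lock_down`, a later `flush` leaves the locked entries in place, which is the property the procedure relies on.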
【０１００】
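Once the TLB entries are locked, the corresponding data or code can be locked in the cache. For purposes of discussion, consider locking instructions into the instruction cache of the ARM920T; the data cache is handled similarly. Note also that the inventive concepts are not limited to systems using ARM processors and can be applied to any system or device that includes lockable instruction and/or data caches.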
【０１０１】
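Figure 14 shows cache lockdown procedure 1400 for locking secure code into the cache. As described further below, in the illustrated embodiment cache misses must be forced in order to perform the lock operation. A preferred method of forcing cache misses is described below in connection with Figure 15.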
【０１０２】
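At step 1401, an actual or emulated page table is set up with the physical addresses of the memory locations holding the data or instructions to be locked in the cache. Emulated, synthesized page tables using the inventive concepts are also described further below. This table is used to update the corresponding TLB, preferably using procedure 1300.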
【０１０３】
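At step 1402, at least some of the cache lines of the given cache are flushed or cleaned to ensure that the code to be locked is not already in the cache. At step 1403, the replacement (victim) counter associated with the cache is forced to point to the first cache line (cache line 0). In the preferred embodiment, the data cache and instruction cache are each partitioned into eight 64-line segments, each segment being indexed by the index field in the modified virtual address. Procedure 1400 fills the cache lines sequentially: for example, cache line 0 of every segment is filled in turn first, then cache line 1 of every segment, and so on.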
【０１０４】
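At step 1404, the data or instructions to be cached are generated, which may require a decryption process. The data or instructions are then stored at the corresponding locations in an alternate memory, such as internal SRAM or external SRAM/DRAM/flash. Next, at step 1305, a prefetch cache line operation is performed to bring the instructions into the cache, initiating a lookup at the cache entry pointed to. (A LOAD instruction can be used for the data cache.) This causes a cache miss, requiring the processor to access the alternate memory containing the required data or instructions. If the TLB is current and correct, the processor can make this access by consulting the TLB for the bits needed for the physical address; otherwise it can walk directly through the page table set up at step 1401. The physical address itself is generated from the base address bits of the accessed TLB entry and the index bits of the virtual address.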
【０１０５】
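At step 1405, the generated code or data is placed where the cache miss will be serviced, and at step 1406 a line fill is performed into the cache line at the current replacement pointer entry. Here again, the cache segment is indexed by the cache segment index bits of the virtual address that caused the cache miss.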
【０１０６】
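If the last segment of the given cache has not been reached at step 1407 and further cache operations are required, the processor increments the cache segment index bits at step 1408 to force the next cache access to the next cache segment at the current replacement counter value. The procedure returns to step 1404 and continues from there.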
【０１０７】
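If, however, the operation just completed was to the last cache segment and further cache operations are to follow (that is, the last cache line to be filled has not been reached at step 1409), then at step 1410 the procedure jumps back to step 1403, the replacement counter value is updated, and the procedure continues from that point.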
【０１０８】
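When all of the code to be locked has been loaded, the base of the replacement counter is set to a value one greater than that of the locked cache lines (step 1411). This ensures that the private data (which is now decrypted) cannot be overwritten on a cache miss or accessed by unauthorized parties. The code can then be executed from the cache at step 1412.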
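The fill order used by procedure 1400, and the final victim base, can be sketched as follows. The sketch assumes that the same number of lines is locked in every segment and reads "one greater than the locked cache lines" as one past the highest locked line index; both are interpretations, and the function name is hypothetical.

```python
SEGMENTS = 8             # each cache is split into eight segments
LINES_PER_SEGMENT = 64   # of 64 lines each


def lockdown_fill_order(lines_to_lock):
    """Return the (segment, line) fill sequence and the final victim base."""
    if lines_to_lock not in range(LINES_PER_SEGMENT + 1):
        raise ValueError("cannot lock more lines than a segment holds")
    order = []
    for line in range(lines_to_lock):       # victim counter value (step 1403)
        for segment in range(SEGMENTS):     # segment index bits (step 1408)
            order.append((segment, line))   # forced miss and line fill
    victim_base = lines_to_lock             # step 1411: first replaceable line
    return order, victim_base
```

Locking two lines per segment therefore produces sixteen line fills, with line 0 of all eight segments filled before any line 1, and leaves lines 2 to 63 of each segment available for normal replacement.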
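[0109]
One means of creating locked, cached data without using memory locations for the entire region to be locked is to emulate that region using a set of registers one cache line in length. Emulation of a cache miss can also be used to relax the hardware limitation on the granularity of cache locking. For example, in the ARM920T embodiment the cache can be locked in 64-word (256-byte) blocks. Each cache line, however, is only 8 words (32 bytes) long and can therefore map to different positions within the 64-word block depending on the address bits.
[0110]
According to the inventive concepts, for each lockable location, eight programmable 32-bit emulated cache line (ECLINE) registers are set up at an alternate location in memory as eight consecutive 32-bit locations. In addition, a compare (offset) register (ECOFFSET) is set up and programmed with the physical address identifying where in the cache memory space the contents of the ECLINE registers are to reside following an emulated cache miss. As a result, a location one cache line in size can be used to represent an entire 64-word lockable location.
[0111]
The emulated cache miss procedure 1500 is shown in the flow chart of FIG. 15. At Step 1501, the contents to be cached (in either the instruction cache or the data cache) are written to the ECLINE registers. The offset into the lockable cache space to which the data are to be written is then programmed into the ECOFFSET compare register (Step 1502).
[0112]
At Step 1503, an operation is performed to cause a cache miss. For the instruction cache this can be done with a prefetch instruction to the instruction cache, and for the data cache with a load instruction. The virtual address generated for this location causes a miss in the given cache, and the corresponding physical address is then generated either from the index bits of the virtual address and the base bits retrieved from the appropriate TLB, or by performing a page table walk. At Step 1505, the information in the corresponding ECLINE registers is retrieved, and at Step 1506 it is loaded into the addressed entry in the cache. This entry has been prepared for locking using procedure 1400. Advantageously, this procedure allows the locked portion of the cache to be loaded without relying on internal or external SRAM.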
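The ECLINE/ECOFFSET mechanism of procedure 1500 can be modeled in a few lines of Python. This is an illustrative sketch only: the class `EmulatedLine` and its method names are hypothetical, and the line-aligned address comparison is an assumption about how the compare register would be matched.

```python
LINE_WORDS = 8   # words per cache line (32 bytes), per the ARM920T embodiment
LINE_BYTES = 32


class EmulatedLine:
    """Eight ECLINE registers plus one ECOFFSET compare register."""

    def __init__(self):
        self.ecline = [0] * LINE_WORDS
        self.ecoffset = None  # nothing programmed yet

    def program(self, words, offset):
        """Steps 1501-1502: write the line contents, then the compare offset."""
        if len(words) != LINE_WORDS:
            raise ValueError("exactly one cache line of words is required")
        if offset % LINE_BYTES:
            raise ValueError("offset must be cache-line aligned")
        self.ecline = [w & 0xFFFFFFFF for w in words]
        self.ecoffset = offset

    def line_fill(self, physical_address):
        """Steps 1505-1506: supply the line if the miss address matches."""
        line_address = physical_address & ~(LINE_BYTES - 1)
        if self.ecoffset is not None and line_address == self.ecoffset:
            return list(self.ecline)
        return None  # not an emulated location; normal memory would respond
```

Reprogramming the same eight registers with a new offset before each forced miss is what would let one line-sized location stand in for a whole 64-word lockable block.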
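[0113]
As already noted, a page table walk is required during cache and TLB lock operations to generate the address of the physical memory from which the data or instructions are fetched. The concepts of the present invention allow the creation of streamlined page tables that reduce the amount of memory that must be reserved for page table support. Moreover, even allowing for TLB misses, the inventive concepts protect data and instruction code from tampering, copying, or electronic analysis through secure operation of MMU 104 during address translation by section/page table walks. The ARM920T processor core is again considered for purposes of discussion, although the principles of the present invention can also be applied to the memory management schemes of other processors and memory management units.
[0114]
For this embodiment, a conventional page table walk generally proceeds as follows. During the level-1 fetch, a section descriptor (level 1), a coarse page table base address, or a fine page table base address is retrieved from a 4096-entry translation base table (TBT). The TBT is accessed using the TBT base address in the translation base register and the table index field of the modified virtual address.
[0115]
If the output from the TBT is a section descriptor, that descriptor contains a section base address and access permissions. The section base address bits from the level-1 descriptor and the section index bits of the modified virtual address are then used to generate the physical address within a one-megabyte section of memory. (The permissions in the level-1 section descriptor are assumed to be favorable.)
[0116]
A coarse page table base address retrieved from the TBT is used, together with the level-2 table index of the modified virtual address, to access one of 256 entries in the coarse page table, thereby dividing the one-megabyte block into four-kilobyte blocks. The coarse page table returns a large or small page base address along with access permissions. Depending on the state of those permissions, the large or small page base address bits are combined with the page index bits of the modified virtual address to generate the physical address within a 64-kilobyte large page or a four-kilobyte small page of memory.
[0117]
A fine page table base address retrieved from the TBT points, together with the level-2 table index bits from the modified virtual address, into a 1024-entry fine page table. The output from this table is a level-2 descriptor containing a large, small, or tiny page base address along with access permissions. Large pages are 64 kilobytes, small pages are four kilobytes, and tiny pages are one kilobyte. Assuming the permissions allow the access, the page base address is concatenated with the page index bits of the modified virtual address to generate the physical address within a large or small page of memory, as described above, or within a one-kilobyte tiny page of memory.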
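The address arithmetic of the conventional two-level walk can be illustrated with a short Python sketch. The bit positions below follow the usual ARM MMU layout for a 4096-entry first-level table, 256-entry coarse tables, and 1024-entry fine tables; they are stated here as an assumption, and the function names are hypothetical.

```python
def level1_descriptor_address(table_base, mva):
    """Address of the level-1 entry: table base plus MVA[31:20] as a word index."""
    return (table_base & 0xFFFFC000) | ((mva >> 20) << 2)


def section_physical_address(l1_descriptor, mva):
    """One-megabyte section: base bits [31:20] joined with MVA[19:0]."""
    return (l1_descriptor & 0xFFF00000) | (mva & 0x000FFFFF)


def coarse_level2_descriptor_address(l1_descriptor, mva):
    """256-entry coarse table: base bits [31:10] plus MVA[19:12] as a word index."""
    return (l1_descriptor & 0xFFFFFC00) | (((mva >> 12) & 0xFF) << 2)


def fine_level2_descriptor_address(l1_descriptor, mva):
    """1024-entry fine table: base bits [31:12] plus MVA[19:10] as a word index."""
    return (l1_descriptor & 0xFFFFF000) | (((mva >> 10) & 0x3FF) << 2)


def page_physical_address(l2_descriptor, mva, page_bytes):
    """Large (65536), small (4096), or tiny (1024) page: base joined with offset."""
    mask = page_bytes - 1
    return (l2_descriptor & ~mask & 0xFFFFFFFF) | (mva & mask)
```

Permission checks are omitted; the sketch covers only how index fields of the modified virtual address are combined with base address bits at each level.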
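[0118]
The memory accessed as a result of the page table walk is either the cache, internal memory, or external memory. The TLB is updated using the physical address and permissions. Any secure information is locked in the TLB as described above.
[0119]
The drawback of this two-level table walk procedure is that the various tables require a substantial amount of on-chip memory. As noted earlier, secure information must reside in memory areas inside the system that cannot be accessed by unauthorized users. Some provision must therefore be made to store sensitive information, such as the physical address translation scheme, efficiently in the available internal memory.
[0120]
In the preferred embodiment of integrated circuit 100, the table walk process can be greatly simplified and the amount of memory required for the translation tables significantly reduced. This is important not only for improved operating efficiency; it also ensures that non-secure external memory need not be used.
[0121]
Here, the memory space is divided into 256-megabyte regions, and each region is associated with a set of common access characteristics (e.g., access permissions, cacheability, and bufferability). Only one megabyte of only one of these regions requires second-level page tables. Because many memory regions thus share common access characteristics, a much smaller translation table can be created within the available SRAM space.
[0122]
The access permissions indicate whether given information can be accessed from the corresponding memory block. The cacheability and bufferability attribute bits are used to determine whether the information being accessed can be stored in the cache or passed through the write buffer. For example, the contents of the real hardware registers controlling the UARTs, other peripherals, and I/O devices generally cannot be cached or buffered by the CPU subsystem, since doing so could cause those peripherals to operate incorrectly because of the timing with which accesses actually occur.
[0123]
Moreover, in a secure system, the page/section table information must be kept within the boundaries of the private area so that it cannot move from memory to device pins, where it could be examined with a logic analyzer.
[0124]
In the illustrated embodiment, in which memory is divided into sixteen 256-megabyte blocks, a 32-bit register is created to store the level-1 AP bits, with each pair of bits corresponding to a 256-megabyte memory region. For example, bits [1:0] correspond to region 1, bits [3:2] to region 2, and so on. A 16-bit register is set up to hold the set of bits indicating the cacheability of each level-1 region, and another 16-bit register is set up to hold the set of bits indicating the bufferability of each region. These registers are pointed to by the contents of the translation base register in the MMU.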
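The three global level-1 registers can be sketched as bit-packed integers. This Python model is illustrative only: the class and method names are hypothetical, regions are numbered from 0 here (the text counts from 1), and selecting the region from address bits [31:28] is an assumption that follows from sixteen 256-megabyte regions in a 4-gigabyte space.

```python
class GlobalLevel1Registers:
    """32-bit AP register plus 16-bit cacheability and bufferability registers."""

    def __init__(self):
        self.ap = 0          # two AP bits per region
        self.cacheable = 0   # one bit per region
        self.bufferable = 0  # one bit per region

    def set_region(self, region, ap, cacheable, bufferable):
        if not 0 <= region < 16 or not 0 <= ap < 4:
            raise ValueError("region must be 0..15 and ap must be 0..3")
        shift = 2 * region
        self.ap = ((self.ap & ~(0b11 << shift)) | (ap << shift)) & 0xFFFFFFFF
        bit = 1 << region
        self.cacheable = (self.cacheable | bit) if cacheable else (self.cacheable & ~bit)
        self.bufferable = (self.bufferable | bit) if bufferable else (self.bufferable & ~bit)

    def lookup(self, address):
        """Return (ap, cacheable, bufferable) for the region containing address."""
        region = (address >> 28) & 0xF
        return ((self.ap >> (2 * region)) & 0b11,
                bool((self.cacheable >> region) & 1),
                bool((self.bufferable >> region) & 1))
```

Sixty-four bits of register state thus describe the access characteristics of the whole address space, compared with 4096 32-bit entries for a conventional first-level table.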
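[0125]
FIG. 16A shows a procedure 1600 for updating these registers while also handling memory regions having unique characteristics and constants.
[0126]
For a given 256-megabyte region, a determination is made at Step 1601 as to whether it has a set of common access characteristics. If so, the appropriate AP bits are loaded at Step 1602 into the corresponding entry in the global level-1 AP register. At Steps 1603 and 1604, the corresponding entries in the global level-1 bufferability and cacheability registers are similarly updated.
[0127]
At Step 1605, the procedure returns to update the register entries for the next memory region (block) requiring update. The initialization/update of the global access control registers is preferably performed in a loop. The values generally do not change, but they can be updated during system operation if necessary. The full register values of non-synthesized entries are updated as appropriate during system operation. An update is required, for example, when a memory page is "swapped" out to disk or similar mass storage and replaced by another page.
[0128]
For a memory block or register found at Step 1601 to have a unique set of access characteristics, including access permissions, bufferability and cacheability bits, and physical address bits, a complete level-1 descriptor is loaded into a full 32-bit register at Steps 1606 and 1607. The procedure again loops back to Step 1608. This descriptor can contain the address of a coarse or fine page (level-2) table.
[0129]
Otherwise, at Step 1608, a constant is pointed to with hardwired gates. The stored constant may be a fixed value or the base address of a level-2 table. If no walk to level 2 is required at Step 1609, the procedure loops back to Step 1610. Otherwise, the corresponding registers in the synthesized level-2 table are set up at Step 1611.
[0130]
A similar process is used for level-2 synthesis. Specifically, for each level-2 page, the registers pointed to by the level-2 base address bits in the level-1 register and the global level-2 AP register are set up as described above, together with the level-2 bufferability and cacheability registers, for pages and sub-blocks having common characteristics.
[0131]
At Step 1612, a determination is made as to whether a given page or set of pages has a set of common access characteristics. If so, the appropriate AP bits are loaded at Step 1613 into the corresponding entry in the global level-2 AP register. At Steps 1614 and 1615, the corresponding entries in the global level-2 bufferability and cacheability registers are similarly updated.
[0132]
At Step 1616, the procedure returns to Step 1601 to update the register entries for the next memory region (block) requiring update.
[0133]
For those level-2 pages, blocks, or sets of registers found at Step 1612 to have a unique set of level-2 access characteristics, including access permissions, bufferability and cacheability bits, and physical address bits, a complete level-2 descriptor is loaded into a full 32-bit register at Step 1617. Otherwise, at Step 1618, a constant is pointed to within hardwired gates. The stored constant may be a fixed value, a base address, or the like. At Step 1619 the procedure again loops back to Step 1601.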
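[0134]
An exemplary walk of the synthesized page table is shown in FIG. 16B. At Step 1620, a table walk is requested. This request may be in response to a TLB and/or cache miss. Consider first the case in which, at Step 1621, a second level of the table walk is not required. At Step 1622, the level-1 registers described above are pointed to by the translation base register in the MMU. The table index bits of the virtual address are used to index the level-1 register entries (Step 1623).
[0135]
At Steps 1624 and 1625, a determination is made as to whether the return from the indexed entry of the level-1 registers is a complete descriptor or a constant. Consider first the case in which the return is neither a constant nor a complete descriptor.
[0136]
At Step 1626, the access control bits (i.e., the AP, cacheability, and bufferability bits) are retrieved from the first-level global access registers. Then, at Step 1627, the table index of the virtual address is translated into a physical address by shifting its bit positions relative to the virtual address.
[0137]
In the preferred embodiment, the translated virtual address for a section entry is derived from the table index bits: bits [13:2] of the lookup word index into the 4096-entry level-1 page table become bits [31:20] of the result for that entry (a one-megabyte memory region). The region of the section is defined by bits [13:10] of the memory location. In embodiments using the ARM920 or ARM720 MMU, several bits in a page table entry are always a constant 0 or 1.
[0138]
At Step 1628, a level-1 descriptor is formed by combining the translated address bits with the retrieved access control bits. At Step 1629, the synthesized descriptor is returned for updating the TLB and/or cache.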
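Steps 1626 to 1628 can be sketched as a single Python function that builds a section descriptor on demand instead of reading it from a table. The field positions used (AP in bits [11:10], domain in [8:5], a constant 1 in bit 4, C in bit 3, B in bit 2, and type `0b10` in bits [1:0]) follow the usual ARM section descriptor layout and are an assumption here, as are the function name and the `domain` parameter.

```python
def synthesize_section_descriptor(mva, ap_reg, cacheable_reg, bufferable_reg,
                                  domain=0):
    """Form a level-1 section descriptor from the global access registers."""
    table_index = (mva >> 20) & 0xFFF   # index into the 4096-entry table
    region = table_index >> 8           # top four index bits: 256 MB region
    # Step 1626: fetch the access control bits for this region.
    ap = (ap_reg >> (2 * region)) & 0b11
    c = (cacheable_reg >> region) & 1
    b = (bufferable_reg >> region) & 1
    # Step 1627: the index bits are moved up to become base bits [31:20].
    base = table_index << 20
    # Step 1628: combine address bits, access bits, and the constant bits.
    return (base | (ap << 10) | ((domain & 0xF) << 5) | (1 << 4)
            | (c << 3) | (b << 2) | 0b10)
```

Because the base address is derived from the index itself, the sketch describes a flat mapping in which each one-megabyte section translates to the same physical address.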
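[0139]
Returning to Steps 1624 and 1625, the level-1 entry may instead be a complete descriptor (Step 1630) or a constant (Step 1631). At Step 1632, that descriptor or constant can be used immediately.
[0140]
Assume next that a level-2 table lookup is required at Step 1621.
[0141]
The level-2 translation is similar to the translation performed when only a level-1 lookup is required. At Step 1633, the level-2 registers, set up as described above, are pointed to by the base address in the MMU. At Step 1634, the table index bits of the virtual address are used to index a particular register or entry. At Steps 1635 and 1636, a determination is made as to whether the indexed register (entry) contains a complete descriptor or a constant. If a descriptor is found, it is retrieved at Step 1637; if a constant is found, it is retrieved at Step 1638. The descriptor or constant can be used immediately at Step 1639.
[0142]
If neither a constant nor a descriptor is found at Steps 1635 and 1636, the second-level access control registers are accessed at Step 1640, and the corresponding access control bits are retrieved at Step 1641 using the page index bits of the virtual address. At Step 1642, the page index bits of the virtual address are translated into a physical address by shifting their bit positions. At Step 1643, these physical address bits are combined with the retrieved access control bits to form a synthesized descriptor. At Step 1644, the synthesized descriptor is returned for a TLB update, a memory access on a cache miss, or a similar operation.
[0143]
Note that, for brevity, the walk of the synthesized tables has been described only in terms of level-1 and level-2 descriptor generation. The principles of the present invention can, however, be applied iteratively to implement walks through further levels beyond the second.
[0144]
In summary, according to the inventive concepts, all that is required for the first-level table is a 32-bit AP register and a pair of 16-bit registers for bufferability and cacheability. All that is required for each second-level page that must be addressed is a second-level table consisting of a small-page AP register and one-bit cacheability and bufferability registers.
[0145]
The inventive concepts also advantageously allow address translation and TLB updates to be performed on a cache miss by register emulation of memory, similar to the emulation of a cache miss. The cache and/or TLB entries can then be locked for security as described above. The preferred emulation process uses alternate emulated memory, freeing the internal memory of integrated circuit 100 for other tasks. The page table memory addresses are preferably mapped internal to the integrated circuit. A preferred procedure embodying these concepts is the emulated table walk/TLB update procedure 1700 shown in FIG. 17.
[0146]
First, an emulated level-1 translation register (table) (EL1TR), containing either a level-1 descriptor or a level-2 base address, is created (Step 1701). In addition, an emulated level-1 index register (EL1IR), holding the index to the entry in EL1TR, is set up in the alternate memory space (Step 1702). The translation table base (TTB) in the MMU is programmed to point to the emulated level-1 table. A request to this region receives the contents of EL1TR when its table index matches EL1IR. If the index does not match, the returned value is an entry that generates an exception.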
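The EL1TR/EL1IR pair behaves like a one-entry table with an index comparator, which can be sketched as follows. The class name, the 12-bit index width, and the use of an all-zero word as the exception-generating entry (type bits `0b00` denote a fault in the usual ARM descriptor format) are assumptions made for illustration.

```python
FAULT_ENTRY = 0x00000000  # assumed fault encoding: descriptor type bits 0b00


class EmulatedLevel1Table:
    """One translation register (EL1TR) guarded by one index register (EL1IR)."""

    def __init__(self, el1tr, el1ir):
        self.el1tr = el1tr & 0xFFFFFFFF  # level-1 descriptor or level-2 base
        self.el1ir = el1ir & 0xFFF       # table index this entry stands for

    def fetch(self, mva):
        """Return EL1TR when MVA[31:20] matches EL1IR, else a faulting entry."""
        index = (mva >> 20) & 0xFFF
        return self.el1tr if index == self.el1ir else FAULT_ENTRY
```

A walk for any address outside the one emulated section therefore returns a faulting entry rather than a translation.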
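[0147]
For address translations continuing through the second level, an emulated level-2 translation register (EL2TR) containing a level-2 descriptor is created in the alternate memory (Step 1704), and an emulated level-2 index register (EL2IR) holds the corresponding index (Step 1705).
[0148]
At Step 1706, a virtual address is generated, either by instructing CPU 101 or by using an external address generator. If the cache and TLB have been flushed or cleaned, a cache/TLB miss occurs, and a table walk procedure is therefore initiated at Step 1707 using the emulated level-1 table pointed to by the MMU. The level-1 table index bits in the virtual address are compared with the table index bits in EL1IR, and the corresponding level-1 information is returned from EL1TR (Step 1708).
[0149]
If at Step 1708 the information is a descriptor (i.e., no level-2 translation is required), a level-1 access is performed (Step 1709), at which point the permissions in the descriptor are checked (Step 1710). If permission is not granted, the operation aborts at Step 1711. Otherwise, at Step 1712, a physical address is generated from the section address bits in the level-1 descriptor and the section index in the virtual address. At Step 1713, this physical address is loaded into the TLB to await the lock operation and the loading of the corresponding data or instructions into the appropriate cache. If it is determined at Step 1714 that the current entry in the TLB is not the last entry to be loaded, the procedure loops back at Step 1715 to Step 1706 to begin the next table walk. Otherwise, the TLB lock procedure is performed at Step 1716.
[0150]
If at Step 1708 the information in EL1TR is found to be a base address to level 2, a level-2 page walk is initiated at Step 1717. The EL2TR registers are accessed using the base address from EL1TR (Step 1718). A particular register is indexed using the contents of the corresponding EL2IR register, by comparison against the index bits of the virtual address (Step 1719). At Step 1720, the permissions in the returned level-2 descriptor are checked. If access is not permitted, the access is aborted at Step 1721; otherwise, at Step 1722, a physical address is generated using the physical address bits in the level-2 descriptor and the index bits of the virtual address. At Step 1723, the physical address is loaded into the TLB to await the lock operation.
[0151]
If at Step 1724 the current TLB entry is the last entry to be loaded, the TLB lock procedure can be invoked at Step 1725; otherwise, at Step 1726, the procedure jumps back to Step 1706 to begin the table walk for the next TLB entry.
[0152]
In some embodiments of integrated circuit 100, a bare CPU that includes neither a memory management unit (MMU) nor a hardware cache can be used. For example, CPU core 101 can be based solely on an ARM7tdmi processor 102, without cache 103 or MMU 104. If this option is selected, all software must be stored in a flat memory space. This, however, may require the use of external memory (e.g., NOR flash, SRAM, or DRAM). As noted above, data in external memory has the significant drawback that it may be accessed or analyzed by an unauthorized end user.
[0153]
In embodiments of integrated circuit 100 using neither a hardware cache nor an MMU, the security code is executed in supervisor mode. In supervisor mode, accesses to particular regions of memory and particular registers are checked against supervisor privileges. The security firmware is preferably executed from internal memory such as SRAM. In supervisor mode, all other software/firmware is treated as running in user mode and is therefore subject to supervisor-privilege checks by the secured software.
[0154]
Although the invention has been described with reference to specific embodiments, these descriptions are not meant to be construed in a limiting sense. Various modifications of the disclosed embodiments, as well as alternative embodiments of the invention, will become apparent to persons skilled in the art upon reference to the description of the invention. Those skilled in the art will appreciate that the conception and the specific embodiments disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. Those skilled in the art will also realize that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.
[0155]
It is therefore contemplated that the claims will cover any such modifications or embodiments that fall within the true scope of the invention.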
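[Brief Description of the Drawings]
[FIG. 1A] A high-level functional block diagram of an integrated circuit embodying the principles of the present invention.
[FIG. 1B] A high-level diagram of a second system embodying the concepts of the present invention.
[FIG. 1C] A diagram of a third exemplary system to which the principles of the present invention can advantageously be applied.
[FIG. 1D] A diagram showing two further forms.
[FIG. 2] A diagram of integrated circuit 100 in a fully utilized configuration.
[FIG. 3] A high-level functional block diagram of the processor of FIG. 1B.
[FIG. 4A] A diagram depicting the external clock driving pin EXPCLK when the system enters the "standby state" and the clock enable signal at pin CLKEN is asserted.
[FIG. 4B] A diagram depicting the external clock driving pin EXPCLK when the clock enable signal at pin CLKEN is asserted and the system exits the "standby state".
[FIG. 5] A state diagram depicting the operation of the state control circuit of FIG. 1A.
[FIG. 6] A block diagram of the three serial interfaces comprising the serial interface block of FIG. 1A.
[FIGS. 7A and 7B] Timing diagrams depicting the operation of the SSI (ADC) in conjunction with selected external devices.
[FIG. 8] A timing diagram depicting the operation of the codec interface of FIG. 6.
[FIG. 9] A functional block diagram showing the interface between the I²S ports of the serial interface block of FIG. 6.
[FIG. 10] A timing diagram depicting the operation of the I²S interface of FIG. 9.
[FIG. 11] A functional block diagram illustrating the use of the SSI2 port of FIG. 6 in a master/slave configuration.
[FIG. 12] A flow chart illustrating system initialization at power-on reset.
[FIG. 13] A flow chart illustrating a procedure for locking private data in the TLB.
[FIG. 14] A diagram illustrating a cache lockdown procedure for locking secure code in the cache.
[FIG. 15] A flow chart showing the emulated cache miss procedure.
[FIG. 16A] A diagram of a preferred method of setting up a synthesized translation table.
[FIG. 16B] A flow chart illustrating a table walk through the synthesized table of FIG. 16A.
[FIGS. 17A-17E] Diagrams illustrating a preferred procedure for performing an emulated table walk.
[0001]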
(Field of Invention)
The present invention relates generally to electronic devices, and more particularly to circuits, systems, and methods for the privatization of information in personal electronic devices.
[0002]
(Description of related technology)
Handheld personal electronic devices are becoming more popular as new technologies enable the manufacture of affordable devices with advanced features. One such device is the portable digital audio player, which downloads digital audio data, stores the data in a readable/writable memory, and converts the data to audio in response to user requests. Digital data is downloaded from a network in any of several forms, including MPEG Layer 3, ACC, and MS Audio protocols, or retrieved from fixed media such as compact discs. An audio decoder supported by the appropriate firmware retrieves the encoded data from the memory, applies the corresponding decoding algorithm, and converts the encoded data to analog form to drive a headset or other portable speaker system.
[0003]
In order to prevent unauthorized downloading of copyrighted material such as music, some means of controlling the operation of the personal device is desired. This can be implemented, for example, through the issuance of a password or software kernel that permits downloading of related information. Passwords and software must be protected to prevent copying, distribution, and tampering by end users. In addition, since the audio decoder can be operated from its own firmware, such firmware must also be protected from copying and tampering.
[0004]
In short, what is needed is a method, circuit, and system for protecting information in a personal digital device. For this purpose, the ability to protect such information should not depend on where the information is stored in the device, whether internal or external to the main processing chip. Furthermore, by implementing security, it is preferable that resources that can be used for processing operations more directly, such as available memory space, are not wasted. The security method and hardware are preferably applicable to a wide variety of system configurations.
[0005]
(Summary of Invention)
In accordance with the principles of the present invention, a system is disclosed that includes a central processing unit that operates in response to a set of instructions for processing information. An interface is included that provides access from external devices to selected circuits that form part of the system. A set of non-volatile programmable security elements provides a private environment for processing information by selectively enabling and disabling interface operation. The principles of the present invention provide, among other things, the ability to privatize information in personal digital devices. These principles can be implemented in a manner that does not waste processing resources that can be used more directly for processing operations, such as available memory space. Furthermore, these principles can be applied to a wide variety of system configurations and do not depend on where in the device the private information is stored, whether in memory inside the main processing chip or in external memory.
[0006]
For a more complete understanding of the present invention and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which:
[0007]
(Description of Preferred Embodiment)
The principles of the invention and its advantages can best be understood by referring to the embodiments shown in FIGS. 1-17, where like numerals refer to like parts throughout the drawings.
[0008]
FIG. 1A is a high-level functional block diagram of an integrated circuit 100 that implements the principles of the present invention. The integrated circuit 100 is, for example, a Cirrus Logic EPxx integrated circuit. The integrated circuit 100 can be advantageously used in a number of consumer and industrial handheld information devices including, among others, portable information terminals, electronic notebooks, and interactive pagers. Specifically, the integrated circuit 100 can be configured to perform audio processing in a battery-powered Internet audio decoder.
[0009]
Two other exemplary systems in which the principles of the present invention can be advantageously applied are shown in FIGS. 1B and 1C and described further below.
[0010]
FIG. 2 shows an integrated circuit having a certain system configuration. This figure is referred to when describing input / output signals (ports) of various functional blocks of the integrated circuit 100.
[0011]
The integrated circuit 100 is built around an ARM720T processor 101 as described in the ARM720T data sheet available from ARM, Cambridge, UK. Generally, the processor 101 includes a central processing unit (CPU) core 102, an 8 kilobyte cache 103, a memory management unit (MMU) 104, and a write buffer 105, each of which will be described in more detail below. Note that in alternative embodiments, an ARM920 processor may be used.
[0012]
CPU 102 is a 32-bit microprocessor based on a reduced instruction set computer (RISC) architecture. The associated 8 kilobyte cache 103 is a mixed instruction / data cache (IDC), organized as a 4-way set associative cache consisting of 512 lines of 16 bytes (4 words).
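The cache geometry stated above can be checked with a short calculation. The sketch below assumes conventional set-associative indexing (low address bits select the byte within a line, the next bits select the set); that address split is an illustration and is not stated in the text, and both function names are hypothetical.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (lines, sets, offset_bits, index_bits) for a set-associative cache."""
    lines = size_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1  # log2 for a power of two
    index_bits = sets.bit_length() - 1
    return lines, sets, offset_bits, index_bits


def split_address(address, offset_bits, index_bits):
    """Split an address into (tag, set index, byte offset)."""
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset
```

With 8 kilobytes, 4 ways, and 16-byte lines this gives 512 lines in 128 sets, so an address would carry 4 offset bits and 7 index bits under the assumed split.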
[0013]
The MMU 104 includes a translation lookaside buffer (TLB), access control logic, and translation table walking logic. The main functions of the MMU 104 are translation of virtual addresses to physical addresses and control of memory access. The MMU 104 also supports a conventional two-level page table structure. In general, the TLB caches 64 translated entries and provides translations to the associated access control logic. If a virtual address hits a translated entry in the TLB, the access control logic determines whether to allow the access. If access is permitted, the MMU 104 outputs the corresponding physical address from the TLB. If access is not permitted, the MMU 104 signals the CPU 102 to abort. When a virtual address misses in the TLB, the translation table walking logic retrieves the necessary translation information from the translation table in physical memory. This translation information is written to the replacement entry in the TLB, which allows the access control logic to determine whether to permit the access.
[0014]
Write buffer 105 is used to buffer up to 8 words of data and 4 independent addresses. When this is enabled, the CPU 102 writes data or instructions to the write buffer 105 using an external clock and returns to instruction execution. In parallel with this, the write buffer 105 can write data to the internal data bus 106 and write an address to the internal address bus 107.
[0015]
An on-chip phase-locked loop (PLL) 108 driven by a 3.6864 MHz crystal 109 is used in certain modes to provide a clock to the processor 101. In embodiments using the ARM720T, the main (CPU) clock can be programmed to 18.432 MHz, 36.864 MHz, 49.152 MHz, or 73.728 MHz. (PLL 108 preferably operates at twice the highest possible CPU clock frequency, i.e., 147.456 MHz.) When the CPU clock frequency is selected to be 36.864 MHz, internal data bus 106 and internal address bus 107 also operate with a clock of approximately 36 MHz. For CPU clock frequencies above 36 MHz, only processor 101 operates at the higher clock speed, and internal data bus 106 and internal address bus 107 are clocked at 36 MHz. The CPU clock frequency is selected by programming a 2-bit register field in the system control register SYSCON3. (A list of internal registers of integrated circuit 100 is provided as Table 1. A complete description of these registers can be found in the “Cirrus Logic EP7211 Preliminary Data Sheet,” which is incorporated herein by reference.)
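The clock figures above are mutually consistent, as the following arithmetic sketch shows. The multiplier of 40 and the dividers 8, 4, 3, and 2 are inferred from the stated frequencies and are assumptions; the text does not say how the PLL output is actually divided.

```python
from fractions import Fraction

CRYSTAL_MHZ = Fraction(36864, 10000)  # 3.6864 MHz crystal
PLL_MULTIPLIER = 40                   # assumed: 3.6864 MHz x 40 = 147.456 MHz
PLL_MHZ = CRYSTAL_MHZ * PLL_MULTIPLIER


def cpu_clock_mhz(divider):
    """CPU clock obtained by dividing the PLL output by an integer divider."""
    return PLL_MHZ / divider


# Assumed dividers reproducing the four selectable CPU clock frequencies.
CPU_CLOCKS_MHZ = [float(cpu_clock_mhz(d)) for d in (8, 4, 3, 2)]
```

Exact rational arithmetic is used so that the results compare equal to the stated decimal values without rounding error.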
[0016]
Note that integrated circuit 100 also includes an external clock input, which allows a 13 MHz external clock to be input to drive substantially all on-chip circuits in the second clock mode. As shown in FIGS. 4A and 4B, when the clock enable signal is asserted at pin CLKEN, the external clock drives pin EXPCLK. FIG. 4A shows a case where the integrated circuit 100 enters the standby state, and FIG. 4B shows a case where the standby state ends. (The standby state will be further described below.)
[0017]
The oscillator 110 is used to generate a 1 Hertz clock that drives a 32-bit real-time clock generator (RTC) 112. The RTC 112 can be written to and read from, and includes a 32-bit output match register. The output match register allows an interrupt to be issued when the RTC time matches a predetermined time. The RTC 112 is also used to drive a programmable LED flasher (not shown).
[0018]
The integrated circuit 100 further includes a pair of on-chip timer counters 113. Each timer counter is independent and includes a 16-bit read/write data register. A given counter is loaded with a desired value and decremented in response to a preselected clock. When the timer counter underflows (i.e., reaches zero), an appropriate interrupt is generated. The timer counter register can be read at any time. The clock frequency of these timers can be selected by writing to the corresponding bit in the system control register SYSCON. For example, when the clock is derived from the internal PLL 108, the timer counters 113 can use rates of 512 kHz and 2 kHz. When a 13 MHz clock from an external source is used, 541 kHz and 2.115 kHz clocks are available. It is also possible to generate a 500 kHz clock from the 13 MHz source by using a divide-by-26 circuit, which is enabled by setting a bit in the system control register SYSCON2.
[0019]
Each timer counter 113 can operate in a free running mode or a prescale mode, selected by setting or clearing a bit in the system control register SYSCON1. In free running mode, when a given counter underflows (i.e., reaches zero), it wraps to 0xFFFF and continues to count down. In prescale mode, the value written to a given timer counter is automatically reloaded when the counter underflows. The prescale mode can be used to create a programmable frequency, drive a buzzer, or generate a periodic interrupt.
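The two timer modes can be compared with a small behavioral model. This Python sketch is an illustration under assumptions: the class name is hypothetical, and the exact clock edge on which the wrap or reload takes place (here, the tick after zero is reached) is not specified in the text.

```python
class TimerCounter:
    """16-bit down-counter with free running and prescale (auto-reload) modes."""

    def __init__(self, prescale=False):
        self.prescale = prescale
        self.reload = 0xFFFF
        self.value = 0xFFFF
        self.interrupts = 0

    def write(self, value):
        """Load the data register; in prescale mode this is also the reload value."""
        self.reload = value & 0xFFFF
        self.value = self.reload

    def tick(self):
        """Advance by one period of the selected timer clock."""
        if self.value == 0:
            # After reaching zero: reload the written value or wrap to 0xFFFF.
            self.value = self.reload if self.prescale else 0xFFFF
        else:
            self.value -= 1
            if self.value == 0:
                self.interrupts += 1  # interrupt raised on reaching zero
```

Under this model a prescale-mode counter loaded with N interrupts every N + 1 ticks, which is how a programmable frequency or periodic interrupt would be obtained.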
[0020]
The state control circuit 114 allows the integrated circuit 100 to be set to an active state, an idle state, or a standby state. A state diagram representing the operation of the state control circuit 114 is shown in FIG. 5. The active state is the normal program execution state, in which all clocks and peripheral logic are available. The idle state is similar to the active state, except that the CPU clock is stopped until an interrupt or wake-up returns the circuit to the active state. In the standby state, the PLL 108 is shut down, but the crystal 111, the oscillator 110, and the RTC circuit 112 are kept in operation. During the standby state, operation of the external address bus and data bus is also reduced to prevent powered-down peripherals from consuming current. Note that the integrated circuit 100 is forced into the standby state when it is first powered on or during a cold reset, and this state can only be exited by an external wake-up.
[0021]
The power management is performed through the power management control block 115 in addition to the state control circuit 114. Table 12 shows states of various functional blocks of the integrated circuit 100 in each state. When the power management circuit 115 receives the active-low power abnormality signal PWRFL from the external power supply unit 201, the power management circuit 115 places the integrated circuit 100 in the standby mode. When the integrated circuit 100 is driven by the external DC power supply 202, the external power supply sensing input signal EXTPWR is set to active low. When using battery 203, the BATOK pin is active high to indicate that the main battery is normal. The falling edge of this signal generates a FIQ (Fast Interrupt Request). When the signal on this pin goes low in the standby state, system startup is prohibited. A new battery sensing signal BATCHG indicates that a new battery is required. This input is active low when the battery voltage falls below the “no battery” threshold. The battery that provides power to the integrated circuit 100 may be, for example, one or more standard AA batteries that are widely distributed to retail consumers.
[0022]
An exception is typically generated when an unexpected event occurs during program execution (i.e., an interrupt or memory fault). When multiple exceptions occur, the interrupt controller 116, operating on a fixed priority system, determines the order in which the exceptions are serviced.
[0023]
Integrated circuit 100 supports two interrupt types: interrupt request (IRQ) and fast interrupt request (FIQ). FIQ has a higher priority than IRQ. When two or more interrupts of the same type occur simultaneously, the conflict is resolved in software. Tables 2A-2C show preferred interrupt assignments. INTMR1 and INTSR1 in the tables are the high-speed interrupt mask register and high-speed interrupt status register, INTMR2 and INTSR2 are the second interrupt mask register and second interrupt status register, and INTMR3 and INTSR3 are the third interrupt mask register and third interrupt status register, respectively. Note that if two interrupts are received from the same group (IRQ or FIQ), the order in which they are serviced is preferably resolved in software.
[0024]
In general, the interrupt controller 116 operates as follows. An external or internal interrupting device asserts the appropriate interrupt. FIQ or IRQ is asserted by interrupt controller 116 when the appropriate bit is set in the corresponding interrupt mask register. If the interrupt is permitted, the processor 101 jumps to the corresponding address. The interrupt dispatch software then reads the corresponding interrupt status register to determine the interrupt cause and calls the appropriate interrupt service routine, which clears the interrupt cause by some action specific to the interrupting device. The interrupt service routine can then re-enable interrupts and service any other pending interrupts as well. All other external interrupt sources remain active until the corresponding service routine begins execution.
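The mask-and-dispatch sequence described above can be sketched in software terms. The function names and the lowest-bit-first service order are hypothetical choices for illustration; the text leaves the ordering within a group to software, and real register layouts are given in the referenced tables rather than here.

```python
def pending_sources(status, mask):
    """Bit positions asserted in the status register and enabled in the mask."""
    active = status & mask
    return [bit for bit in range(active.bit_length()) if (active >> bit) & 1]


def dispatch(status, mask, handlers):
    """Call the service routine for each pending source, lowest bit first.

    `handlers` maps a bit position to a callable that clears the interrupt
    cause at its device. Returns the list of bit positions serviced.
    """
    serviced = []
    for bit in pending_sources(status, mask):
        handler = handlers.get(bit)
        if handler is not None:
            handler()
            serviced.append(bit)
    return serviced
```

A source whose mask bit is clear is never dispatched, even while its status bit remains set.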
[0025]
Table 3 shows the latency for external interrupts. In the active state, the processor 101 checks whether its FIQ and IRQ inputs are low after executing each instruction. There is therefore an interrupt latency directly related to the time taken to first detect the interrupt condition and to complete the current instruction. In the standby state, the latency depends on whether the system clock is shut down and whether the control bit FASTWAKE in the system control register is set. As pointed out above, the PLL 108 is always shut down in the standby state. When the FASTWAKE bit is cleared, the latency is between 0.125 seconds and 0.25 seconds. When this bit is set, however, the latency is between 250 microseconds and 500 microseconds. When an external clock is used and is disabled during standby, the latency is likewise between 0.125 seconds and 0.25 seconds, which allows the oscillator to stabilize. If the external clock is not disabled, the latency can be reduced to a few microseconds. The integrated circuit 100 can also exit the idle state on an interrupt. In this case the CPU clock must be restarted, and servicing of the interrupt can additionally be delayed by instruction execution as described above.
[0026]
In the illustrated embodiment, an on-chip boot ROM 117 is provided that holds an instruction set for initializing the integrated circuit 100. The on-chip boot ROM also configures UART1, described further below, to receive 2048 bytes of serial data, which are downloaded to the on-chip SRAM 118. Once the data has been downloaded to the SRAM 118, the processor can continue executing instructions by jumping to the start of the SRAM. Conveniently, this configuration allows code to be downloaded and the system flash memory to be programmed during the manufacture of devices using integrated circuit 100. Note that the user can choose to boot from on-chip ROM 117 or from external memory connected to port CS[0]. Specifically, when the signal at pin MEDCHG is low, the system boots from on-chip ROM 117, and when a high signal is applied to this pin, it boots from external memory. Note that booting from the on-chip boot ROM has the effect of reversing the internal decoding of all chip select signals. This decoding is shown in Table 5A, and normal non-inverted chip select decoding is shown in Table 5B. In addition, when booting from external memory, the width of the boot device can be selected according to Table 4.
[0027]
The ARM720T processor has a 4 gigabyte address space. In the illustrated embodiment, the integrated circuit 100 uses the lower 2 gigabytes of this address space for ROM/RAM/flash and expansion space. A further 0.5 gigabytes is used for DRAM, and the remaining 1.5 gigabytes, less 8 kilobytes for internal registers, is unused.
[0028]
The memory and I/O expansion interface supports six separate linear memory or expansion segments for external expansion memory 204. Two further segments are dedicated to the on-chip SRAM and ROM. Each segment is 256 megabytes in size. Any of the six segments can be used to support a conventional SRAM interface. In addition, each segment can be individually programmed to be 8, 16, or 32 bits wide, to support page mode access, and to execute from 1 to 8 wait states for non-sequential accesses and from 0 to 3 wait states for burst mode accesses. The zero-wait-state sequential access capability allows the integrated circuit 100 to interface with burst mode ROMs. Note that the on-chip ROM space is fully decoded, and the SRAM address space is fully decoded up to the maximum size of the video frame buffer used to drive the external LCD (up to 128 kilobytes).
[0029]
Two of the expansion segments can be reserved to interface with two PC cards 205 using the chip select signals NCS4 and NCS5. The interface with an external PC card is preferably made through a Cirrus Logic CL-PS6700 PC card slot driver 206. Segmenting the memory allows different types of access (i.e., attribute, input/output, and common memory space).
[0030]
The EXPCLK port of expansion control block 119 outputs an expansion clock that equals the CPU clock in the 13 MHz and 18.432 MHz modes and runs at 36.864 MHz when integrated circuit 100 operates in the 36.864 MHz, 49.152 MHz, or 73.728 MHz mode. (The EXPCLK port is used as the clock input in the 13 MHz mode described above.) The EXPRDY pin (expansion port ready) is driven low by an external expansion device to extend the bus cycle by inserting wait states. The chip select signals CS[0:3] are used for SRAM expansion, while the chip select signals CS[4:5] can be used either for memory expansion or for PC card selection. The write strobe WRITE is low while reading from the expansion device and high while writing to it. The word/halfword bit (2) indicates to the external device whether the access size is word, halfword, or byte during a write from the integrated circuit 100. The DRAM controller 120 provides a programmable 16-bit or 32-bit wide interface to up to two banks 207 of DRAM, each bank having a storage capacity of up to 256 megabytes. The DRAM banks may be any of several commercially available types of DRAM, including conventional DRAM, synchronous DRAM (SDRAM), EDO DRAM (Extended Data Out DRAM), fast page mode DRAM, and DDR DRAM (Double Data Rate DRAM). These DRAMs may also be of a self-refresh type that is placed in a low power state when the integrated circuit 100 enters the standby state described above. To support two banks, two row address strobes RAS[0:1] can be generated along with four column address strobes CAS[0:3]. The output enable signal MOE is used to enable the outputs of DRAM, ROM/SRAM/flash, or expansion devices. The write enable signal NWE is used for the same set of external devices. The DRAM controller also includes a programmable refresh counter whose refresh period is controlled using a refresh period register (DRFPR).
[0031]
Table 6 shows the preferred physical DRAM addressing. Tables 7 and 8 show the DRAM address mapping for 32-bit and 16-bit DRAM memory systems, respectively. The 32-bit mapping assumes two x16 devices connected to each RAS line, with 32-bit DRAM operation selected. The mapping repeats every 256 megabytes in each bank. The placeholder “n” in these tables is equal to 0xC + bank number. The 16-bit or 32-bit DRAM width is selected by programming a bit in the system control register SYSCON2.
[0032]
The flash interface 121 enables the integrated circuit 100 to interface with the flash memory using the chip select signal CS [0: 1].
[0033]
The LCD controller 122 provides all the control signals needed to allow the integrated circuit 100 to interface directly with the single panel multiple LCD module 209. The total frame buffer size can be programmed up to 128 kilobytes using both on-chip and off-chip memory. As described above, by using the on-chip SRAM 118 as an LCD video frame buffer, a system can be constructed without using an external DRAM. The screen is preferably mapped to a video frame buffer.
[0034]
An LCD direct memory access (DMA) engine 123 is provided to fetch display data for the LCD controller 122 from the frame buffer memory. The pixel bit rate, and hence the LCD refresh rate, can be programmed from 18.432 MHz down to 576 kHz when operating in the 18.432-73.728 MHz mode, and from 13 MHz down to 203 kHz when operating with a 13 MHz clock.
[0035]
Integrated circuit 100 includes a pair of universal asynchronous transmit / receive (UART) interfaces 124 and 125. These asynchronous ports can be used for communication with a pair of RS-232 transceivers 210, for example. Each UART 124/125 can support a data rate of up to 115.2 kilobits per second when the integrated circuit 100 operates with a clock generated by the PLL 108. If the integrated circuit 100 is driven by a 13 MHz external clock source, the UART bit rates that can be generated include 9.6 Kbps, 19.2 Kbps, 38 Kbps, 58 Kbps, and 115.2 Kbps.
[0036]
Both UARTs 124/125 include a 16-byte transmit FIFO that drives the corresponding transmit (TX) pin and a 16-byte receive FIFO for receiving data from a dedicated receive (RX) pin. An RX interrupt is asserted when a given RX FIFO is half full, or when it is non-empty and has received no further characters for longer than three character times. The TX interrupt is asserted whenever a given TX FIFO reaches a half-empty state.
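By way of illustration only, the interrupt conditions described above can be modeled with a minimal Python sketch. The function names, the use of "at least half full" and "at most half full" as the thresholds, and the representation of idle time in character times are assumptions made for this example; only the 16-byte depth, the half-full and half-empty conditions, and the three-character timeout come from the description.

```python
FIFO_DEPTH = 16           # bytes, per the description
HALF = FIFO_DEPTH // 2    # half-full / half-empty threshold
RX_TIMEOUT_CHARS = 3      # idle character times before a timeout interrupt


def rx_interrupt(rx_level: int, idle_char_times: float) -> bool:
    """RX interrupt: FIFO at least half full, or non-empty and idle too long.

    rx_level and idle_char_times are hypothetical inputs for this sketch.
    """
    if rx_level >= HALF:
        return True
    return rx_level > 0 and idle_char_times > RX_TIMEOUT_CHARS


def tx_interrupt(tx_level: int) -> bool:
    """TX interrupt: FIFO has drained to the half-empty point or below (assumed)."""
    return tx_level <= HALF
```

An empty RX FIFO never raises the timeout interrupt in this model, because the description requires the FIFO to be non-empty.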
[0037]
UART 124 (UART1) can also receive three modem control signals, CTS, DSR, and DCD, in addition to the RX and TX ports. The remaining modem control input RI and the output modem control signals RTS and DTR can be implemented using the GPIO port 129, described further below. A modem status interrupt for UART1 is generated when any of these modem control bits change.
[0038]
The UART operation and line speed can be programmed through the UART bit rate and line control registers (UBLC1 and UBLC2). The four FIFOs can also be programmed to be 1 byte deep. The framing error bit and parity error bit detected when receiving each byte can also be read from the 11-bit wide register.
[0039]
The integrated circuit 100 also includes, at the output of the UART 124, a post-processing stage 126 conforming to the IrDA (Infrared Data Association) SIR protocol. Integrated circuit 100 includes pins for direct connection to an infrared light emitting diode (LED) driver input and a photodiode (collectively shown as block 211 in FIG. 2). When the SIR encoder 126 is switched into the TX and RX ports of UART1, these signals can directly drive the infrared interface.
[0040]
The integrated circuit 100 further includes an SPI/Microwire master mode 128 Kbps ADC interface 127 and a serial interface 128, the latter being shown in more detail in FIG. A preferred serial pin assignment for the digital audio port is shown in Table 10. The SPI interface 1 (ADC interface) can be used for communication with an external analog/digital converter 212 and a digitizer 213. The serial interface block 128 includes a master/slave mode SPI/Microwire (SSI2) interface 603, a digital audio interface (DAI) 601, and a codec interface 604, all of which are multiplexed onto a set of external interface pins by a multiplexer 602. The selected interface drives the corresponding circuit in block 214 of FIG. Multiplexing is controlled by programming the corresponding field in the system control register. A list of available serial interface options is provided in Table 11.
[0041]
In its default mode, the ADC interface 127 is compatible with SPI or Microwire compatible devices such as the Maxim MAX148/9 peripherals. The ADC interface 127 can also interface with devices such as the Analog Devices AD7811/12 chips, using NADCCS as a common RFS/TFS line. Exemplary timing diagrams for the integrated circuit 100 driving the MAX148/9 and the AD7811/12 are provided as FIGS. 7A and 7B, respectively. An exemplary I²S interface is shown in FIG.
[0042]
The clock output frequency of the ADC interface 127 can also be set using the system control register. For the 18.432-73.728 MHz operating mode, the ADC clock (ADCCLK) can be set to 4 kHz, 16 kHz, 64 kHz, or 128 kHz. When the integrated circuit 100 operates in response to an externally generated 13.0 MHz clock, the ADC clock can be set to 4.2 kHz, 16.9 kHz, 67.7 kHz, or 135.4 kHz. The sample clock SMPCLK always operates at twice the shift clock (ADCCLK) frequency. Available ADC frequency options are shown in Table 12.
[0043]
The ADC serial output ADCOUT is provided by an 8-bit or 16-bit shift register, depending on a bit set in the SYNCIO register. The ADC serial input ADCIN is captured in a 16-bit shift register. The ADC clock sync pulse is activated by writing to the output shift register. During a transfer, the synchronous serial interface busy bit (SSIBUSY) in the system status flag register is set. When the transfer is complete and valid data is in the 16-bit read shift register, the SSEOTI interrupt is asserted and the SSIBUSY bit is cleared. The sample clock SMPCLK can be enabled independently.
[0044]
The digital audio interface 601 provides an interface to CD-quality A/D and D/A converters as shown in FIG. (The DAI is a subset of I²S.) It carries 16-bit stereo digital audio in 128-bit frames at the audio sampling frequency, with separate transmit and receive lines. Note that each frame contains only 16 bits of right-channel and 16 bits of left-channel audio data. The remaining bits are set to zero.
[0045]
FIG. 10 is an exemplary timing diagram illustrating the operation of DAI 601. The left and right clock (LRCK) provides a frame synchronization signal. The serial clock (SCLK) is a bit transfer clock and preferably has a speed fixed at 128 times the audio sample frequency. SDOUT (SDATAO) and SDIN (SDATAI) are used to transmit reproduction data to an external D / A converter and receive recording data from the external A / D converter, respectively. The timing between the integrated circuit 100, the external D / A converter and / or the external A / D converter is based on the oversampled clock MCLK. MCLK preferably has a speed fixed at 256 times the sampling frequency.
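The clock relationships stated above (LRCK at the sample frequency, SCLK at 128 times and MCLK at 256 times that frequency) can be checked with a short calculation. The following Python sketch is illustrative only; the function name and the dictionary layout are assumptions, and the 44.1 kHz figure in the usage note is a common CD sampling rate chosen for the example rather than a value stated for this interface.

```python
def dai_clocks(sample_rate_hz: int) -> dict:
    """Return the DAI clock frequencies in Hz for a given audio sample rate.

    LRCK frames one stereo sample pair, SCLK shifts 128 bits per frame,
    and MCLK is the 256x oversampled converter clock.
    """
    return {
        "LRCK": sample_rate_hz,
        "SCLK": 128 * sample_rate_hz,
        "MCLK": 256 * sample_rate_hz,
    }
```

For a 44.1 kHz sample rate this gives an SCLK of 5.6448 MHz and an MCLK of 11.2896 MHz, and the SCLK to LRCK ratio of 128 agrees with the 128-bit frame described earlier.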
[0046]
Synchronous serial interface 2 (SSI2) 603 is an SPI/Microwire interface that can operate in full master/slave mode. FIG. 10 represents a pair of integrated circuit 100 devices configured to operate in a master/slave manner. The preferred sustained data transfer rate is 85.3 Kbps, which ensures that the period between interrupts is sufficiently long. An interrupt is generated when the receive FIFO is half full and when the transmit FIFO is half empty. In slave mode, the serial clock pin (SSICLK), the serial receive data pin (SSIRXDA), the receive frame synchronization pin (SSIRXFR), and the transmit frame synchronization pin (SSITXFR) are inputs, and the transmit data pin SSITXDA is an output. In master mode, pins SSICLK, SSITXDA, SSITXFR, and SSIRXFR are outputs and pin SSIRXDA is an input. Mode selection is performed by programming bits in the system control register.
[0047]
Both asymmetric (unbalanced) traffic and continuous traffic are supported using the separate transmit and receive frame synchronization control lines SSITXFR and SSIRXFR. In this configuration, the receiving node receives 1 byte of data in the 8 clocks following assertion of the receive frame synchronization control signal, and the transmitting node sends 1 byte of data in the 8 clocks following assertion of an independent transmit frame synchronization control pulse. Exemplary timing diagrams representing the operation of these two interfaces are provided in FIGS. 7A and 7B for reference.
[0048]
The codec interface 604 supports a direct connection to a telephony codec. In conjunction with clock signal generation, the codec interface 604 also performs parallel-to-serial and serial-to-parallel conversions. This interface is full duplex and uses corresponding transmit and receive FIFOs operating at 64 Kbps. If enabled, a codec interrupt CSINT is generated for every 8 bytes transferred (i.e., when the FIFO is half full/empty), in other words every 1 millisecond, with a latency of 1 millisecond. This timing is shown in FIG. 8, where CDENRX and CDENTX are a reception control bit and a transmission control bit in the system control register SYSCON1, respectively.
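The 1 millisecond interrupt interval follows directly from the 64 Kbps rate and the 8-byte transfer granularity. A minimal Python sketch of that arithmetic is given below; the constant and function names are hypothetical and chosen only for this example.

```python
CODEC_BIT_RATE_BPS = 64_000   # codec FIFO rate from the description
BYTES_PER_INTERRUPT = 8       # half of the FIFO, per the description


def codec_interrupt_period_s(bit_rate_bps: int = CODEC_BIT_RATE_BPS,
                             bytes_per_interrupt: int = BYTES_PER_INTERRUPT) -> float:
    """Time in seconds needed to move bytes_per_interrupt bytes at bit_rate_bps."""
    return bytes_per_interrupt * 8 / bit_rate_bps
```

With the default values, 64 bits at 64,000 bits per second take 0.001 seconds, which matches the stated interval.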
[0049]
The DAI 601 supports an I²S interface such as the interface 900 shown in FIG. In this case, both an external ADC 901 and an external DAC 902 are supported. Clock source 903 provides a time reference. An exemplary timing diagram is provided in FIG. In FIGS. 9 and 10, MCLK is an oversampled clock, which is typically fixed at 256 times the audio sampling frequency. SCLK is a bit clock, which is typically fixed at 128 times the audio sampling frequency. LRCK is a frame synchronization signal and is usually fixed at the audio sampling frequency. SDOUT is an audio data output for transmitting playback digital audio to the DAC 902. SDIN receives recording data from the ADC 901.
[0050]
The SSI2 interface 603 supports master/slave operation as shown in FIG. This interface provides a means of performing full-duplex serial transfers between two nodes. Data is transferred in bytes in response to clock and frame synchronization signals.
[0051]
Integrated circuit 100 is also provided with a set of general purpose input / output (GPIO) ports 129. In the illustrated embodiment, there are three 8-bit ports and one 3-bit port. The GPIO port can be used for purposes such as establishing an interface with the keyboard driver 215.
[0052]
A PWM (Pulse Width Modulator) circuit 130 includes two outputs for driving a DC/DC converter 216 operating in cooperation with an external power supply unit (PSU) subsystem 201. Typically, external input pins connected to the output of a comparator that monitors the external DC/DC converter output are used to enable these clocks. When the integrated circuit 100 is operated from the internal PLL 108, the PWM clock has a frequency of 96 kHz. The duty cycle ratio of these signals can be programmed from 1:16 to 15:16. The active sense of each PWM drive signal can be set high or low by latching the state of the drive signal during power-on reset (i.e., pulling the drive signal up makes the drive output active low, and vice versa). As a result, either a positive voltage or a negative voltage can be generated by the external DC/DC converter. These outputs can also be disabled by clearing bits in the control register.
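As a worked example of the figures above, the active time per PWM cycle can be computed from the 96 kHz clock and the programmed duty ratio. This Python sketch assumes that a ratio of n:16 means the output is active for n sixteenths of each period; that reading, and the function name, are assumptions made for illustration.

```python
PWM_FREQ_HZ = 96_000  # PWM clock when running from the internal PLL


def pwm_active_time_us(n: int) -> float:
    """Active time in microseconds for a duty ratio of n:16, with n in 1..15."""
    if not 1 <= n <= 15:
        raise ValueError("duty ratio must be between 1:16 and 15:16")
    period_us = 1e6 / PWM_FREQ_HZ   # about 10.42 microseconds per cycle
    return period_us * n / 16
```

Under this reading, a ratio of 8:16 keeps the output active for roughly 5.2 microseconds of each 10.4 microsecond period.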
[0053]
Communication between the blocks of the integrated circuit 100 is established through the advanced peripheral bus 132 and the advanced peripheral bus bridge 131. Internal data bus 106 is 32 bits wide and can be connected to external devices through multiplexing circuit 132. The internal address bus 107 is 28 bits wide and can communicate with external devices through the multiplexing circuit 133. An IEEE 1149.1 compliant ICE-JTAG circuit 134 is included for boundary scanning during the testing and development phase. Embedded ICE also supports ARM processor core debugging.
[0054]
In the preferred embodiment, the internal registers of integrated circuit 100 are little endian. However, the integrated circuit 100 can advantageously also interface with a big endian external memory system. Specifically, the big-endian bit in the CPU 101 register set determines whether words in the external memory are stored in big endian or little endian format. The memory is treated as a linear set of bytes numbered upward from zero. Bytes 0-3 hold the first stored word, bytes 4-7 hold the second stored word, and so on. In the little endian scheme, the lowest numbered byte in a word is considered the least significant byte of the word, and the highest numbered byte is the most significant byte. Thus, byte zero in a little endian system is connected to data lines 7-0. In big endian mode, the most significant byte of a word is stored in the lowest numbered byte and the least significant byte is stored in the highest numbered byte. Thus, byte zero in a big endian system is connected to data lines 31-24. In the illustrated embodiment, only the load and store instructions are affected by endianness. Tables 13 and 14 represent the operation of the integrated circuit 100 for both reading (Table 13) and writing (Table 14). Note that the column address strobe lines NCAS[3:0] to the DRAM banks are always connected to the same byte lanes regardless of endianness. For example, column address strobe line NCAS[0] is associated with data lines D[7:0] and NCAS[3] is associated with data lines D[31:24]. As a result, in a little endian system line NCAS[0] is asserted for a read or write of byte 0 of the DRAM, whereas in a big endian system line NCAS[3] is asserted for an access to byte 0 of the DRAM.
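The byte-lane behavior described above can be summarized in a short Python sketch that maps a byte address to the NCAS line and data lines used for the access. The function name and return format are assumptions for illustration; the mapping itself follows the text (byte 0 on D[7:0] with NCAS[0] in little endian mode, and on D[31:24] with NCAS[3] in big endian mode).

```python
from typing import Tuple


def byte_lane(byte_addr: int, big_endian: bool) -> Tuple[int, Tuple[int, int]]:
    """Return (NCAS index, (high data line, low data line)) for a byte access."""
    offset = byte_addr & 0x3                      # position of the byte within its word
    lane = 3 - offset if big_endian else offset   # lanes are fixed; only the mapping flips
    return lane, (8 * lane + 7, 8 * lane)
```

For example, `byte_lane(0, False)` selects NCAS[0] and D[7:0], while `byte_lane(0, True)` selects NCAS[3] and D[31:24].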
[0055]
Integrated circuit 100 includes a set of programmable fuses that allow each chip to be assigned one or more unique ID numbers and passwords. The programmable fuses and associated registers are located in the security register and hardware block 133 (FIG. 1), which operates from the APB 132. In the embodiment of FIG. 1D only, the boot ROM itself is on the ARM local bus 107, so the access checking is split, with logic both on the ARM local bus and in the ARM local/global AHB wrapper.
[0056]
In the preferred embodiment, there are 256 programmable fuses, comprising a set of public fuses and a set of private fuses. The addresses and values of the private fuses are hidden, and only the private firmware corresponding to those fuses is allowed access. In a non-private environment, these addresses and values all read back as zero. Public fuses are shown in Table 15 and private fuses are shown in Table 16.
[0057]
Integrated circuit 100 also includes embedded hardware in block 133 for checking the Hamming code stored in the fuses against the Hamming code corresponding to the selected ID. When the validity check address is read, the ID value is checked against the Hamming value. The resulting 5-bit code provides debug information if the Hamming code does not match (if all of the fuses are blown or none of the fuses are blown). Table 17 shows the decoding of the validity check read bits. This advantageously detects errors that occur when the fuses are blown, while keeping the fuse values and addresses inaccessible.
[0058]
Table 18 provides addresses that return validation codes for public ID-CHK pairs.
[0059]
Table 19 provides addresses that return the validation code for the private ID-CHK pair. These addresses can be accessed only from the firmware when the integrated circuit 100 is operating in the private mode, and 0 is read otherwise.
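The read behavior of the private locations can be illustrated with a minimal Python sketch. The address values, the dictionary used to stand in for the fuse array, and the function name are all hypothetical; the only behavior taken from the description is that private locations read as zero unless the device is operating in the private mode.

```python
def read_security_location(address: int,
                           values: dict,
                           private_addresses: set,
                           private_mode: bool) -> int:
    """Model a read of a fuse or validation-code address.

    values maps addresses to stored contents (hypothetical layout).
    Private addresses return 0 outside the private mode.
    """
    if address in private_addresses and not private_mode:
        return 0
    return values.get(address, 0)
```

In this model a public location always returns its stored value, so the public ID-CHK pairs remain readable in both modes.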
[0060]
There are two test registers that can be selected and validated as ID-CHK pairs so that the Hamming code generator can be fully tested. Table 20 provides the definition and location of these registers.
[0061]
FIG. 1B is a high level functional block diagram of a second system-on-chip 140 suitable for implementing the principles of the present invention. This embodiment uses an ARM920T processor 141 that has both an instruction cache and a data cache in addition to the MMU. Unlike the integrated circuit 100, the system 140 does not include a general-purpose SRAM.
[0062]
FIG. 13 is a more detailed functional block diagram of the processor 141, particularly in an embodiment based on the ARM920T core. In this embodiment, the available cache includes both an instruction cache 1301 and a data cache 1302. Similarly, a separate instruction MMU 1303 and data MMU 1304 are used. The instruction modified virtual address (IMVA) bus, the instruction physical address (IPA) bus, and the instruction data (ID) bus are each 32 bits wide. Similarly, the data modified virtual address (DMVA) bus, the data physical address (DPA) bus, and the data data (DD) bus are also 32 bits wide. Physical addresses and data are exchanged with the AHB bus 142 through the AMBA bus interface 1305. Write buffer 1306 allows data to be exchanged through interface 1305 in parallel with operation of the processor core. Data in the cache 1302 can be output through a write back physical address (PTAG) RAM 1307.
[0063]
An integral part of the processor core is the coprocessor, which includes registers used to translate the virtual addresses issued by the CPU into the modified instruction and data virtual addresses (MVA) carried on the IMVA and DMVA buses, as shown in FIG. 1B. Specifically, for an address in the 0 to 32 megabyte memory region, the virtual address VA is modified by a 7-bit process identifier to become MVA = VA + (ProcID × 32 megabytes), where the process identifier ProcID identifies the process performing the read or write.
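The address modification formula above can be expressed as a short Python sketch. The function name is hypothetical, and the decision to pass addresses at or above 32 megabytes through unchanged is an assumption of this example, since the description addresses only the 0 to 32 megabyte region.

```python
MEGABYTE = 1 << 20
REGION_SIZE = 32 * MEGABYTE   # size of the relocatable region


def modified_virtual_address(va: int, proc_id: int) -> int:
    """Compute MVA = VA + (ProcID x 32 megabytes) for VA below 32 megabytes."""
    if not 0 <= proc_id < 128:
        raise ValueError("ProcID is a 7-bit value")
    if va < REGION_SIZE:
        return va + proc_id * REGION_SIZE
    return va   # assumed: addresses outside the region are not modified
```

For example, a virtual address of 0x1000 issued by process 3 maps to 0x06001000, so each of the 128 possible processes sees its own 32 megabyte window.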
[0064]
The system 140 is based on an internal AHB (Advanced Microcontroller Bus Architecture High Speed Bus) 142 and an internal APB (Advanced Peripheral Bus) 143. The AHB/APB bridge 144 interfaces the AHB 142 with the APB 143. The second bridge 145 serves as an interface between the processor 141 and the AHB 142.
[0065]
Devices that operate from the AHB 142 include a graphics engine 146 and a raster engine 147. In general, the graphics engine offloads functions such as block transfer and line drawing from the processor 141 in order to improve the graphics performance of the system. Graphics engine 146 preferably uses a standard device-independent bitmap (DIB) format to support Windows® CE. A raster engine 147 is provided for rasterizing data from an external display buffer through a synchronous DRAM interface 148 to drive an external LCD display, CRT display, or TV display.
[0066]
Other on-chip interfaces to the internal AHB include an interface 149 that couples system 140 to an external system bus, a PCMCIA interface for interfacing with external PC Cards, and a test interface controller (TIC) interface 151 for testing on-chip circuit blocks such as the DMA controllers and the raster system. The memory interface 152 makes it possible to exchange control signals and data with an external SRAM, flash, or ROM in a manner similar to that described above. Booting of the system, described further below, is performed at least partially using the boot ROM 153. In this example, boot ROM 153 operates from the AHB 142, but in alternative embodiments it can operate from any of a plurality of global and local buses.
[0067]
The system 140 includes an 8-channel DMA engine 154 that prioritizes and services requests for access to external memory by on-chip resources such as UART. Joint Test Action Group (JTAG) port 155 supports debugging of circuitry associated with on-chip processors. The universal serial bus (USB) controller 156 and the Ethernet (registered trademark) port 157 operate directly from the AHB.
[0068]
Multiple peripheral devices are provided on-chip and operate from the APB 143. System 140 includes UARTs 158, 159, and 160, among others. The illustrated embodiment also includes a pair of SPI interfaces 161 and 162 and an AC97 interface 163. A real time clock (RTC) 165, a general timer set 166, and a watchdog timer 167 are also provided in this embodiment. An additional memory interface, EEPROM interface 168, is also coupled to the APB.
[0069]
Manual entry of data can be through an external key matrix coupled to the key matrix interface 169 or through a touch screen that interfaces with the touch screen ADC 171 and touch screen interface 170. An LED output 172 is also included in the user interface of the system 140.
[0070]
Similar to integrated circuit 100, system 140 includes a set of general purpose input / output (GPIO) ports 173, an interrupt controller 174, and an on-chip PLL 175 that drives system control circuit 176. The control circuit includes a memory remapping and system suspension control circuit 177. The flash VPP control block 178 generates a voltage necessary for writing and erasing the external flash.
[0071]
FIG. 1C is a high-level functional block diagram of another exemplary system-on-chip 180 that can suitably apply the principles of the present invention. In this example, a CPU core 181 such as an ARM7TDMI controller does not use an MMU or an on-chip cache. The CPU 181 operates in cooperation with the AHB bus 142 via the local AHB bus and the local / main AHB interface 182. CPU 181 is supported by memory 183, security gate 184, and security / reset circuit 185. Security is described in more detail below.
[0072]
In this embodiment, system 180 further includes a digital signal processor (DSP) 186 supported by global memory 189, data memory 190, and program memory 191. An interprocessor communication register 192, an I²S audio input/output port 193, a PWM circuit 194 capable of driving an external speaker at a CD quality level without using an analog DAC support circuit, and a DSP timer/STC 195 communicate with the DSP 186 via a DSP peripheral bus 196. These devices also operate from the APB. Peripheral devices operating from the APB include a USB slave port 197, an SPI 198 for serial media input, and an I²S host port 199.
[0073]
The Moving Picture Experts Group (MPEG) audio compression standard defines a stream syntax along with a process for decoding an encoded stream of digitized audio data. For audio, three layers, Layers I to III, are defined. For purposes of this discussion, consider Layer III, which provides the highest quality audio playback.
[0074]
The encoding process begins by sampling one or more audio channels at a given sampling rate, such as 32 kHz, 44.1 kHz, or 48 kHz. The resulting digitized stream is passed through a polyphase filter bank, which divides the received time domain stream into 32 frequency subbands. Typically, the filter bank operates on 64 input samples at a time with 50% overlap, producing 32 frequency domain samples for every 32 input time domain samples.
[0075]
A psychoacoustic model is used to remove portions of the audio signal that are inaudible to the human ear because of auditory masking. Auditory masking is a property of the human auditory system whereby a strong audio signal makes weaker audio signals that are close to it in time or frequency imperceptible. Furthermore, the ability of the human ear to distinguish sounds is frequency dependent. Within certain critical bands, the human ear does not precisely separate the various incoming audio components. Each processing subband, which approximates such a critical auditory band, is quantized according to the audibility of quantization noise in that subband.
[0076]
An engine based on a psychoacoustic model operates in parallel with the polyphase filter to determine the available noise masking for a given frequency component and a given volume. Based on this information, the data stream output from the polyphase filter is quantized and encoded. In Layer III, each of the 32 subband outputs from the polyphase filter is passed through a window, which parses the stream into long blocks of 18 samples or short blocks of 6 samples with 50% overlap, so that the window lengths are 36 and 12 samples, respectively. Long blocks are used to achieve better frequency resolution for relatively stationary audio signal components, and short blocks are used to improve the time resolution of transients. Next, each subband block is processed by a modified discrete cosine transform (MDCT). By further subdividing the subband frequencies, the spectral resolution is improved, and part of the aliasing caused by the polyphase filter can be canceled.
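The block sizes above follow from the MDCT itself, which maps a window of N overlapped input samples to N/2 coefficients (36 to 18 for long blocks and 12 to 6 for short blocks). The Python sketch below uses the textbook MDCT definition, which is not given in this description, and omits the window function and any scaling that a Layer III encoder would apply; it is offered only to illustrate the input and output lengths.

```python
import math
from typing import List, Sequence


def mdct(block: Sequence[float]) -> List[float]:
    """Textbook MDCT: N windowed input samples produce N/2 coefficients.

    X[k] = sum over n of x[n] * cos((2*pi/N) * (n + 0.5 + N/4) * (k + 0.5))
    """
    n_in = len(block)
    if n_in % 2:
        raise ValueError("MDCT input length must be even")
    return [
        sum(x * math.cos(2 * math.pi / n_in * (n + 0.5 + n_in / 4) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(n_in // 2)
    ]
```

Because consecutive blocks overlap by 50%, each new block of 18 (or 6) input samples still yields 18 (or 6) coefficients, so the transform does not increase the data rate.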
[0077]
MPEGx layer III uses non-uniform quantization to make the signal to noise ratio more uniform over the range of quantization values. Layer III uses scale factor bands, each of which approximates the width of a critical band and covers several MDCT coefficients. The scale factors are used in noise allocation, shape the masking threshold as a function of frequency, and essentially set the gain of each band. Further, Huffman coding is performed on the quantized MDCT coefficients to enhance data compression. Finally, a bit reservoir is used: bits can be donated to the reservoir when the number of bits required to encode a frame is less than average, and borrowed from it when the required number of bits is greater than average.
[0078]
Frames are formed from headers, CRC values, side information, and main data, but the relative positions of these components of the frame are not always in the same order and are not necessarily adjacent in the stream. The header contains a set of frame sync bits, an MPEG version and layer identifier, a CRC protection bit, a bit rate index indicating the bit rate at which the frame was created, a sampling rate frequency index indicating the frequency at which the audio data was sampled, and other information about the data being transferred.
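The header fields listed above can be illustrated with a minimal Python sketch that extracts them from a 32-bit frame header. The bit positions and lookup tables follow the commonly documented MPEG-1 audio header layout (with the 11-bit sync convention) rather than anything stated in this description, and the sketch handles only MPEG-1 Layer III; the function name and returned field names are hypothetical.

```python
# MPEG-1 Layer III bit rates in kbps; index 0 is "free format", 15 is invalid.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96,
                 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]   # index 3 is reserved


def parse_header(word: int) -> dict:
    """Decode the fields named in the text from a 32-bit frame header."""
    if (word >> 21) & 0x7FF != 0x7FF:
        raise ValueError("frame sync bits not found")
    version = (word >> 19) & 0x3   # 3 means MPEG-1
    layer = (word >> 17) & 0x3     # 1 means Layer III
    if version != 3 or layer != 1:
        raise ValueError("this sketch handles MPEG-1 Layer III only")
    return {
        "crc_protected": ((word >> 16) & 0x1) == 0,   # bit is 0 when a CRC follows
        "bitrate_kbps": BITRATES_KBPS[(word >> 12) & 0xF],
        "sample_rate_hz": SAMPLE_RATES_HZ[(word >> 10) & 0x3],
    }
```

For instance, the header word 0xFFFB9064 decodes to an unprotected 128 kbps frame sampled at 44.1 kHz.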
[0079]
The MPEG1 layer III bit stream can then generally be decoded as follows. Data is input to the decoder at a predetermined number of frames per second. The frame synchronization bits in the header of each frame are detected. The scale factors are then extracted and decoded. Following this, the Huffman encoded main data representing frequency energy is decoded. The scale factors are applied and the data is requantized. At this point, if stereo data is being processed, the stereo channels are restored and aliasing is reduced. An inverse MDCT operation is performed, followed by an overlapped inverse discrete cosine transform (DCT) to return the data to the time domain. A low pass filter is applied to recover the PCM samples, each sample being essentially a weighted average of 512 adjacent time domain samples.
[0080]
When the integrated circuit 100 is configured as an MPEGx layer III decoder, a stereo DAC, such as a Cirrus Logic CS43Lxx stereo audio DAC, is coupled to the digital audio port 128 for driving a headphone set. An analog/digital converter, such as a Cirrus Logic CS53L32 audio A/D converter, can also be coupled to this port for inputting data from a microphone. The embodiment of FIG. 1D includes an on-chip PWM circuit that can drive headphones at a CD quality level without using an external stereo audio DAC.
[0081]
Often, it is necessary to prevent tampering with, duplication of, or logic analyzer probing of the software or firmware bundled with an electronic product. As a result, a certain level of security must be provided. This can be done, for example, using encrypted passwords, so that end users who are authorized by the manufacturer are allowed access to assets in system memory for software and firmware download, debugging, and upgrade purposes, while unauthorized end users are denied the same level of access. In the context of a digital audio player, an online music distributor can, with some confidence that unauthorized downloads will be prevented at least to some extent, allow end users who have paid a fee and received the necessary password to download songs.
[0082]
In general, there are several criteria that a security scheme must meet. First, the system must not allow unauthorized access as a result of a power-on reset. Second, secure information such as encrypted passwords, security codes, and information about the location in memory where the security information resides should not be easily accessible from outside the system. However, this secure information must be verifiable at the manufacturing test stage to ensure that the quality of the end user system is acceptable with respect to normal manufacturing defects. Finally, if security measures are not provided or not activated, the normal operation of the system must proceed in the expected manner.
[0083]
The principles of the present invention advantageously provide security techniques that allow the integrated circuit 100 to meet each of the above criteria. One technique uses the processor 101's ability to reverse the decoding of the chip select signal in response to specific default conditions or dynamic assertion of specific instructions. By reversing the decoding of the chip select at power-on reset, the security code can be executed from a memory space that is normally inaccessible. Furthermore, this functionality of processor 101 can only be activated at specific times when processor 101 does not execute instructions, thereby making any security breach attempt difficult.
[0084]
FIG. 12 is a flow chart illustrating a preferred procedure 1200 for booting integrated circuit 100 in accordance with the concepts of the present invention. Assume that processor 101 is an ARM720T processor or an ARM920T processor, and that the signal names and instructions referred to are those of these processors. This procedure begins with a power-on reset of the integrated circuit 100 by asserting a power-on reset (NPOR) signal at step 1201. The circuitry in the system immediately disables all hardware and debug functions and hides all security elements (e.g., firmware, registers, passwords) from external investigation (step 1202). This ensures that the system is secure at least until step 1203 determines whether a security firmware routine is in place and available. In the preferred embodiment, this is done by reading a programmable fuse register.
[0085]
For purposes of explanation, the case where security is not provided or is disabled is described first. In step 1204, it is determined whether to continue booting from the internal ROM or to use external memory. For the ARM processor embodiment, the NMEDCHG bit is used to select between the internal and external boot memory options. In step 1204, if the signal at pin NMEDCHG is clear (i.e., active low), the integrated circuit 100 is booted from the internal ROM. In this case, in step 1205, the address mapping to the internal boot ROM is reversed by default. Once the address mapping is reversed, execution starts from the current boot ROM location 0 (step 1207). In the illustrated embodiment, the power-on reset signal NPOR must be asserted to return the address mapping to the normal state.
[0086]
Alternatively, if the NMEDCHG bit is set (i.e., in an active high state), booting is done from external memory (ROM/EPROM/flash). In this case, the chip select mapping is set as shown in Table 5A, and the external chip select 0 is selected as the boot memory.
[0087]
Now consider the case where a programmable fuse register read indicates that the security routine is in place and available. The boot branches at step 1208 and proceeds to the execution of the security procedure.
[0088]
The integrated circuit 100 can be configured to accommodate different sets of boot and/or security code. This is advantageous because it allows the integrated circuit 100 to operate using boot/security firmware from multiple vendors, while each vendor's secure information can be accessed only by that vendor's own boot/security procedure. First, the boot memory is programmed with multiple sets of boot code, or options. This can be done using an internal boot ROM or one or more external memory chips (ROM/RAM/flash). Multiple boot options allow end users to choose from security firmware available from various vendors.
[0089]
Accordingly, the first of the boot options in boot memory is identified at step 1209 and aliased to the reset vector at step 1210; for the first option this is typically location 0x00. All security elements (registers, firmware, I/O devices) required for a given implementation are enabled by the current boot option, while all other security options (implementations) are kept hidden (step 1211). Step 1212 then executes the selected boot code on the processor in an attempt to initialize the selected security firmware/software.
[0090]
If step 1213 finds that the corresponding security firmware/software called by the boot code is in memory, integrated circuit 100 completes booting and, at step 1214, operates in the selected secure environment under the supervisory control of that security firmware/software. If, on the other hand, the required security firmware/software is not found, another boot option must be tried.
[0091]
If the last security option has not been reached at step 1215, the next security option in the boot code is selected (step 1216). An instruction is then issued that dynamically forces a new reset vector on the processor. In this example, the reset vector jumps to point to the second security option in the boot code. At step 1218, processing returns to step 1211 and the boot process is tried again. Note that in the illustrated embodiment the instruction pipeline has three stages. As a result, the instruction that resets the program counter to 0 has already been loaded from the internal boot ROM before the instruction that changes the chip select executes. The MOV pc, #0 instruction causes the processor pipeline to be flushed, which allows several cycles to elapse before the chip select change takes effect. During this process, no other access is permitted to memory resources whose chip select signals change during execution of the remap command.
[0092]
This process repeats until a security option is found that causes integrated circuit 100 to enter secure operation at step 1214, or until the last security option is reached at step 1215. In the illustrated embodiment, the last, or default, option returns the boot procedure to a normal (insecure) boot at step 1219. Here, all debugging functions are enabled and the security functions are hidden at step 1220. At step 1221 a default boot ROM is selected, and at step 1222 a reset vector is dynamically forced on the processor. In alternative embodiments, however, default security code can be provided so that integrated circuit 100 can still operate in a secure environment even if all of the main options are unavailable.
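The option-selection loop of procedure 1200 can be sketched in a few lines of Python. This is an illustrative model only, not firmware: the function name, its arguments, and the "normal-boot" return value are hypothetical, and the aliasing of each option to the reset vector is represented only by a comment.

```python
def select_boot_option(security_fuse_set, options, firmware_present):
    """Model the branch structure of boot procedure 1200.

    security_fuse_set: result of reading the programmable fuse register (step 1203).
    options: ordered list of security option names held in boot memory.
    firmware_present: set of option names whose security firmware is in memory.
    All three parameters are hypothetical stand-ins for hardware state.
    """
    if not security_fuse_set:
        return "normal-boot"            # steps 1204-1207: no security routine
    for option in options:              # step 1209 (first), step 1216 (next)
        # Steps 1210-1212: the option is aliased to the reset vector, only its
        # security elements are exposed, and its boot code is executed.
        if option in firmware_present:  # step 1213: firmware found?
            return option               # step 1214: secure operation
    return "normal-boot"                # steps 1219-1222: default insecure boot
```

In this sketch, a part with two vendor options of which only the second has its firmware installed would settle on the second option; with no firmware installed it would fall through to the default boot.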
[0093]
In an embodiment of integrated circuit 100 based on the ARM920T, instructions and data can be locked into the corresponding instruction and data caches, so that they are not selected as victims by the replacement algorithm when a cache miss occurs. Locking data or instructions guarantees a cache hit, in which the corresponding information is fetched directly from the cache, and therefore a low cache access latency. Further, information locked in the cache cannot be accessed from outside integrated circuit 100 except through the JTAG port or another test or debug mode that provides visibility into the cache or TLB memory. The JTAG port, which is used primarily during product development and testing, can be disabled when integrated circuit 100 leaves the manufacturing site.
[0094]
Before a cache entry is locked, the corresponding descriptor (physical address and permissions) must be locked in the associated translation lookaside buffer (TLB) to obtain predictable performance. Many devices, such as the ARM920T used in this example, include both data and instruction TLBs in addition to the caches. For a given instruction or data field, the CPU generates a virtual address. The modified virtual address is then presented to the corresponding TLB, and fields of the modified virtual address are compared with the compare (tag) registers in the TLB. If there is a match and access can be granted (as determined by the permission field in the TLB entry), the physical address bits returned from the matching TLB entry are used, together with the index bits of the modified virtual address, to generate the physical address and to access the cache or external memory as needed. If there is no match, a hardware process, described next, is started to translate the virtual address into a physical address.
[0095]
When a cache line is locked, the corresponding entries in the data TLB and instruction TLB are also locked and excluded from replacement during TLB updates. For the ARM920T processor, a TLB entry is locked by writing the identifier of the particular data or instruction TLB entry to be locked to the TLB lockdown field of system control coprocessor register C15.
[0096]
The TLB lockdown procedure 1300 in FIG. 13 is one technique for locking an instruction TLB entry or a data TLB entry. Step 1301 sets up a page table containing physical address bits and permissions corresponding to the data or instruction to be protected. Then, at least some entries in the target TLB are flushed or cleaned to ensure that the code to be locked is not already in the TLB register (step 1302).
[0097]
In an embodiment using an ARM920T processor, the data TLB and the instruction TLB are each organized as a single segment of 64 lines. A replacement (victim) counter points to the entry to be replaced. Therefore, in step 1303, the replacement counter is updated to point to the next entry into which locked information is to be written. In the preferred embodiment, the process starts at entry 0.
[0098]
For the instruction TLB, a prefetch instruction is used to generate a modified virtual address and force a TLB miss (step 1304). For the data TLB, a load instruction can be used to cause the miss. After the miss, a page table walk must be performed to generate the descriptor (e.g., physical address and permissions) to be loaded into the TLB (step 1305). In step 1306, the descriptor generated by the page table walk, using the physical address bits of the accessed page table entry and the index bits from the modified virtual address, is loaded into the TLB entry pointed to by the current contents of the replacement counter.
[0099]
In the ARM920T embodiment, the loaded TLB entry is locked in step 1307 by setting the corresponding TLB lockdown register bit. If step 1308 finds that the last entry has been reached, the procedure ends; otherwise, at step 1309, the procedure loops back to step 1303, where the replacement counter is updated in preparation for loading the next entry.
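The sequence of procedure 1300 can be modeled in software to make the role of the replacement counter concrete. The following Python sketch is a simplified simulation, not ARM920T register programming: the class `Tlb`, the function `lock_down`, and the dictionary used as a page table are all hypothetical names introduced for illustration.

```python
TLB_ENTRIES = 64  # a single 64-line segment, as in the ARM920T embodiment


class Tlb:
    """Toy TLB with a replacement (victim) counter and per-entry lock bits."""

    def __init__(self):
        self.entries = [None] * TLB_ENTRIES   # (virtual page, descriptor) pairs
        self.locked = [False] * TLB_ENTRIES
        self.victim = 0

    def flush(self):
        # Step 1302: discard every entry that is not locked.
        for i in range(TLB_ENTRIES):
            if not self.locked[i]:
                self.entries[i] = None

    def miss_and_fill(self, vpage, page_table):
        # Steps 1304-1306: a forced miss triggers a page table walk, and the
        # resulting descriptor is written to the entry the counter points to.
        descriptor = page_table[vpage]
        self.entries[self.victim] = (vpage, descriptor)
        return self.victim


def lock_down(tlb, page_table, vpages):
    """Lock one TLB entry per virtual page, starting at entry 0."""
    if len(vpages) > TLB_ENTRIES:
        raise ValueError("more pages than TLB entries")
    tlb.flush()
    for index, vpage in enumerate(vpages):
        tlb.victim = index                     # step 1303
        slot = tlb.miss_and_fill(vpage, page_table)
        tlb.locked[slot] = True                # step 1307
```

After `lock_down` runs, a later `flush` leaves the locked entries in place, which is the property the lockdown is meant to provide.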
[0100]
Once the TLB entries are locked, the corresponding data or code can be locked in the cache. For illustration purposes, consider the case of locking instructions in the ARM920T instruction cache. The same applies to the data cache. It should also be noted that the inventive concept is not limited to systems using ARM processors, but can be applied to any system or device that includes lockable instruction and/or data caches.
[0101]
FIG. 14 shows a cache lockdown procedure 1400 for locking secure code into the cache. As described further below, in the illustrated embodiment, a cache miss must be forced to perform a lock operation. A preferred method for forcing a cache miss is described below in connection with FIG.
[0102]
In step 1401, an actual page table or an emulated page table is set up with the physical addresses of the memory locations holding the data or instructions to be locked in the cache. Emulated and synthesized page tables using the concepts of the present invention are described further below. This table is preferably used to update the corresponding TLB using procedure 1300.
[0103]
In step 1402, at least some cache lines of the given cache are flushed or cleaned to ensure that the code to be locked is not already in the cache. Step 1403 forces the replacement (victim) counter associated with the cache to point to the first cache line (cache line 0). In the preferred embodiment, the data cache and the instruction cache are each partitioned into eight 64-line segments, and each segment is indexed by an index field in the modified virtual address. Procedure 1400 fills the cache lines sequentially. For example, cache line 0 of every segment is filled first, in order, followed by cache line 1 of every segment, and so on.
[0104]
Step 1404 generates the data or instructions to be cached, which may require decryption. The data or instructions are then stored at a corresponding location in an alternate memory, such as internal SRAM or external SRAM/DRAM/flash. Step 1305 then performs a prefetch cache line operation to cache an instruction, invoking a lookup on the cache entry pointed to. (A LOAD instruction can be used for the data cache.) This causes a cache miss and requires the processor to access the alternate memory containing the required data or instructions. If the TLB is up to date and accurate, the processor can make this access by consulting the TLB for the required physical address bits; otherwise, it can walk the page table set up in step 1401 directly. The physical address itself is generated from the base address bits of the accessed TLB entry and the index bits of the virtual address.
[0105]
In step 1405, the generated code or data is placed at the location from which the cache miss will be serviced, and in step 1406 a line fill is performed into the cache line pointed to by the current replacement counter. Again, the cache segment is selected by the cache segment index bits of the virtual address that caused the cache miss.
[0106]
If the last segment of the given cache has not been reached at step 1407 and further caching operations are required, the processor increments the cache segment index bits at step 1408 to force the next cache access to the next cache segment at the current replacement counter value. The procedure then returns to step 1404 and continues from there.
[0107]
However, if the operation just completed was for the last cache segment and further caching operations remain (i.e., step 1409 finds that the last cache line to be filled has not been reached), then at step 1410 the procedure jumps back to step 1403, the replacement counter value is updated, and the procedure continues from that point.
[0108]
When all of the code to be locked has been loaded, the base of the replacement counter is set to a value one greater than the base of the locked cache lines (step 1411). This ensures that private data (which has been decrypted at this point) is neither overwritten on a cache miss nor accessible to unauthorized persons. Then, in step 1412, the code can be executed from the cache.
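The fill order of procedure 1400 (line 0 of every segment, then line 1 of every segment, and so on) and the final adjustment of the replacement counter base can be illustrated with a short Python sketch. The function names are hypothetical, and the reading of step 1411 as "one more than the highest locked line index" is an interpretation of the text above, not a statement of the hardware behavior.

```python
SEGMENTS = 8            # eight segments per cache in the preferred embodiment
LINES_PER_SEGMENT = 64  # 64 lines per segment


def lockdown_fill_order(num_cache_lines):
    """Return (line, segment) pairs in the order procedure 1400 fills them."""
    if not 0 < num_cache_lines <= SEGMENTS * LINES_PER_SEGMENT:
        raise ValueError("line count out of range for this cache")
    order = []
    for n in range(num_cache_lines):
        # The segment index advances fastest (steps 1407-1408); the line
        # index advances once all segments have been visited (steps 1409-1410).
        line, segment = divmod(n, SEGMENTS)
        order.append((line, segment))
    return order


def victim_base_after_lock(num_cache_lines):
    """Assumed reading of step 1411: first line index left free for replacement."""
    highest_locked_line = (num_cache_lines - 1) // SEGMENTS
    return highest_locked_line + 1
```

For example, locking ten cache lines touches line 0 in all eight segments and line 1 in the first two, so normal replacement would then be confined to lines 2 and above.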
[0109]
One means of creating locked, cached data without dedicating memory locations to the entire region to be locked is to emulate that region using registers one cache line in length. Cache miss emulation can also be used to overcome hardware limitations on the granularity of cache locking. For example, in the ARM920T embodiment, the cache can be locked in units of 64-word blocks (256 bytes). However, each cache line is only 8 words (32 bytes) long and can therefore be mapped to different locations within a 64-word block depending on the address bits.
[0110]
In accordance with the concepts of the present invention, for each lockable location, eight programmable 32-bit emulated cache line (ECLINE) registers are set up as eight consecutive 32-bit locations in an alternate location in memory. In addition, a compare (offset) register (ECOFFSET) is set up and programmed with the physical address identifying where in the cache memory space the contents of the ECLINE registers will reside after the emulated cache miss. As a result, a location one cache line in size can be used to represent an entire 64-word lockable location.
[0111]
The emulated cache miss procedure 1500 is shown in the flow chart of FIG. In step 1501, the contents to be cached (in the instruction cache or data cache) are written to the ECLINE register. Next, an offset to the lockable cache space into which data is written is programmed into the ECOFFSET compare register (step 1502).
[0112]
In step 1503, an operation that causes a cache miss is performed. For the instruction cache, this can be a prefetch instruction; for the data cache, it can be a load instruction. The virtual address generated for this location misses in the given cache, and the corresponding physical address is then generated either from the index bits of the virtual address and the base bits taken from the appropriate TLB or by a page table walk. Step 1505 retrieves the information in the corresponding ECLINE registers, and step 1506 loads it into the addressed entry in the cache. This entry is prepared for locking using procedure 1400. This procedure advantageously allows the locked portion of the cache to be loaded without resorting to internal SRAM or external SRAM.
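To show how a single line-sized set of registers can stand in for a whole 64-word lockable block, the following Python sketch models the ECLINE and ECOFFSET registers. It is a behavioral illustration under stated assumptions: the class and function names are hypothetical, the cache is reduced to a dictionary keyed by line address, and the forced miss of step 1503 is represented by a direct call.

```python
WORDS_PER_LINE = 8   # 8 words (32 bytes) per cache line
BYTES_PER_WORD = 4


class EmulatedLineSource:
    """Eight 32-bit ECLINE registers plus an ECOFFSET compare register."""

    def __init__(self):
        self.ecline = [0] * WORDS_PER_LINE
        self.ecoffset = None

    def program(self, words, line_address):
        # Steps 1501-1502: write the line contents and the target offset.
        if len(words) != WORDS_PER_LINE:
            raise ValueError("exactly one cache line of words is required")
        if line_address % (WORDS_PER_LINE * BYTES_PER_WORD):
            raise ValueError("address must be cache-line aligned")
        self.ecline = [w & 0xFFFFFFFF for w in words]
        self.ecoffset = line_address

    def line_fill(self, line_address):
        # Steps 1505-1506: the miss is serviced from ECLINE only when the
        # physical address matches the programmed ECOFFSET.
        if line_address != self.ecoffset:
            raise LookupError("address does not match ECOFFSET")
        return list(self.ecline)


def load_locked_block(source, cache, base_address, words):
    """Load a block into `cache` one emulated line at a time."""
    for i in range(0, len(words), WORDS_PER_LINE):
        line_address = base_address + i * BYTES_PER_WORD
        source.program(words[i:i + WORDS_PER_LINE], line_address)
        cache[line_address] = source.line_fill(line_address)  # forced miss
```

Loading a 64-word block this way reprograms the same eight registers eight times, so no 256-byte staging area in SRAM is needed in the model.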
[0113]
As already pointed out, a page table walk is required during cache and TLB lock operations to generate the physical memory addresses from which data or instructions are fetched. The concepts of the present invention enable the creation of a streamlined page table that reduces the amount of memory that must be reserved for page table support. Further, even when TLB misses are taken into account, the concepts of the present invention protect data and instruction code from tampering, copying, or electronic analysis through secure operation of MMU 104 during section/page table walk address translation. For purposes of discussion, the ARM920T processor core is considered again, but the principles of the present invention can be applied to the memory management schemes of other processors and memory management units.
[0114]
For this embodiment, a conventional page table walk generally proceeds as follows. During a level 1 fetch, a section descriptor (level 1), a coarse page table base address, or a fine page table base address is retrieved from the 4096-entry translation base table (TBT). The TBT is accessed using the TBT base address in the translation base register and the table index field of the modified virtual address.
[0115]
If the output from the TBT is a section descriptor, the descriptor includes a section base address and access permissions. The section base address bits from the level 1 descriptor and the section index bits of the modified virtual address are then used to generate a physical address within a 1 megabyte section of memory (assuming that the permissions contained in the level 1 section descriptor allow the access).
[0116]
The coarse page table base address retrieved from the TBT, together with the level 2 table index of the modified virtual address, accesses one of the 256 entries in the coarse page table, thereby dividing the 1 megabyte block into 4 kilobyte blocks. The coarse page table returns a large page or small page base address along with access permissions. Depending on the permissions, the large page or small page base address bits are combined with the page index bits of the modified virtual address to generate the physical address of a 64 kilobyte large page or a 4 kilobyte small page in memory.
[0117]
The fine page table base address retrieved from the TBT, together with the level 2 table index bits from the modified virtual address, points into the 1024-entry fine page table. The output from this table is a level 2 descriptor, which contains a large, small, or tiny page base address along with access permissions. A large page is 64 kilobytes, a small page is 4 kilobytes, and a tiny page is 1 kilobyte. If the permissions allow the access, the page base address is concatenated with the page index bits of the modified virtual address to generate the physical address of a large or small page in memory, as already described, or of a 1 kilobyte tiny page in memory.
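The address arithmetic of the conventional walk can be made concrete with a small Python sketch. The table sizes (4096, 256, and 1024 entries) and the page sizes come from the description above; the bit positions follow from those sizes for a 32-bit address. The function and dictionary names are hypothetical, and permissions checking is omitted.

```python
PAGE_SIZES = {
    "section": 1 << 20,   # 1 megabyte
    "large": 64 << 10,    # 64 kilobytes
    "small": 4 << 10,     # 4 kilobytes
    "tiny": 1 << 10,      # 1 kilobyte
}


def split_virtual_address(va):
    """Extract the table indexes used at each level of the walk."""
    return {
        "l1_index": (va >> 20) & 0xFFF,         # 4096-entry level 1 table
        "coarse_l2_index": (va >> 12) & 0xFF,   # 256-entry coarse page table
        "fine_l2_index": (va >> 10) & 0x3FF,    # 1024-entry fine page table
    }


def physical_address(base, va, kind):
    """Concatenate a descriptor's base address bits with the VA index bits."""
    size = PAGE_SIZES[kind]
    return ((base & ~(size - 1)) | (va & (size - 1))) & 0xFFFFFFFF
```

For the virtual address 0x12345678, the level 1 index is 0x123, and a section descriptor with base 0xABC00000 yields the physical address 0xABC45678.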
[0118]
The memory accessed as a result of the page table walk is the cache, internal memory, or external memory. The TLB is updated with the physical address and permissions. Any secure information is locked in the TLB as described above.
[0119]
The disadvantage of this two-level table walk procedure is that a considerable amount of on-chip memory is required for each table. As mentioned earlier, secure information must reside in a memory area inside the system that is not accessible to unauthorized users. Therefore, some means must be provided for efficiently storing sensitive information, such as the physical address translation scheme, in the available internal memory.
[0120]
The preferred embodiment of the integrated circuit 100 can greatly simplify the table walk process and significantly reduce the amount of memory required for the translation table. This is not only important from the standpoint of improving operating efficiency, but it also ensures that an insecure external memory is not used.
[0121]
Here, the memory space is divided into 256 megabyte regions, and each region is associated with a set of common access characteristics (e.g., access permissions, cacheability, bufferability). A second level page table is required for only one megabyte of this area. Because many memory regions have common access characteristics, a much smaller translation table can be created in the available SRAM space.
[0122]
The access permissions indicate whether given information can be accessed in the corresponding memory block. The cacheability and bufferability attribute bits determine whether the accessed information can be stored in a cache or transferred through a write buffer. For example, the contents of actual hardware registers that control UARTs and other peripheral and I/O devices generally cannot be cached or buffered by the CPU subsystem, because doing so could change the actual timing of accesses and cause those peripheral devices to operate incorrectly.
[0123]
In addition, a secure system must keep page/section table information within the boundaries of the private area, so that the information cannot be moved from memory to device pins where it could be examined with a logic analyzer.
[0124]
In the illustrated embodiment, in which memory is partitioned into sixteen 256 megabyte blocks, a 32-bit register is created to store the level 1 AP bits, with each 2-bit pair corresponding to one 256 megabyte memory region. For example, bits [1:0] correspond to region 1, bits [3:2] correspond to region 2, and so on. A 16-bit register is set up to hold a set of bits indicating the cacheability of each level 1 region. Another 16-bit register is set up to hold a set of bits indicating the bufferability of each region. These registers are pointed to by the contents of the translation base register in the MMU.
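The packing of access characteristics for sixteen regions into one 32-bit and two 16-bit registers can be sketched as follows. This Python model is illustrative only: the class name and methods are hypothetical, and regions are numbered from 0 here for convenience, whereas the text above numbers them from 1.

```python
REGION_SHIFT = 28  # 2**28 bytes = 256 megabytes; 16 regions cover 4 gigabytes


class GlobalLevel1Registers:
    """One 32-bit AP register and two 16-bit attribute registers."""

    def __init__(self):
        self.ap = 0          # 2 AP bits per region
        self.cacheable = 0   # 1 bit per region
        self.bufferable = 0  # 1 bit per region

    def set_region(self, region, ap, cacheable, bufferable):
        if not 0 <= region < 16 or not 0 <= ap < 4:
            raise ValueError("region must be 0-15 and ap must be 0-3")
        shift = 2 * region
        self.ap = (self.ap & ~(0b11 << shift)) | (ap << shift)
        bit = 1 << region
        self.cacheable = (self.cacheable & ~bit) | (bit if cacheable else 0)
        self.bufferable = (self.bufferable & ~bit) | (bit if bufferable else 0)

    def lookup(self, va):
        """Return (ap, cacheable, bufferable) for the region containing va."""
        region = (va >> REGION_SHIFT) & 0xF
        return (
            (self.ap >> (2 * region)) & 0b11,
            bool(self.cacheable & (1 << region)),
            bool(self.bufferable & (1 << region)),
        )
```

In this model the access characteristics of the entire 4 gigabyte address space occupy 64 bits of register storage, compared with 4096 32-bit entries for a conventional level 1 table.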
[0125]
FIG. 16A illustrates a procedure 1600 for updating these registers while simultaneously dealing with memory areas having unique characteristics and constants.
[0126]
For a given 256 megabyte region, step 1601 determines whether the region has a common set of access characteristics. If so, step 1602 loads the corresponding AP bits into the corresponding entry in the global level 1 AP register. At steps 1602 and 1603, the corresponding entries in the registers indicating global level 1 bufferability and cacheability are updated as well.
[0127]
In step 1605, the procedure returns to update the register entry of the next memory region (block) that needs updating. The global access control registers are preferably initialized and updated in a loop. In general, their values do not change, but they can be updated during system processing if necessary. The full register values of entries that are not synthesized are updated as appropriate during system operation. An update is necessary, for example, when a memory page is "swapped" out to disk or a similar mass storage device and replaced with another page.
[0128]
For a memory block or register found at step 1601 to have a unique set of access characteristics, including access permissions, bufferability and cacheability bits, and physical address bits, a complete level 1 descriptor is loaded into a full 32-bit register. The procedure again loops back to step 1608. This descriptor may contain the address of a coarse or fine page (level 2) table.
[0129]
Otherwise, at step 1608, a constant held in hardwired gates is pointed to. The stored constant may be a fixed value or the base address of a level 2 table. If step 1609 finds that a walk to level 2 is not required, the procedure loops back to step 1610. Otherwise, step 1611 sets up the corresponding register in the synthesized level 2 table.
[0130]
A similar process is used for level 2 synthesis. Specifically, for level 2 pages and sub-blocks that have common characteristics, a register pointed to by the level 2 base address bits in the level 1 register is set up as described above, together with a global level 2 AP register and level 2 bufferability and cacheability registers.
[0131]
Step 1612 determines whether a given page or set of pages has a common set of access characteristics. If so, step 1613 loads the appropriate AP bits into the corresponding entry in the global level 2 AP register. At steps 1614 and 1615, the corresponding entries in the registers indicating global level 2 bufferability and cacheability are updated as well.
[0132]
In step 1616, the procedure returns to step 1601 to update the register entry of the next memory area (block) that needs to be updated.
[0133]
For those level 2 pages, blocks, or sets of registers found at step 1612 to have a unique set of level 2 access characteristics, including access permissions, bufferability and cacheability bits, and physical address bits, step 1618 loads a complete level 2 descriptor into a full 32-bit register. Otherwise, step 1618 points to a constant in hardwired gates. The stored constant may be a fixed value, a base address, or the like. The procedure again loops back to step 1601 at step 1619.
[0134]
An exemplary synthesized page table walk is shown in FIG. 16B. In step 1620, a table walk is requested. This request may be in response to a TLB and/or cache miss. Consider first the case where step 1621 finds that the second level of the table walk is not required. At step 1622, the level 1 registers are pointed to by the translation base register in the MMU. The level 1 register entry is indexed using the table index bits of the virtual address (step 1623).
[0135]

Steps 1624 and 1625 determine whether the value returned from the indexed level 1 register entry is a complete descriptor or a constant. Consider first the case where the returned value is neither a constant nor a complete descriptor.
[0136]
At step 1626, the access control bits (i.e., the AP, cacheability, and bufferability bits) in the first level global access registers are retrieved. Step 1627 then translates the table index of the virtual address into a physical address by shifting its bit positions.
[0137]
In the preferred embodiment, the translated virtual address for a section entry consists of the table index bits: the lookup word index into the 4096-entry level 1 page table, bits (13:2), becomes result bits (31:20) of the entry (for a 1 megabyte memory region). The region of the section is defined by memory location bits (13:10). In embodiments using an ARM920 or ARM720 MMU, several bits in a page table entry are always a constant 0 or 1.
[0138]
At step 1628, the translated address bits and the retrieved access control bits are combined to form a level 1 descriptor. In step 1629, the composite descriptor is returned for the TLB and/or cache update.
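Steps 1626 through 1628 can be illustrated with a Python sketch that builds a level 1 section descriptor on the fly instead of reading it from a table. The shift from lookup offset bits (13:2) to result bits (31:20) follows the description above. The placement of the AP, domain, cacheable, bufferable, and type bits follows the ARM section descriptor layout as commonly documented; that layout, the default domain of 0, and the function name are assumptions to be checked against the processor's reference manual.

```python
def synthesize_section_descriptor(va, ap, cacheable, bufferable, domain=0):
    """Build a level 1 section descriptor without a table in memory."""
    table_index = (va >> 20) & 0xFFF              # 4096-entry level 1 index
    entry_offset = table_index << 2               # bits 13:2 of the lookup address
    section_base = (entry_offset << 18) & 0xFFF00000  # shifted to bits 31:20
    return (
        section_base
        | ((ap & 0b11) << 10)        # access permission bits (assumed position)
        | ((domain & 0xF) << 5)      # domain field (assumed position)
        | (1 << 4)                   # constant bit, always 1 (assumed)
        | (int(bool(cacheable)) << 3)
        | (int(bool(bufferable)) << 2)
        | 0b10                       # descriptor type: section (assumed encoding)
    )
```

Because the section base is derived purely by shifting the index, this sketch describes a mapping in which the physical section number equals the virtual one; the constant low-order bits correspond to the bits noted above as always 0 or 1.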
[0139]
Returning to steps 1624 and 1625, the level 1 entry may instead be a complete descriptor (step 1630) or a constant (step 1631). At step 1632, the descriptor or constant can be used immediately.
[0140]
Next, assume that a level 2 table reference is requested in step 1621.
[0141]
Level 2 translation is similar to the translation performed when only a level 1 lookup is required. At step 1633, the level 2 registers, set up as described above, are pointed to by the base address in the MMU. At step 1634, the specific register or entry is indexed using the table index bits of the virtual address. Steps 1635 and 1636 determine whether the indexed register (entry) contains a complete descriptor or a constant. If a descriptor is found, it is retrieved at step 1637; if a constant is found, it is retrieved at step 1638. The descriptor or constant can be used immediately at step 1639.
[0142]
If neither a constant nor a descriptor is found at steps 1635 and 1636, the second level access control registers are accessed at step 1640, and the corresponding access control bits are retrieved at step 1641 using the page index bits of the virtual address. In step 1642, the page index bits of the virtual address are translated into a physical address by shifting their bit positions. In step 1643, the physical address bits are combined with the retrieved access control bits to form a composite descriptor. At step 1644, this composite descriptor is returned for a TLB update, a memory access on a cache miss, or a similar operation.
[0143]
Note that for the sake of brevity, the synthesized table walk has been described only with respect to level 1 and level 2 descriptor generation. However, it should be noted that iteratively applying the principles of the present invention can implement other levels of walk-through below the second level.
[0144]
In summary, according to the inventive concept, the first level table requires only a 32-bit AP register and a pair of 16-bit registers for bufferability and cacheability. Each second level page that must be addressed requires only a second level table consisting of a small page AP register and 1-bit cacheability and bufferability registers.
[0145]
The concept of the present invention also advantageously allows for address translation and TLB updates in the event of a cache miss, with memory register emulation similar to cache miss emulation. The cache and / or TLB entries can then be locked for security as described above. The preferred emulation process uses an alternative emulated memory so that the internal memory of the integrated circuit 100 can be allocated for other tasks. The memory address of the page table is preferably mapped inside the integrated circuit. A preferred procedure using such a concept is the emulated table walk / TLB update procedure 1700 shown in FIG.
[0146]
First, an emulated level 1 translation register (table) (EL1TR) containing either a level 1 descriptor or a level 2 base address is created (step 1701). In addition, an emulated level 1 index register (EL1IR), which holds the index of the entry in EL1TR, is set up in the alternate memory space (step 1702). The translation base table (TTB) register in the MMU is programmed to point to the emulated level 1 table. A request to this region returns the contents of EL1TR when the table index matches EL1IR. If the indexes do not match, the returned value is an entry that raises an exception.
[0147]
For address translations that continue to the second level, an emulated level 2 translation register (EL2TR) containing a level 2 descriptor is created in the alternate memory (step 1704), and an emulated level 2 index register (EL2IR) is set up to hold the corresponding index (step 1705).
[0148]
In step 1706, a virtual address is generated, either by an instruction issued to CPU 101 or by an external address generator. If the cache and TLB have been flushed or cleaned, a cache/TLB miss occurs, so in step 1707 the table walk procedure is invoked using the emulated level 1 table pointed to by the MMU. The level 1 table index bits of the virtual address are compared with the table index bits in EL1IR, and the corresponding level 1 information is returned from EL1TR (step 1708).
[0149]
If step 1708 finds that the information is a descriptor (i.e., level 2 translation is not required), a level 1 access is performed (step 1709), and the permissions in the descriptor are examined (step 1710). If access is not permitted, operation stops at step 1711. Otherwise, in step 1712, a physical address is generated from the section address bits in the level 1 descriptor and the section index in the virtual address. Step 1713 loads this physical address into the TLB, where it awaits the lock operation and the loading of the corresponding data or instruction into the appropriate cache. If step 1714 determines that the current entry in the TLB is not the last entry to be loaded, then at step 1715 the procedure loops back to step 1706 to begin the next table walk. Otherwise, in step 1716, the TLB lock procedure is executed.
[0150]
If step 1708 determines that the EL1TR information is a base address for level 2, a level 2 page walk is started in step 1717. The EL2TR register is accessed using the base address from EL1TR (step 1718). The particular register is indexed by comparing the contents of the corresponding EL2IR register against the index bits of the virtual address (step 1719). In step 1720, the permissions in the returned level 2 descriptor are examined. If access is not permitted, the access is stopped in step 1721. Otherwise, in step 1723, the physical address bits in the level 2 descriptor and the index bits of the virtual address are used to generate a physical address. In step 1723, the physical address is loaded into the TLB, where it awaits the lock operation.
[0151]
If step 1724 finds that the current TLB entry is the last entry to be loaded, the TLB lock procedure can be invoked at step 1725; otherwise, at step 1726, the procedure jumps back to step 1706 and starts the table walk for the next TLB entry.
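The level 1 path of procedure 1700 can be modeled with a single translation register and a single index register that are reprogrammed before each forced miss. The Python sketch below is a behavioral illustration only: the class and function names, the use of a dictionary as a descriptor, and the `FAULT` sentinel are hypothetical, and the level 2 path (EL2TR/EL2IR) is omitted.

```python
FAULT = None  # stands in for the entry that raises an exception


class EmulatedLevel1Table:
    """One emulated translation register (EL1TR) and index register (EL1IR)."""

    def __init__(self):
        self.el1tr = FAULT
        self.el1ir = None

    def program(self, table_index, descriptor):
        # Steps 1701-1702: set up the descriptor and the index it answers for.
        self.el1ir = table_index & 0xFFF
        self.el1tr = descriptor

    def fetch(self, va):
        # Step 1708: return EL1TR only when the VA table index matches EL1IR.
        table_index = (va >> 20) & 0xFFF
        return self.el1tr if table_index == self.el1ir else FAULT


def load_tlb_entries(mappings):
    """mappings: list of (virtual address, section base, access permitted)."""
    table = EmulatedLevel1Table()
    tlb = []
    for va, section_base, permitted in mappings:
        table.program(va >> 20, {"base": section_base, "permitted": permitted})
        descriptor = table.fetch(va)                     # steps 1706-1708
        if descriptor is FAULT or not descriptor["permitted"]:
            raise PermissionError("access not permitted")  # steps 1710-1711
        pa = (descriptor["base"] & 0xFFF00000) | (va & 0x000FFFFF)  # step 1712
        tlb.append((va, pa))                             # step 1713
    return tlb  # the caller would then run the TLB lock procedure (step 1716)
```

The model shows why no full table is needed while the TLB is being loaded: only one entry is ever live, and any access whose index does not match it is answered with the faulting entry.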
[0152]
In some embodiments of integrated circuit 100, it is possible to use a bare CPU that includes neither a memory management unit (MMU) nor a hardware cache. For example, CPU core 101 can be based solely on the ARM7TDMI processor 102, without cache 103 or MMU 104. If this option is selected, all software must be stored in a flat memory space. However, this may require the use of external memory (e.g., NOR flash, SRAM, DRAM). As noted above, data in external memory has the major disadvantage that it can be accessed or analyzed by unauthorized end users.
[0153]
In an embodiment of integrated circuit 100 that does not use a hardware cache or MMU, the security code is executed in supervisor mode. In supervisor mode, access to specific areas of memory and specific registers is checked against supervisor privileges. The security firmware is preferably executed from an internal memory such as SRAM. While the security code runs in supervisor mode, all other software and firmware is treated as running in user mode, and its accesses are therefore checked against supervisor privileges by the secure software.
[0154]
Although the invention has been described with reference to particular embodiments, these descriptions are not to be construed in a limiting sense. Various modifications of the disclosed embodiments, as well as alternative embodiments of the invention, will become apparent to those skilled in the art upon reference to the description of the invention. Those skilled in the art will appreciate that the disclosed concepts and specific embodiments can be readily utilized as a basis for other structural modifications or designs to perform the same purposes of the present invention. Those skilled in the art will also recognize that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.
[0155]
Accordingly, the claims are intended to cover variations or embodiments that fall within the true scope of the present invention.

[Brief description of the drawings]
FIG. 1A is a high level functional block diagram of an integrated circuit implementing the principles of the present invention.
FIG. 1B is a high level diagram of a second system implementing the concepts of the present invention.
FIG. 1C is a diagram of a third exemplary system in which the principles of the present invention can be applied in an advantageous manner.
FIG. 1D is a diagram showing two further forms.
FIG. 2 is a diagram of an integrated circuit 100 in a configuration that allows maximum utilization.
FIG. 3 is a high level functional block diagram of the processor of FIG. 1B.
FIG. 4A depicts an external clock that drives pin EXPCLK when the system enters a “waiting state” and the clock enable signal on pin CLKEN is asserted.
FIG. 4B is a diagram representing an external clock that drives pin EXPCLK when the clock enable signal on pin CLKEN is asserted and the system exits the “waiting state”.
FIG. 5 is a state diagram illustrating an operation of the state control circuit of FIG. 1A.
FIG. 6 is a block diagram of three serial interfaces including the serial interface block of FIG. 1A.
FIGS. 7A and 7B are timing diagrams representing SSI (ADC) operation in the context of a selected external device.
FIG. 8 is a timing diagram illustrating the operation of the codec interface of FIG.
FIG. 9 is a functional block diagram showing the interface between the I²S ports of the serial interface block.
FIG. 10 is a timing diagram showing operation of the I²S interface.
FIG. 11 is a functional block diagram illustrating use of the SSI2 port of FIG. 6 in a master / slave configuration.
FIG. 12 is a flowchart illustrating system initialization at power-on reset.
FIG. 13 is a flowchart illustrating a procedure for locking private data in a TLB.
FIG. 14 illustrates a cache lockdown procedure for locking secure code in a cache.
FIG. 15 is a flow chart showing the procedure for an emulated cache miss.
FIG. 16A is a diagram of a preferred method for setting up a synthesized translation table.
FIG. 16B is a flowchart illustrating a table walk through the synthesized translation table of FIG. 16A.
FIGS. 17A-17E illustrate a preferred procedure for performing an emulated table walk.

Claims

A method for preventing access to and observation of cached information in a system including a memory, a cache, a translation lookaside buffer, and a control circuit, the method comprising:
generating, by the control circuit, private information to be cached;
storing, by the control circuit, the private information in the memory;
updating, by the control circuit, the translation lookaside buffer with a descriptor for the location in the memory containing the private information;
forcing, by the control circuit, a cache miss at a selected location in the cache into which a selected portion of the private information is to be loaded;
retrieving, by the control circuit, the selected portion of the private information from the memory using the corresponding descriptor in the translation lookaside buffer;
loading, by the control circuit, the retrieved portion of the private information into the selected location in the cache;
locking, by the control circuit, the selected portion of the private information at the selected location in the cache; and
locking the descriptor corresponding to the selected portion of the private information in the translation lookaside buffer.

The method of claim 1, wherein the selected location in the cache is associated with a replacement counter that counts from a replacement counter base value, and wherein the locking step includes the substep of resetting the replacement counter base value to a value higher than the replacement counter base value associated with the selected location in the cache.

The method of claim 1, wherein updating the translation lookaside buffer comprises the substeps of:
setting up a translation table containing an entry for generating a descriptor for the memory location that stores the private information;
updating a pointer stored in a replacement counter to point to the current translation lookaside buffer entry to be filled;
forcing a miss on the current translation lookaside buffer entry;
performing, by the control circuit, a table walk of the translation table to generate the descriptor associated with the private information in the memory; and
loading the descriptor obtained from the table walk into the current translation lookaside buffer entry.

The method of claim 1, wherein loading the selected portion of the decoded information into a cache comprises loading into a cache line of an instruction cache.

The method of claim 1, wherein loading a selected portion of the private information into a cache includes loading into a cache line of a data cache.

The method of claim 1, wherein setting up the translation table includes setting up an emulated translation table.

A processing system comprising:
a memory for storing private information to be secured;
a cache having a target location for caching the private information;
a translation lookaside buffer having a location for storing a descriptor for accessing the private information in the memory; and
a control circuit,
wherein the control circuit is operable to:
force a miss on the target location in the cache;
retrieve the private information from the memory using the descriptor in the translation lookaside buffer;
load the retrieved private information into the target location in the cache;
lock the retrieved private information in the target location in the cache; and
lock the descriptor in the translation lookaside buffer,
wherein the locking of the private information in the target location in the cache and the locking of the descriptor in the translation lookaside buffer hide the private information from observation.

The processing system of claim 7, wherein the control circuit includes a counter that points to the target location and is operable to lock the private information in the target location in the cache by resetting a base value loaded into the counter.

The processing system of claim 7, wherein the control circuit and the translation lookaside buffer comprise part of a microprocessor.

The processing system of claim 9, wherein the cache comprises part of the microprocessor.

The processing system of claim 10, wherein the microprocessor comprises part of a system on a chip.

The processing system of claim 7, wherein the cache includes an instruction cache.

The processing system of claim 7, wherein the cache comprises a data cache.
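
The lockdown sequence recited in the method claims (fill a translation lookaside buffer entry, force a miss at a selected cache location, load the private information, then lock both entries by raising a replacement counter base) can be sketched as a small software model. The Python below is only an illustrative simulation under simplifying assumptions: the `LockableStore` class, the `lock_private` function, the dictionary used as memory, and the use of a bare address as the descriptor are all hypothetical and are not taken from the disclosure or from any real processor interface.

```python
class LockableStore:
    """Toy model of a cache (or TLB) whose victim entry is chosen by a
    replacement counter that cycles from a base value up to the last entry."""

    def __init__(self, num_entries: int) -> None:
        self.entries = [None] * num_entries
        self.base = 0       # lowest entry the counter may select
        self.counter = 0    # entry to be filled on the next miss

    def fill_on_miss(self, value) -> int:
        """Load `value` into the entry selected by the counter."""
        index = self.counter
        self.entries[index] = value
        # Advance, wrapping back to the base rather than to zero.
        self.counter = index + 1 if index + 1 < len(self.entries) else self.base
        return index

    def lock_below(self, new_base: int) -> None:
        """Lock entries [0, new_base) by raising the counter base."""
        if not self.base <= new_base < len(self.entries):
            raise ValueError("new base out of range")
        self.base = new_base
        self.counter = max(self.counter, new_base)


def lock_private(memory: dict, tlb: LockableStore, cache: LockableStore,
                 address: int) -> int:
    """Load and lock the private word at `address`; return its cache index."""
    # Fill the TLB entry with a descriptor (here just the address) and lock it.
    tlb_index = tlb.fill_on_miss(address)
    tlb.lock_below(tlb_index + 1)
    # Forced miss: fetch via the descriptor and load the selected cache entry.
    descriptor = tlb.entries[tlb_index]
    cache_index = cache.fill_on_miss(memory[descriptor])
    cache.lock_below(cache_index + 1)
    return cache_index
```

Because the counter wraps to the raised base instead of to zero, later fills in this model can never select a locked entry, which is the property the base-value reset is meant to provide.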