JP4228275B2

JP4228275B2 - Information processing device

Info

Publication number: JP4228275B2
Application number: JP2002339323A
Authority: JP
Inventors: 太一平尾; 博一花木; 知史奥田
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2002-11-22
Filing date: 2002-11-22
Publication date: 2009-02-25
Anticipated expiration: 2022-11-22
Also published as: JP2004171458A

Description

[0001]
The present invention relates to an information processing apparatus, and in particular to an information processing apparatus that improves the processing speed of a CPU.
[0002]
[Prior art]
In recent years, personal computers have become widespread, and the processing capability (processing speed) of the CPU (Central Processing Unit), the brain of the personal computer, has been steadily improving. To improve the processing capability of the CPU, a coprocessor has been provided in addition to the CPU (see, for example, Patent Documents 1 to 5).
[0003]
[Patent Document 1] Japanese Patent Laid-Open No. 11-73314
[Patent Document 2] Japanese Patent Laid-Open No. 9-325759
[Patent Document 3] Japanese Patent Laid-Open No. 9-69047
[Patent Document 4] Japanese Patent No. 2849189
[Patent Document 5] Japanese Patent No. 2908096
[0004]
[Problems to be solved by the invention]
When the CPU executes an operation, it must access a storage device to fetch the necessary data and instructions. Such storage devices include a cache provided inside the CPU, and memories and registers provided outside the CPU. When the CPU accesses its internal cache, data transfer between the CPU and the cache is fast compared with data transfer between the CPU and memory. When the CPU accesses a register or memory outside the CPU, however, the access (data transfer) passes through the bus connecting the CPU and the memory (register), so the transfer takes time to complete.
[0005]
This is because the CPU sends a data request signal and the address at which the target data is stored to the memory or register, and cannot send the next request signal until the register or memory transfers the data back as a response.
[0006]
In addition, devices other than the CPU also use the bus to transfer data. The bus may therefore be busy, and the CPU may have to wait before it can use the bus. Consequently, when the amount of computation is very large, processing slows down, real-time processing requirements cannot be met, and the performance of current CPUs is sometimes insufficient.
[0007]
One countermeasure to this problem is DMA (Direct Memory Access). DMA performs data transfers, such as memory-to-memory transfers, on behalf of the CPU, enabling high-speed transfers that do not pass through the CPU. However, to hand a transfer off to DMA, the CPU must set the address, transfer size, and similar parameters in the DMA controller, and the problems described above may occur while this setup is being performed.
[0008]
The present invention has been made in view of such a situation, and an object thereof is to improve the processing speed of a CPU.
[0009]
[Means for Solving the Problems]
The information processing apparatus of the present invention includes the following elements:
- a first storage device accessed only by a CPU;
- a second storage device accessed by the CPU and by a device other than the CPU;
- a first bus connecting the CPU and the first storage device;
- a second bus connecting the CPU and the second storage device.

The instruction set for the first bus, used between the CPU and the first storage device via the first bus, has a fixed latency and has no Acknowledge, which is one of the response signals. It also has no offset value and accesses the first storage device using an immediate value or a pointer. When the first storage device is accessed using the pointer, predetermined upper bits of the value stored in the pointer of the instruction set for the first bus are masked, so that only the lower bits other than the predetermined upper bits are incremented or decremented.
[0010]
The instruction set for the first bus may have the same format as the instruction set for the second bus, which is used between the CPU and the second storage device via the second bus.
[0011]
A third bus that connects the first storage device and the second bus may be further provided.
[0012]
The first bus may be connected to a coprocessor port of the CPU.
[0016]
In the information processing apparatus of the present invention, the CPU and a memory dedicated to the CPU are connected by a dedicated bus. The instruction set used on that bus has a fixed latency and no Acknowledge, so the CPU can issue instructions continuously while executing processing, which improves the processing capability of the CPU. In addition, the instruction set has no offset value and accesses the memory using an immediate value or a pointer. When a pointer is used, predetermined upper bits of the value stored in the pointer are masked, so that only the lower bits other than the predetermined upper bits are incremented or decremented.
[0017]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 shows the configuration of an embodiment of an information processing apparatus to which the present invention is applied. The information processing apparatus 10 illustrated in FIG. 1 is incorporated in, for example, a personal computer. The CPU 11 and the local memory 13 are connected by a local bus interface 12 and a local bus 14 so that data can be exchanged. The CPU 11 and the local bus interface 12 are also connected to the register 16 and the memory 17 by a standard memory bus 15.
[0018]
The local bus interface 12 is connected to the coprocessor port 21 of the CPU 11.
[0019]
The coprocessor port 21 is described here. The coprocessor port 21 is originally provided for connecting a coprocessor. A coprocessor is a processor that complements the CPU 11 and enhances its performance. A typical example is a coprocessor that performs floating-point arithmetic.
[0020]
In this embodiment, the coprocessor port 21 that the CPU 11 provides for connecting a coprocessor is used differently. Instead of a coprocessor, the local bus interface 12, which is used exclusively by the CPU 11, is connected to that port. The CPU 11 is thus configured to exchange data with the local memory 13 via the local bus interface 12 and the local bus 14.
[0021]
The local memory 13 is used exclusively by the CPU 11. Because of this, the local bus 14 is also a bus used exclusively by the CPU 11. The local bus interface 12 is provided to control data exchange with this CPU-exclusive local memory 13.
[0022]
Unlike the local memory 13, the register 16 and the memory 17 are used by the CPU 11 and also, as needed, by devices other than the CPU 11. The standard memory bus 15 therefore also carries data required by devices other than the CPU 11. The local bus 14, by contrast, carries only data required by the CPU 11. On the local bus 14, data is exchanged, for example, 32 bits at a time.
[0023]
The operation when data is exchanged between the CPU 11 and the local memory 13 is described with reference to the flowchart of FIG. 2. In step S11, the CPU 11 issues an instruction, which is received by the local bus interface 12. In step S12, the local bus interface 12 analyzes the received instruction and converts it into the following:
- address data indicating where the target data is stored (or is to be stored);
- the data to be processed;
- a read signal or a write signal.
[0024]
In step S13, the converted data and signals are transferred from the local bus interface 12 to the local memory 13. In step S14, the local memory 13 generates a response signal as the processing corresponding to the received data. It then transmits (outputs) the response signal to the local bus interface 12.
[0025]
In step S15, the local bus interface 12 converts the response signal from the local memory 13 into data that can be processed by the CPU 11, and outputs the data. In step S16, the CPU 11 receives data from the local bus interface 12, and starts processing based on the received data.
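Steps S12 to S15 can be modeled in software as follows. This is a minimal sketch, not the patented hardware. The class names, the 1,024-word memory size, and the use of the read data itself as the "response signal" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BusSignals:
    """Signals driven onto the local bus (cf. rows 3 to 6 of FIGS. 6 to 9)."""
    enable: bool
    address: int
    read: bool                  # True = read (H), False = write (L)
    write_data: Optional[int]   # None for reads


class LocalMemory:
    """Hypothetical word-addressed local memory; the size is an assumption."""

    def __init__(self, words: int = 1024):
        self.cells = [0] * words

    def respond(self, sig: BusSignals) -> Optional[int]:
        # S14: produce a response for the received signals.
        if not sig.enable:
            return None
        if sig.read:
            return self.cells[sig.address]
        self.cells[sig.address] = sig.write_data & 0xFFFFFFFF  # 32-bit data path
        return None


class LocalBusInterface:
    """Hypothetical software stand-in for the local bus interface 12."""

    def __init__(self, memory: LocalMemory):
        self.memory = memory

    def execute(self, is_load: bool, address: int, store_value: int = 0) -> Optional[int]:
        # S12: convert the instruction into address, data and read/write signals.
        sig = BusSignals(enable=True, address=address, read=is_load,
                         write_data=None if is_load else store_value)
        # S13/S14: send the signals to the local memory and collect its response.
        response = self.memory.respond(sig)
        # S15: return the response in a form the CPU can use.
        return response
```

For example, `execute(False, 256, 0xCAFE)` stores a word at address 256, and a following `execute(True, 256)` reads it back.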
[0026]
In this manner, data exchange (issuing an instruction and performing the processing corresponding to it) takes place between the CPU 11 and the local memory 13. Next, data exchange between the CPU 11 and the local memory 13, that is, via the local bus 14, is compared with data exchange via the standard memory bus 15, for example between the CPU 11 and the memory 17.
[0027]
The difference between data exchange on the local bus 14 and on the standard memory bus 15 is as follows. On the standard memory bus 15, the CPU 11 issues a request signal (instruction). After the signal reaches a storage device such as the register 16 or the memory 17, the CPU 11 waits to receive the data that the storage device outputs as a response signal. The CPU 11 therefore cannot perform other operations between issuing the request signal and receiving the response signal.
[0028]
In contrast, the data exchange protocol on the local bus 14 (described later) has a fixed latency and no Acknowledge, which is one of the response signals. The CPU 11 therefore does not need to wait for a response signal after issuing an instruction and can perform other operations after issuing a request signal. For example, it can issue request signals back to back. On the local bus 14, the CPU 11 can thus perform another operation immediately after issuing a signal, and it operates efficiently.
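The difference between the two protocols can be put in rough numbers. The text gives no cycle counts, so the latency used below is an arbitrary assumption. The blocking model also ignores the extra waits caused by other devices holding the bus, which would only make the standard bus slower.

```python
def blocking_finish(n_requests: int, latency: int) -> int:
    """Handshake bus: request i may issue only after response i-1 arrives,
    so each request occupies `latency` cycles. Returns the cycle in which
    the last response arrives."""
    return n_requests * latency


def pipelined_finish(n_requests: int, latency: int) -> int:
    """Fixed-latency bus with no Acknowledge: one request issues per cycle
    (cycles 0 .. n-1) and each response arrives exactly `latency` cycles
    later. Returns the cycle in which the last response arrives."""
    if n_requests == 0:
        return 0
    return (n_requests - 1) + latency
```

Assuming 4 requests and a 3-cycle latency, the blocking bus finishes in cycle 12 and the fixed-latency bus in cycle 6. With a 1-cycle latency the two models coincide, which is consistent with the fact that the gain comes from overlapping the waits.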
[0029]
In step S11, the CPU 11 issues an instruction to the local memory 13 via the local bus interface 12 and the local bus 14. This instruction uses a custom coprocessor instruction format, which extends the standard instruction set of the CPU 11 so that it can use the local memory 13 (that is, the local bus 14).
[0030]
This custom coprocessor instruction format lets the instruction itself select the bus. Standard instructions access storage devices that devices other than the CPU 11 also use, such as the register 16 and the memory 17. Instructions in the custom coprocessor format access the local memory 13, which is dedicated to the CPU 11. Because the instruction format is the same as that of ordinary instructions, no special operation such as configuring a DMA controller is needed.
[0031]
The extended instructions for exchanging data over the local bus 14 are described next. FIG. 3 shows the structure of the LDL (Load-Data-Local) and SDL (Store-Data-Local) instructions, the extended instructions for exchanging data on the local bus 14. LDL is a read instruction that transfers data from the local memory 13 to the CPU 11. SDL is a write instruction that transfers data from the CPU 11 to the local memory 13.
[0032]
Each instruction is, for example, 32 bits long and has the same layout as the other instructions of the CPU 11 (those issued to the standard memory bus 15). Bits 31 to 26 are the opcode and bits 20 to 16 are the target register. These fields are arranged in the same way as in the other instructions of the CPU 11.
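The field layout can be expressed as bit-field extraction. The positions of the opcode (bits 31 to 26), Lmode (bits 25 to 21) and target register (bits 20 to 16) come from the text. Placing a 16-bit immediate/address field in bits 15 to 0 is an assumption, suggested by the 16-bit values in the assembler examples (256, 0x4002). The opcode value used below is hypothetical.

```python
from typing import NamedTuple


class LocalInsn(NamedTuple):
    opcode: int  # bits 31-26
    lmode: int   # bits 25-21: immediate/pointer mode (FIG. 4)
    rt: int      # bits 20-16: target register
    imm: int     # bits 15-0: ASSUMED immediate/address field


def field(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word: int) -> LocalInsn:
    return LocalInsn(field(word, 31, 26), field(word, 25, 21),
                     field(word, 20, 16), field(word, 15, 0))


def encode(insn: LocalInsn) -> int:
    return ((insn.opcode & 0x3F) << 26 | (insn.lmode & 0x1F) << 21
            | (insn.rt & 0x1F) << 16 | (insn.imm & 0xFFFF))
```

Because the opcode and register fields sit where ordinary instructions put them, the same decode path can serve both instruction kinds.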
[0033]
The extended instructions for the local bus 14 have no offset value. Omitting the offset value eliminates the need for address calculation. Instead of an offset, the extended instructions access the local memory 13 using an immediate value (Immediate) or a pointer (Pointer).
[0034]
As shown in FIG. 4, the Lmode field in bits 25 to 21 specifies whether an immediate value or a pointer is used. It also selects one of two options, depending on the mode:
- when an immediate value is used (Lmode $0 to $7), whether that immediate value is also assigned to a pointer;
- when a pointer is used (Lmode $8 to $13), whether the pointer value is incremented or decremented.
[0035]
For example, in assembler,
LDL $10,256($0)
means
CPU.Reg[10]=LocalMem[256]; Pointer[0]=256;
[0036]
If Pointer[2] is assigned to address 0x4002,
SDL $12,0x4002($12)
means
LocalMem[Pointer[2]]=CPU.Reg[12]; Pointer[2]--;
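The two assembler examples can be reproduced with a toy model. This is a sketch under stated assumptions. The text does not say which Lmode values select "assign to pointer" or "decrement", so the model passes those choices as explicit arguments. The memory size, pointer count and the `ptr_at` address-to-pointer map are hypothetical. The pointer update here simply wraps modulo the memory size and does not apply the per-pointer masks described in the next paragraph.

```python
from typing import Dict, Optional


class LocalUnit:
    """Toy model of LDL/SDL semantics; sizes and the pointer map are hypothetical."""

    def __init__(self, mem_words: int = 0x10000, n_pointers: int = 8):
        self.mem = [0] * mem_words               # LocalMem[]
        self.reg = [0] * 32                      # CPU.Reg[]
        self.ptr = [0] * n_pointers              # Pointer[]
        self.ptr_at: Dict[int, int] = {}         # hypothetical: address value -> pointer index

    def ldl_imm(self, rt: int, addr: int, assign_ptr: Optional[int] = None) -> None:
        """Immediate mode: CPU.Reg[rt] = LocalMem[addr]; optionally Pointer[k] = addr."""
        self.reg[rt] = self.mem[addr]
        if assign_ptr is not None:
            self.ptr[assign_ptr] = addr

    def sdl_ptr(self, rt: int, ptr_addr: int, step: int = 0) -> None:
        """Pointer mode: LocalMem[Pointer[k]] = CPU.Reg[rt]; then Pointer[k] += step."""
        k = self.ptr_at[ptr_addr]
        self.mem[self.ptr[k]] = self.reg[rt]
        self.ptr[k] = (self.ptr[k] + step) % len(self.mem)  # plain wrap, no mask
```

Under these assumptions, `LDL $10,256($0)` corresponds to `ldl_imm(10, 256, assign_ptr=0)`. Likewise, `SDL $12,0x4002($12)` corresponds to `sdl_ptr(12, 0x4002, step=-1)` once `ptr_at[0x4002] = 2`.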
[0037]
In addition, each pointer has its own mask, in a one-to-one correspondence. An example is shown in FIG. 5. With these masks, when an address is sent out using a pointer, the desired lower bits can be incremented or decremented while the higher bits remain unchanged.
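The masked increment and decrement can be written as a bit operation. This sketch assumes a 16-bit address width. It also assumes the mask is expressed as the run of low-order bits that is allowed to change (e.g. 0x00FF); the bits outside that run are the masked, fixed upper bits. FIG. 5's actual mask encoding may differ.

```python
def step_pointer(ptr: int, lower_mask: int, delta: int, width: int = 16) -> int:
    """Add `delta` only to the bits selected by `lower_mask`.

    Bits outside `lower_mask` (the masked upper bits) are copied unchanged,
    so the lower field wraps around inside a fixed block of addresses.
    Assumes `lower_mask` is a contiguous run of low-order bits, e.g. 0x00FF.
    """
    full = (1 << width) - 1
    upper = ptr & ~lower_mask & full     # frozen upper bits
    lower = (ptr + delta) & lower_mask   # lower bits, wrapped within the mask
    return upper | lower
```

With `lower_mask=0x00FF` and a pointer in the 0x1200 block, incrementing 0x12FF yields 0x1200 and decrementing 0x1200 yields 0x12FF. The pointer therefore behaves as a 256-word circular buffer with no software bounds check.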
[0038]
Next, the protocol of the local bus 14 is described with reference to the timing charts of FIGS. 6 to 9. The rows of FIGS. 6 to 9 show the following:
1. HCLK: the waveform of the operating clock of the CPU 11.
2. LDL or SDL: the pipeline stages.
3. enable: the read or write enable signal.
4. address: the address to be read or written, i.e. the timing at which it appears on the local bus 14.
5. H: read, L: write: the signal that selects reading or writing.
6. read data[31:0]: the data being read or written, i.e. the timing at which it appears on the local bus 14.
[0039]
In the pipeline stages shown in the second row of FIGS. 6 to 9, IF denotes Instruction Fetch, RF denotes Register Fetch, EX denotes Execution, DF denotes Data Fetch, and WB denotes Write Back. In FIG. 7, RA in the fourth row denotes Read Address and RD in the sixth row denotes Read Data. Similarly, in FIG. 8, WA in the fourth row denotes Write Address and WD in the sixth row denotes Write Data.
[0040]
FIG. 6 shows the case in which a read instruction is issued from the CPU 11 to the local memory 13; the second row shows the pipeline stages of the LDL instruction. For the standard memory bus 15 (an ordinary bus), the register specified in the instruction is first read and processing such as adding an offset is performed. Only then is the request issued, in the fourth stage, DF.
[0041]
For the local bus 14, by contrast, the instruction contains only an immediate address, so the request (the address of the data to be read, and so on) can be issued in the second stage, RF. Therefore, even if one LDL instruction is immediately followed by another, no interlock occurs, and data can be transferred continuously without delay. FIG. 7 shows LDL instructions issued back to back in this way.
[0042]
FIG. 8 shows the case in which a write instruction is issued from the CPU 11 to the local memory 13; the second row shows the pipeline stages of the SDL instruction. Unlike LDL, SDL issues its request in the fifth stage, WB, and the write completes in that stage. The request is not issued earlier because an interrupt may occur in the fourth stage, DF. As with LDL, SDL instructions can be issued back to back and data can be transferred without interlocks. FIG. 9 shows SDL instructions issued back to back in this way.
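The request timing in FIGS. 6 to 9 can be checked with a simple stage-offset model. The request stages (RF for LDL, WB for SDL, DF for a standard-bus load) come from the text. The following are assumptions:
- one instruction enters IF per cycle;
- cycles are numbered from 0;
- nothing stalls.

The sketch only shows where requests land. It does not model the waits on the standard bus, and it does not model mixed LDL/SDL sequences, which the text does not discuss.

```python
from typing import List

# Five-stage pipeline named in FIGS. 6 to 9.
STAGES = ("IF", "RF", "EX", "DF", "WB")

# Stage in which each access type puts its request on the bus, per the text.
REQUEST_STAGE = {"LDL": "RF", "SDL": "WB", "standard": "DF"}


def request_cycles(kind: str, n: int) -> List[int]:
    """Cycle in which each of n back-to-back instructions of one kind issues
    its bus request, assuming instruction i enters IF in cycle i."""
    offset = STAGES.index(REQUEST_STAGE[kind])
    return [i + offset for i in range(n)]


def is_one_per_cycle(cycles: List[int]) -> bool:
    """True if requests occupy consecutive, distinct cycles (no gaps, no collisions)."""
    return all(b - a == 1 for a, b in zip(cycles, cycles[1:]))
```

Under these assumptions, three back-to-back LDLs request in cycles 1, 2 and 3, and three SDLs in cycles 4, 5 and 6. In both cases there is one request per clock, which matches the continuous transfers shown in FIGS. 7 and 9.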
[0043]
Using the local bus 14 lets the CPU 11 operate with high efficiency. However, data and other traffic on the local bus 14 cannot be observed by an ordinary debugger. This is because a typical debugger accesses storage devices such as the memory 17 through the standard memory bus 15, and debug signals do not use the coprocessor port 21.
[0044]
To solve this problem, a bypass connecting the standard memory bus 15 and the local bus interface 12 is provided. Through this debugging bypass, data can be transferred to and from storage devices such as the local memory 13 or local registers (not shown), which also makes debugging possible.
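The text does not say how accesses through the debugging bypass are addressed. One plausible arrangement is sketched below: a debugger on the standard memory bus reaches the local memory through an address window that is routed to the bypass. The window base and size are hypothetical values chosen only for illustration.

```python
from typing import Tuple

LOCAL_WINDOW_BASE = 0x8000_0000   # hypothetical standard-bus address of the bypass window
LOCAL_WINDOW_SIZE = 0x1_0000      # hypothetical: 64K words of local memory


def route(address: int) -> Tuple[str, int]:
    """Decide where a standard-bus access (e.g. from a debugger) is sent.

    Addresses inside the window go through the bypass to the local memory,
    translated to a local offset; all other addresses go to ordinary memory.
    """
    if LOCAL_WINDOW_BASE <= address < LOCAL_WINDOW_BASE + LOCAL_WINDOW_SIZE:
        return ("local_memory_via_bypass", address - LOCAL_WINDOW_BASE)
    return ("standard_memory", address)
```

With this arrangement, the CPU's own LDL/SDL traffic still uses the dedicated local bus, while the debugger needs no knowledge of the coprocessor port.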
[0045]
Conventionally, in data transfer between the CPU 11 and a storage device such as a memory, the next signal could not be issued until the response signal to the request signal had been received. In other words, the CPU could not move on to the next operation. A wait of several cycles was therefore required to access the storage device and transfer the data.
[0046]
With the embodiment described above, however, the CPU 11 uses the local bus 14 on its coprocessor port in place of a coprocessor. Data can then be transferred at roughly the same speed as an access to the CPU's internal cache, so both data transfer and the CPU 11 itself operate efficiently. Consider, for example, an apparatus that includes the information processing apparatus 10 of FIG. 1 and is used as a real-time encoder for audio or video signals. The benefit is substantial: high-speed operation becomes possible without a separate special-purpose processing device.
[0047]
In this specification, the steps describing a program provided by a medium include not only processing performed in time series in the described order, but also processing executed in parallel or individually, not necessarily in time series.
[0048]
[Effects of the Invention]
According to the information processing apparatus of the present invention, data can be transferred between the CPU and the storage device.
[0049]
Further, according to the information processing apparatus of the present invention, data transfer between the CPU and the storage device can be performed more efficiently, and the data processing speed can thereby be improved.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration of an embodiment of an information processing apparatus to which the present invention is applied.
FIG. 2 is a flowchart for explaining the operation of the information processing apparatus.
FIG. 3 is a diagram for explaining the extended instructions.
FIG. 4 is a diagram for explaining the extended instructions.
FIG. 5 is a diagram for explaining a mask.
FIG. 6 is a diagram for explaining the local bus protocol.
FIG. 7 is a diagram for explaining the local bus protocol.
FIG. 8 is a diagram for explaining the local bus protocol.
FIG. 9 is a diagram for explaining the local bus protocol.
[Explanation of symbols]
10 information processing device, 11 CPU, 12 local bus interface, 13 local memory, 14 local bus, 15 standard memory bus, 16 registers, 17 memory

Claims

An information processing apparatus comprising:
a first storage device accessed only by a CPU;
a second storage device accessed by the CPU and by a device other than the CPU;
a first bus connecting the CPU and the first storage device; and
a second bus connecting the CPU and the second storage device,
wherein the instruction set for the first bus, used between the CPU and the first storage device via the first bus,
has a fixed latency and has no Acknowledge, which is one of the response signals,
has no offset value and accesses the first storage device using an immediate value or a pointer, and,
when the first storage device is accessed using the pointer, masks predetermined upper bits of the value stored in the pointer of the instruction set for the first bus so that only the lower bits other than the predetermined upper bits are incremented or decremented.

The information processing apparatus according to claim 1, wherein the instruction set for the first bus has the same format as the instruction set for the second bus used between the CPU and the second storage device via the second bus.

The information processing apparatus according to claim 1, further comprising a third bus that connects the first storage device and the second bus.

The information processing apparatus according to claim 1, wherein the first bus is connected to a coprocessor port of the CPU.